Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-18 Thread Simon Riggs
On Sun, Jul 17, 2011 at 10:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On Fri, Jul 15, 2011 at 6:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'd say send the signal when wal buffers are more than X% full (maybe
 half).  The suggestion to send it when wrapping around at the end of the
 array is not quite right, because that's an arbitrary condition that's
 not related to how much work there is for the walwriter to do.  It
 should be cheap to check for this while advancing to a new wal buffer.

 I think we need to put the calculation and SetLatch() after we release
 WALInsertLock, so as to avoid adding contention.

 Yeah, I agree with putting the actual SetLatch call after we release the
 lock ... but doesn't the calculation need to be done while we're still
 holding it?  Otherwise it'd be using potentially-inconsistent values.

The calculation is just a heuristic, so it doesn't need to be exactly
accurate. We need the latest values derived while holding the lock,
but we don't need to ensure they don't change while we make the calc.

The calc needs to say: if we are the ones who made the array more than
half full, then wake up the WALWriter. It might already be awake, it
might already be more than half full, or it might even be less than
half full now - none of those cases are very important.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Single pass vacuum - take 1

2011-07-18 Thread Simon Riggs
On Mon, Jul 18, 2011 at 2:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jul 15, 2011 at 7:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 My additional requests would be that we can easily tell which blocks
 have been modified like this, that we have a way to turn this off if
 we get bugs for the next few releases, that we check it all works with Hot
 Standby just fine and that all the block inspection tools support it.

 To me, this seems like a bit too much of an internal change to justify
 adding a GUC.

I will be happy to remove it again when we have shown there are no
bugs. Getting this wrong is a data loss issue.

 But it probably is a good idea to ping the authors of
 the various block inspection tools -- does contrib/pageinspect care
 about this sort of thing, or just the out-of-core stuff?

I'm the author of the main sections of the pageinspect module, and
yes, I care. :-)

I want to be able to say easily and quickly "ahhh, that page was
touched by the new code". As mentioned above, this code has potential
for causing data loss and it is important to look at ways of
diagnosing bugs. Pageinspect was written to allow us to confirm at a
low level that HOT was working.

 I'd also think that it would be a good idea to consider (most likely
 as a separate patch) the idea you suggested upthread: skip phase 2 if
 the number of dead tuples is insufficient to justify the cost of
 scanning the indexes.  I've been wondering if it might make sense to
 couple that change with a decrease in vacuum_scale_factor - in effect,
 vacuum more frequently, but don't scan the indexes every time.

 One possibly useful metric for benchmarking these sorts of changes is
 to run a write workload for a while until the size of the tables
 stabilize and then measure how big they are.  If a given change causes
 the table size to stabilize at a lower value, that suggests that the
 change makes vacuum more effective at controlling bloat, which is
 good.  Conversely, if the change causes the table size to stabilize at
 a higher value, that suggests that we've made vacuum less effective at
 controlling bloat, which is bad.  In fact, I'd venture to say that
 anything that falls into the latter category is dead on arrival...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-18 Thread Simon Riggs
On Mon, Jul 18, 2011 at 1:48 AM, Robert Haas robertmh...@gmail.com wrote:

 Might be good to start a new thread for each auxiliary process, or we
 may get confused.

+1

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Error in PQsetvalue

2011-07-18 Thread Pavel Golub
Hello, Andrew.

I hope you don't mind I've added this patch to CommitFest:
https://commitfest.postgresql.org/action/patch_view?id=606

You wrote:

AC On 6/3/2011 10:26 PM, Andrew Chernow wrote:

 I disagree -- I think the fix is a one-liner. line 446:
 if (tup_num == res->ntups && !res->tuples[tup_num])

 should just become
 if (tup_num == res->ntups)

 also the memset of the tuple slots when the slot array is expanded can
 be removed. (in addition, the tuple array expansion should
 really be abstracted, but that isn't strictly necessary here).


 All true. This is a cleaner fix to something that was in fact broken ;) You 
 want

 Attached a patch that fixes the OP's issue. PQsetvalue now uses pqAddTuple to
 grow the tuple table and has removed the remnants of an older idea that
 caused the bug.


AC Sorry, I attached the wrong patch.  Here is the correct one.




-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] fixing PQsetvalue()

2011-07-18 Thread Pavel Golub
Hello, Merlin.

I hope it's OK that I've added Andrew's patch to CommitFest:
https://commitfest.postgresql.org/action/patch_view?id=606

I did this because beta3 is already released, but nothing has been done
about this bug.

You wrote:

MM On Thu, Jun 23, 2011 at 7:54 AM, Andrew Chernow a...@esilo.com wrote:
    you are creating as you iterate through.  This behavior was
    unnecessary in terms of what libpqtypes and friends needed and may (as
    Tom suggested) come back to bite us at some point. As it turns out,
    PQsetvalue's operation on results that weren't created via
    PQmakeEmptyResult was totally busted because of the bug, so we have a
    unique opportunity to tinker with libpq here: you could argue that you

 +1

 Exactly at this moment I am thinking about using a modifiable
 (via PQsetvalue) PGresult instead of std::map in my C++ library
 to store parameters for binding to the executing command.
 I have already designed how to implement it, and I supposed that
 PQsetvalue is intended to work with any PGresult and not only
 with those which have been created via PQmakeEmptyResult...
 So, I am absolutely sure that PQsetvalue should work with
 any PGresult.

 All PGresults are created via PQmakeEmptyPGresult, including libpqtypes.
  Actually, libpqtypes calls PQcopyResult which calls PQmakeEmptyPGresult.

 It might be better to say a server result vs. a client result.
 Currently, PQsetvalue is broken when provided a server generated result.

MM er, right-- the divergence was in how the tuples were getting added --
MM thanks for the clarification.  essentially your attached patch fixes
MM that divergence.

MM merlin



-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Add temp_file_limit GUC parameter to constrain temporary file sp

2011-07-18 Thread Peter Eisentraut
On sön, 2011-07-17 at 18:39 -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Tom Lane wrote:
   Add temp_file_limit GUC parameter to constrain temporary file space 
   usage.
   
   The limit is enforced against the total amount of temp file space used by
   each session.
   
 Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii
  
   Should we document that sessions that exceed this limit generate an
   error?  I don't see any mention of this in the docs.
  
  Huh?  Seems like a waste of text to me.  What else would happen?
 
 Well, if we exceed work_mem, we spill to temp disk.  If we exceed temp
 disk, we error out.  Those different behaviors don't seem obvious to me.
 The work_mem docs do mention it spills to disk though, so maybe it is
 OK.

Sounds like it would be good to document the behavior if the limit is
exceeded in each case.
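The contrast Bruce describes could be shown with a pair of illustrative postgresql.conf settings (the values here are arbitrary examples, not recommendations):

```
# Illustrative only -- values are arbitrary, not recommendations.
work_mem = 64MB        # exceeding this spills sorts and hashes to temp files
temp_file_limit = 5GB  # exceeding this aborts the query with an error
```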




[HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Peter Geoghegan
On 18 July 2011 01:48, Robert Haas robertmh...@gmail.com wrote:
 Might be good to start a new thread for each auxiliary process, or we
 may get confused.

Right - I've done so in this reply. There isn't a problem as such with
the AV Launcher patch - there's a problem with most or all remaining
auxiliary processes, that needs to be worked out once and for all. I
refer to the generic signal handler/timeout invalidation issues
already described.

 ISTM that these generic handlers ought to be handling this
 generically, and that there should be a Latch for just this purpose
 for each process within PGPROC. We already have this Latch in PGPROC:

 Latch           waitLatch;              /* allow us to wait for sync rep */

 Maybe its purpose should be expanded to "current process Latch"?

 I think that could be a really good idea, though I haven't looked at
 it closely enough to be sure.

 Another concern is, what happens when we receive a signal, generically
 handled or otherwise, and have to SetLatch() to avoid time-out
 invalidation? Should we just live with a spurious
 AutoVacLauncherMain() iteration, or should we do something like check
 if the return value of WaitLatch indicates that we woke up due to a
 SetLatch() call, which must have been within a signal handler, and
 that we should therefore goto just before WaitLatch() and elide the
 spurious iteration? Given that we can expect some signals to occur
 relatively frequently, spurious iterations could be a real concern.

 Really?  I suspect that it doesn't much matter exactly how many
 machine language instructions we execute on each wake-up, within
 reasonable bounds, of course.  Maybe some testing is in order?

There's only one way to get around the time-out invalidation problem
that I'm aware of - call SetLatch() in the handler. I'd be happy to
hear alternatives, but until we have an alternative, we're stuck
managing this in each and every signal handler.

Once we've had the latch set to handle this, and control returns to
the auxiliary process loop, we now have to decide from within the
auxiliary if we can figure out that all that happened was a required
wake-up, and thus we shouldn't really go through with another
iteration. That, or we can simply do the iteration.

I have my doubts that it is acceptable to wake-up spuriously in
response to routine events that there are generic handlers for. Maybe
this needs to be decided on a case-by-case basis.

 On another note, I might be inclined to write something like:

 if ((return_value_of_waitlatch & WL_POSTMASTER_DEATH) && !PostmasterIsAlive())
   proc_exit(1);

 ...so as to avoid calling that function unnecessarily on every iteration.

Hmm. I'm not so sure. We're now relying on the return value of
WaitLatch(), which isn't guaranteed to report all wake-up events
(although I don't believe it would be a problem in this exact case).
Previously, we called PostmasterIsAlive() once a second anyway, and
that wasn't much of a problem.

 Incidentally, should I worry about the timeout long for WaitLatch()
 overflowing?

 How would that happen?

struct timeval is comprised of two longs - one representing seconds,
and the other representing the balance of microseconds. Previously, we
didn't combine them into a single microsecond representation. Now, we
do.

There could perhaps be a very large nap, as determined by
launcher_determine_sleep(), so that the total number of microseconds
passed to WaitLatch() would exceed the maximum long size that can be
safely represented on some or all platforms. On most 32-bit machines,
sizeof(long) == sizeof(int), which is just 4 bytes. (2^31) - 1 =
2,147,483,647 microseconds = only about 35 minutes. There are corner
cases, such as if someone were to set autovacuum_naptime to something
silly.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Florian Pflug
On Jul18, 2011, at 15:12 , Peter Geoghegan wrote:
 struct timeval is comprised of two longs - one representing seconds,
 and the other representing the balance of microseconds. Previously, we
 didn't combine them into a single microsecond representation. Now, we
 do.

I haven't actually looked at the code, so maybe I'm missing something
obvious - but why don't we use a double precision float (double) instead
of a long for the combined representation?

best regards,
Florian Pflug




Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 9:12 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Another concern is, what happens when we receive a signal, generically
 handled or otherwise, and have to SetLatch() to avoid time-out
 invalidation? Should we just live with a spurious
 AutoVacLauncherMain() iteration, or should we do something like check
 if the return value of WaitLatch indicates that we woke up due to a
 SetLatch() call, which must have been within a signal handler, and
 that we should therefore goto just before WaitLatch() and elide the
 spurious iteration? Given that we can expect some signals to occur
 relatively frequently, spurious iterations could be a real concern.

 Really?  I suspect that it doesn't much matter exactly how many
 machine language instructions we execute on each wake-up, within
 reasonable bounds, of course.  Maybe some testing is in order?

 There's only one way to get around the time-out invalidation problem
 that I'm aware of - call SetLatch() in the handler. I'd be happy to
 hear alternatives, but until we have an alternative, we're stuck
 managing this in each and every signal handler.

 Once we've had the latch set to handle this, and control returns to
 the auxiliary process loop, we now have to decide from within the
 auxiliary if we can figure out that all that happened was a required
 wake-up, and thus we shouldn't really go through with another
 iteration. That, or we can simply do the iteration.

 I have my doubts that it is acceptable to wake-up spuriously in
 response to routine events that there are generic handlers for. Maybe
 this needs to be decided on a case-by-case basis.

I'm confused.  If the process gets hit with a signal, it's already
woken up, isn't it?  Whatever system call it was blocked on may or may
not get restarted depending on the platform and what the signal
handler does, but from an OS perspective, the process has already been
allocated a time slice and will run until either the time slice is
exhausted or it again blocks.

 On another note, I might be inclined to write something like:

 if ((return_value_of_waitlatch & WL_POSTMASTER_DEATH) && !PostmasterIsAlive())
   proc_exit(1);

 ...so as to avoid calling that function unnecessarily on every iteration.

 Hmm. I'm not so sure. We're now relying on the return value of
 WaitLatch(), which isn't guaranteed to report all wake-up events
 (although I don't believe it would be a problem in this exact case).
 Previously, we called PostmasterIsAlive() once a second anyway, and
 that wasn't much of a problem.

Ah.  OK.

 Incidentally, should I worry about the timeout long for WaitLatch()
 overflowing?

 How would that happen?

 struct timeval is comprised of two longs - one representing seconds,
 and the other representing the balance of microseconds. Previously, we
 didn't combine them into a single microsecond representation. Now, we
 do.

 There could perhaps be a very large nap, as determined by
 launcher_determine_sleep(), so that the total number of microseconds
 passed to WaitLatch() would exceed the maximum long size that can be
 safely represented on some or all platforms. On most 32-bit machines,
 sizeof(long) == sizeof(int), which is just 4 bytes. (2^31) - 1 =
 2,147,483,647 microseconds = only about 35 minutes. There are corner
 cases, such as if someone were to set autovacuum_naptime to something
 silly.

OK.  In that case, my feeling is yes, you need to worry about that.
I'm not sure exactly what the best solution is: we could either
twiddle the WaitLatch interface some more, or restrict
autovacuum_naptime to at most 30 minutes, or maybe there's some other
option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jul 18, 2011 at 9:12 AM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 There could perhaps be a very large nap, as determined by
 launcher_determine_sleep(), so that the total number of microseconds
 passed to WaitLatch() would exceed the maximum long size that can be
 safely represented on some or all platforms. On most 32-bit machines,
 sizeof(long) == sizeof(int), which is just 4 bytes. (2^31) - 1 =
 2,147,483,647 microseconds = only about 35 minutes. There are corner
 cases, such as if someone were to set autovacuum_naptime to something
 silly.

 OK.  In that case, my feeling is yes, you need to worry about that.
 I'm not sure exactly what the best solution is: we could either
 twiddle the WaitLatch interface some more, or restrict
 autovacuum_naptime to at most 30 minutes, or maybe there's some other
 option.

A wakeup once every half hour would surely not be an issue from a power
consumption standpoint.  However, I'm not sure I understand here: aren't
we trying to remove the timeouts completely?

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] pgsql: Add temp_file_limit GUC parameter to constrain temporary file sp

2011-07-18 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Tom Lane wrote:
 Huh?  Seems like a waste of text to me.  What else would happen?

 Well, if we exceed work_mem, we spill to temp disk.  If we exceed temp
 disk, we error out.  Those different behaviors don't seem obvious to me.

[ shrug... ]  Feel free to change it.

regards, tom lane



Re: [HACKERS] proposal: a validator for configuration files

2011-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jul 17, 2011 at 12:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I agree custom_variable_classes is conceptually messy, but it's a
 reasonably lightweight compromise that gives some error checking without
 requiring a lot of possibly-irrelevant extensions to be loaded into
 every postgres process.

 Hmm.  Maybe what we need is a mechanism that allows the configuration
 to be associated a loadable module, and whenever that module is
 loaded, we also load the associated configuration settings.  This is
 probably terrible syntax, but something like:

 ALTER LOAD 'plpgsql' SET plpgsql.variable_conflict = 'whatever';

 AFAICS, that would remove the need to set variables in postgresql.conf
 that can't be validated, and so we could just disallow it.

No, that only fixes things for the case of setting a variable in the
control file.  It isn't useful for ALTER ROLE/DATABASE SET.  And it
still has the problem of forcing every process to load every extension,
and as written it would also require the postmaster to read catalogs.

 OTOH, maybe that's more trouble than can be justified by the size of
 the problem.

Yeah.

regards, tom lane



Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-18 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, Jul 17, 2011 at 10:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, I agree with putting the actual SetLatch call after we release the
 lock ... but doesn't the calculation need to be done while we're still
 holding it?  Otherwise it'd be using potentially-inconsistent values.

 The calculation is just a heuristic, so it doesn't need to be exactly
 accurate. We need the latest values derived while holding the lock,
 but we don't need to ensure they don't change while we make the calc.

 The calc needs to say: if we are the ones who made the array more than
 half full, then wake up the WALWriter. It might already be awake, it
 might already be more than half full, or it might even be less than
 half full now - none of those cases are very important.

Nonetheless, I think it'll be better to make the calculation and set a
local bool variable (to tell whether to do SetLatch later) while we're
in the midst of the advance-to-next-WAL-buffer code.  Even if it's true
that we would generally get an OK answer by reading the variables again
after releasing the lock, that's highly unlikely to be a performance
win, because of cache line contention effects (ie, having to wrest the
line back from the next guy doing WALInsert, and then give it back to
him afterwards).  Once you release the lock, you should stop touching
shared memory, period.

regards, tom lane



Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE

2011-07-18 Thread Robert Haas
On Thu, Jul 7, 2011 at 3:25 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jul 7, 2011 at 3:21 PM, Noah Misch n...@2ndquadrant.com wrote:
 On Thu, Jul 07, 2011 at 03:06:46PM -0400, Robert Haas wrote:
 On Thu, Jul 7, 2011 at 2:55 PM, Noah Misch n...@2ndquadrant.com wrote:
  CheckIndexCompatible() calls ComputeIndexAttrs() to resolve the new 
  operator
  classes, collations and exclusion operators for each index column.  It 
  then
  checks those against the existing values for the same.  I figured that was
  obvious enough, but do you want a new version noting that?

 I guess one question I had was... are we depending on the fact that
 ComputeIndexAttrs() performs a bunch of internal sanity checks?  Or
 are we just expecting those to always pass, and we're going to examine
 the outputs after the fact?

 Those checks can fail; consider an explicit operator class or collation that
 does not support the destination type.  At that stage, we neither rely on 
 those
 checks nor mind if they do fire.  If we somehow miss the problem at that 
 stage,
 DefineIndex() will detect it later.  Likewise, if we hit an error in
 CheckIndexCompatible(), we would also hit it later in DefineIndex().

 OK.

Committed with minor comment and documentation changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jul 18, 2011 at 9:12 AM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 There could perhaps be a very large nap, as determined by
 launcher_determine_sleep(), so that the total number of microseconds
 passed to WaitLatch() would exceed the maximum long size that can be
 safely represented on some or all platforms. On most 32-bit machines,
 sizeof(long) == sizeof(int), which is just 4 bytes. (2^31) - 1 =
 2,147,483,647 microseconds = only about 35 minutes. There are corner
 cases, such as if someone were to set autovacuum_naptime to something
 silly.

 OK.  In that case, my feeling is yes, you need to worry about that.
 I'm not sure exactly what the best solution is: we could either
 twiddle the WaitLatch interface some more, or restrict
 autovacuum_naptime to at most 30 minutes, or maybe there's some other
 option.

 A wakeup once every half hour would surely not be an issue from a power
 consumption standpoint.  However, I'm not sure I understand here: aren't
 we trying to remove the timeouts completely?

Well, in the case of the AV launcher, the issue is that the main loop
is timed by definition, cf. autovacuum_naptime, and the WaitLatch()
interface is apparently designed so that we can't sleep longer than 35
minutes.  We can avoid waking every second simply to check whether the
postmaster has died, but waking up every autovacuum_naptime seems de
rigueur barring a major redesign of the mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-18 Thread Yeb Havinga

Hello KaiGai-san,

I've been preparing to review this patch by reading both the pgsql-hackers 
history on sepgsql and the RHEL 6 guide on SELinux this weekend. Today 
I installed GIT HEAD with --with-selinux on Scientific Linux 6 
(developer installation); so far almost everything is looking good.


These things should probably be added to the 9.1beta3 documentation branch:

1) the line with "for DBNAME in ... do postgres --single" etc. lacks a -D 
argument and hence gives the error:

postgres does not know where to find the server configuration file.

2) there is a dependency on objects outside of the postgresql 
installation tree, in /etc/selinux/targeted/contexts/sepgsql_contexts, 
and that file has an error that is thrown when contrib/sepgsql is executed:
/etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid 
object type db_blobs

(same for db_language)
I found your fix for the error on a forum on oss.tresys.com, but IMHO 
either the contrib/sepgsql docs should mention that the dependency exists and 
that it might contain errors for (older) reference policies, or it should 
include a bug-free reference policy for sepgsql to replace the system 
installed refpolicy with (and mention that in the install documentation).


3) sepgsql is currently a bit hard to find in the documentation. The 
www.postgresql.org website search doesn't find sepgsql, and selinux only 
refers to an old PostgreSQL redhat bug from 2005 on bugzilla.redhat.com. I 
had to manually remember it was a contrib module. Also, sepgsql isn't 
linked to from the SECURITY LABEL page. At the moment I'm unsure if I have 
seen all sepgsql-related sgml-based documentation.


After fixing the refpolicy I proceeded with the contrib/sepgsql manual, 
with the goal of getting something easy done: create a top secret table 
like 'thisyearsbonusses' and a single user 'boss', and configure sepgsql 
in such a way that only the boss can access the top secret table. I've 
read the contrib documentation and browsed the links at the bottom of the 
page, but so far I don't even have a clue how to proceed. Until I do, 
I don't feel it's appropriate for me to review the avc patch.


Would you be willing to help me get started? Specific 
questions are:


1) The contrib doc under DML permissions talks about 'db_table:select' 
etc? What are these things? They are not labels since I do not see them 
listed in the output of 'select distinct label from pg_seclabel'.


2) The regression test label.sql introduces labels with types 
sepgsql_trusted_proc_exec_t, sepgsql_ro_table_t. My question is: where 
are these defined? What is their meaning? Can I define my own?


3) In the examples so far I've seen unconfined_u and system_u? Can I 
define my own?


Thanks,
Yeb Havinga


On 2011-07-14 21:46, Kohei KaiGai wrote:

Sorry, the syscache part was mixed to contrib/sepgsql part
in my previous post.
Please see the attached revision.

Although its functionality is simple enough (it just reduces the
number of system-call invocations), its performance
improvement is obvious.
So, I hope someone will volunteer to review these patches.

Thanks,

2011/7/11 Kohei KaiGaikai...@kaigai.gr.jp:

I rebased the userspace access vector cache patch to the latest tree.

I'll describe the background of this patch because this thread has not been
active for more than a week.
The sepgsql asks in-kernel selinux when it needs to make its access control
decision, so it always causes system call invocations.
However, the access control decision of selinux for a particular pair of
security labels is consistent as long as its security policy is not reloaded.
Thus, it is a good idea to cache access control decisions recently used in
userspace.
In addition, the current GetSecurityLabel() always opens the pg_seclabel
catalog and scans it to fetch the security label of database objects,
although it is a situation where we can utilize the syscache mechanism.

The uavc-syscache patch adds a new SECLABELOID syscache.
It also redefines pg_seclabel.provider as Name, instead of Text, according to
the suggestion from Tom.
(To avoid a collation-conscious datatype)

The uavc-selinux-cache patch adds a cache mechanism to contrib/sepgsql.
Its internal api to communicate with selinux (sepgsql_check_perms) was
replaced by a newer sepgsql_avc_check_perms that checks the cached access
control decision first, prior to system call invocations.

The result of performance improvement is obvious.

* Test 2: time to run 50,000 SELECTs from empty tables

   selinux |   SECLABELOID syscache
     avc   |  without  |   with
  ---------+-----------+-----------
   without | 185.59[s] | 178.38[s]
  ---------+-----------+-----------
    with   |  23.58[s] |  21.79[s]

I strongly hope this patch (and security label support on shared objects) will
get upstreamed in this commit-fest, because it will serve as a basis for
other upcoming features.
Please volunteer to review!

Thanks,

2011/7/2 Kohei KaiGaikai...@kaigai.gr.jp:

The attached patch re-defines 

Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jul 18, 2011 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 A wakeup once every half hour would surely not be an issue from a power
 consumption standpoint.  However, I'm not sure I understand here: aren't
 we trying to remove the timeouts completely?

 Well, in the case of the AV launcher, the issue is that the main loop
 is timed by definition, cf. autovacuum_naptime, and the WaitLatch()
 interface is apparently designed so that we can't sleep longer than 35
 minutes.

Hmm.  Well, it's not too late to rethink the WaitLatch API, if we think
that that might be a significant limitation.  Offhand I don't see that
it is, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FOR KEY LOCK foreign keys

2011-07-18 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Noah Misch  wrote:
   
 With this patch in its final form, I have completed 180+ suite
 runs without a failure.
   
 The attached patch allows the tests to pass when
 default_transaction_isolation is stricter than 'read committed'. 
 
Without these two patches the tests fail about one time out of three
on my machine at the office at the 'read committed' transaction
isolation level, and all the time at stricter levels.  On my machine
at home I haven't seen the failures at 'read committed'.  I don't
know if this is Intel (at work) versus AMD (at home) or what.
 
With both Noah's patch and mine I haven't yet seen a failure in
either environment, with a few dozen tries.
 
-Kevin



Re: [HACKERS] [COMMITTERS] pgsql: Made ecpglib write double with a precision of 15 digits.

2011-07-18 Thread Tom Lane
I wrote:
 Michael Meskes mes...@postgresql.org writes:
 Made ecpglib write double with a precision of 15 digits.
 Patch originally by Akira Kurosawa kurosawa-ak...@mxc.nes.nec.co.jp.

 I think you missed the 9.1 branch.

... not to mention that the buildfarm is now failing ecpg-check
everywhere ...

regards, tom lane



[HACKERS] Re: [COMMITTERS] pgsql: Made ecpglib write double with a precision of 15 digits.

2011-07-18 Thread Michael Meskes
On Mon, Jul 18, 2011 at 12:16:50PM -0400, Tom Lane wrote:
 ... not to mention that the buildfarm is now failing ecpg-check
 everywhere ...

I seem to have messed up my local make check call, sorry, working on the fix.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] WIP: Fast GiST index build

2011-07-18 Thread Alexander Korotkov
Hi!

New version of the patch is attached. It contains the following changes.
1) Since the proposed technique is not always a fast build, it was renamed
everywhere in the patch to "buffering build".
2) The buffering parameter now has 3 possible values: "yes", "no" and "auto".
"auto" means automatic switching from a regular index build to a buffering
one. Currently it just switches when the index size exceeds
maintenance_work_mem.
3) Holding many buffers pinned is now avoided.
4) Rebased against head.

TODO:
1) Take care about ordered datasets in automatic switching.
2) Take care about concurrent backends in automatic switching.

--
With best regards,
Alexander Korotkov.


gist_fast_build-0.7.0.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] proposal: new contrib module plpgsql's embeded sql validator

2011-07-18 Thread Alvaro Herrera
Excerpts from Jim Nasby's message of dom jul 17 16:31:45 -0400 2011:

 On a somewhat related note, I'd also really like to have the ability to parse 
 things like .sql files externally, to do things like LINT checking.

We talked about this during PGCon.  The idea that seemed to have
consensus there was to export the parser similarly to how we build the
ecpg parser, that is, a set of perl scripts which take our gram.y as
input and modify it to emit something different.  What ecpg does with it
is emit a different grammar, but it doesn't seem impossible to me to
have it emit some sort of (lint) checker.

I admit I haven't looked at the new Perl scripts we use in ecpg (they
were rewritten in the 9.1 era).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Pavel Stehule
Hello Tom,

Thank you for review

I think your comments are clear and I'll respect them in a new version.

There is only one issue that should be solved first. I introduced the
non-standard diagnostics field column_names because it is not currently
possible to get a column_name value for check constraints.  A correct
implementation of the COLUMN_NAME field needs an explicit relation between
pg_constraint and pg_attribute - maybe implemented as a new column in
pg_constraint. Do you agree?

Regards

Pavel



2011/7/16 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I am sending an updated patch

 I looked over this patch a bit.  I guess my main concern about it
 is that the set of items to be reported seems to have been made up on
 a whim.  I think that we ought to follow the SQL standard, which has a
 pretty clearly defined set of additional information items --- look at
 the spec for the GET DIAGNOSTICS statement.  (In SQL:2008, this is
 section 23.1, "get diagnostics statement".)  I don't feel that we need to
 implement every field the standard calls for, at least not right away,
 but we ought to have their list in mind.  Conversely, implementing items
 that *aren't* listed in the spec has to meet a considerably higher bar
 than somebody just submitting a patch that does it.

 The standard information items that seem reasonable for us to implement
 in the near future are

        COLUMN_NAME
        CONSTRAINT_NAME
        CONSTRAINT_SCHEMA
        SCHEMA_NAME
        TABLE_NAME
        TRIGGER_NAME
        TRIGGER_SCHEMA

 So I'd like to see the patch revised to use this terminology.  We
 probably also need to think a bit harder about the PG_DIAG_XXX code
 letters to be used --- we're already just about at the limit of what
 fields can have reasonably-mnemonic code letters, and not all of the
 above have obvious choices, let alone the rest of what's in the spec
 that we might someday want to implement.  What assignment rule should
 we use when we can't choose a mnemonic letter?




 Some other specific comments on the patch follow:

 1. It's way short in the documentation department.  protocol.sgml
 certainly needs additions (see Error and Notice Message Fields),
 also libpq.sgml's discussion of PQresultErrorField(), also
 sources.sgml's Reporting Errors Within the Server, and I'm not
 sure where else.


ok

 2. I think you could drop the tuple-descriptor changes, because they're
 only needed in service of an information item that is not found in the
 standard and doesn't seem very necessary.  The standard says to report
 the name of the constraint, not what columns it involves.

 3. errrel() is extremely poorly considered.  The fact that it requires
 utils/relcache.h to be #included by elog.h (and therefore by *every*
 *single* *file* in the backend) is a symptom of that, but expecting
 elog.c to do catalog lookups is as bad or worse from a modularity
 standpoint.  I think all the added elog functions should not take
 anything higher-level than a C string.

 4. Actually, it would probably be a good idea to avoid inventing a new
 elog API function for each individual new information item; something
 along the lines of erritem(PG_DIAG_WHATEVER, string_value) would be
 more appropriate to cover the inevitable future expansions.

 5. I don't think IndexRelationGetParentRelation is very appropriate
 either --- in the use cases you have, the parent table's OID is easily
 accessible, as is its namespace (which'll be the same as the index's)
 and so you could just have the callers do get_rel_name(tableoid).
 Doing a relcache open in an error reporting path seems like overkill.

 I'm going to mark this patch Returned With Feedback.

                        regards, tom lane




Re: [HACKERS] patch: enhanced get diagnostics statement 2

2011-07-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 2011/7/14 Alvaro Herrera alvhe...@commandprompt.com:
 Thanks ... I expect you're going to resubmit the patch based on David's
 last version and my comments?

 yes, see a attachment

Applied with some editorial adjustments.

regards, tom lane



Re: [HACKERS] Single pass vacuum - take 1

2011-07-18 Thread Pavan Deolasee
On Mon, Jul 18, 2011 at 3:14 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Mon, Jul 18, 2011 at 2:20 AM, Robert Haas robertmh...@gmail.com
 wrote:
  On Fri, Jul 15, 2011 at 7:23 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  My additional requests would be that we can easily tell which blocks
  have been modified like this, that we have a way to turn this off if
  we get bugs for next few releases, that we check it all works with Hot
  Standby just fine and that all the block inspection tools support it.
 
  To me, this seems like a bit too much of an internal change to justify
  adding a GUC.

 I will be happy to remove it again when we have shown there are no
 bugs; getting this wrong is a data loss issue.


Though I understand the fear of data loss, do we have much precedent for
adding a GUC to control such a mechanism? Even for a complex feature like HOT
we did not add any GUC to turn it off, and I don't think we missed it. So I
would suggest we review the code and test the feature extensively, and fix
the bugs if any, but let's not add a GUC to turn it off.  In fact, the code
and algorithm itself is not that complex, and I would suggest you take a
look at the patch.


  But it probably is a good idea to ping the authors of
  the various block inspection tools -- does contrib/pageinspect care
  about this sort of thing, or just the out-of-core stuff?

 I'm the author of the main sections of the pageinspect module, and
 yes, I care. :-)


Yeah, we should probably teach pageinspect to see and report this additional
information.

Thanks,
Pavan



 --
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 There is only one issue that should be solved first. I introduced the
 non-standard diagnostics field column_names because it is not currently
 possible to get a column_name value for check constraints.  A correct
 implementation of the COLUMN_NAME field needs an explicit relation between
 pg_constraint and pg_attribute - maybe implemented as a new column in
 pg_constraint. Do you agree?

No, I don't.  You're adding complication to solve a problem that doesn't
need to be solved.  The standard says to return the name of the
constraint for a constraint-violation failure.  It does not say anything
about naming the associated column(s).  COLUMN_NAME is only supposed to
be defined for certain kinds of errors, and this isn't one of them.

regards, tom lane



Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-18 Thread Robert Haas
On Fri, Jul 15, 2011 at 3:56 PM, Joey Adams joeyadams3.14...@gmail.com wrote:
 On Mon, Jul 4, 2011 at 10:22 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 I'll try to submit a revised patch within the next couple days.

 Sorry this is later than I said.

 I addressed the issues covered in the review.  I also fixed a bug
 where \u0022 would become a bare double-quote character, which is invalid
 JSON, causing an assertion failure.

 However, I want to put this back into WIP for a number of reasons:

  * The current code accepts invalid surrogate pairs (e.g.
 \uD800\uD800).  The problem with accepting them is that it would be
 inconsistent with PostgreSQL's Unicode support, and with the Unicode
 standard itself.  In my opinion: as long as the server encoding is
 universal (i.e. UTF-8), decoding a JSON-encoded string should not fail
 (barring data corruption and resource limitations).

  * I'd like to go ahead with the parser rewrite I mentioned earlier.
 The new parser will be able to construct a parse tree when needed, and
 it won't use those overkill parsing macros.

  * I recently learned that not all supported server encodings can be
 converted to Unicode losslessly.  The current code, on output,
 converts non-ASCII characters to Unicode escapes under some
 circumstances (see the comment above json_need_to_escape_unicode).

 I'm having a really hard time figuring out how the JSON module should
 handle non-Unicode character sets.  \u escapes in JSON literals
 can be used to encode characters not available in the server encoding.
  On the other hand, the server encoding can encode characters not
 present in Unicode (see the third bullet point above).  This means
 JSON normalization and comparison (along with member lookup) are not
 possible in general.

I previously suggested that, instead of trying to implement JSON, you
should just try to implement
JSON-without-the-restriction-that-everything-must-be-UTF8.  Most
people are going to be using UTF-8 simply because it's the default,
and if you forget about transcoding then ISTM that this all becomes a
lot simpler.  We don't, in general, have the ability to support data
in multiple encodings inside PostgreSQL, and it seems to me that by
trying to invent a mechanism for making that work as part of this
patch, you are setting the bar for yourself awfully high.

One thing to think about here is that transcoding between UTF-8 and
the server encoding seems like the wrong thing all around.  After all,
the user does not want the data in the server encoding; they want it
in their chosen client encoding.  If you are transcoding between UTF-8
and the server encoding, then that suggests that there's some
double-transcoding going on here, which creates additional
opportunities for (1) inefficiency and (2) outright failure.  I'm
guessing that's because you're dealing with an interface that expects
the internal representation of the datum on one side and the server
encoding on the other side, which gets back to the point in the
preceding paragraph.  You'd probably need to revise that interface in
order to make this really work the way it should, and that might be
more than you want to get into.  At any rate, it probably is a
separate project from making JSON work.

If in spite of the above you're bent on continuing down your present
course, then it seems to me that you'd better make the on-disk
representation UTF-8, with all \u escapes converted to the
corresponding characters.  If you hit an invalid surrogate pair, or a
character that exists in the server encoding but not UTF-8, it's not a
legal JSON object and you throw an error on input, just as you would
for mismatched braces or similar.  On output, you should probably just
use \u to represent any unrepresentable characters - i.e. option 3
from your original list.  That may be slow, but I think that it's not
really worth devoting a lot of mental energy to this case.  Most
people are going to be using UTF-8 because that's the default, and
those who are not shouldn't expect a data format built around UTF-8 to
work perfectly in their environment, especially if they insist on
using characters that are representable in only some of the encodings
they are using.
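The invalid-surrogate-pair detection under discussion can be sketched as
follows. The names here are illustrative (the actual patch does this in C
during JSON input); the ranges and combining rule are the standard UTF-16
surrogate rules.

```python
# A \uXXXX escape in D800-DBFF (high surrogate) must be followed by one in
# DC00-DFFF (low surrogate); anything else, e.g. \uD800\uD800, is invalid.

def is_high_surrogate(u):
    return 0xD800 <= u <= 0xDBFF

def is_low_surrogate(u):
    return 0xDC00 <= u <= 0xDFFF

def combine_surrogates(hi, lo):
    """Map a valid surrogate pair to its Unicode code point."""
    if not (is_high_surrogate(hi) and is_low_surrogate(lo)):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
```

For example, the pair \uD834\uDD1E combines to U+1D11E (musical G clef),
while \uD800\uD800 is rejected at input time, as Robert suggests above.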

But, again, why not just forget about transcoding and define it as
JSON, if you happen to be using utf-8 as the server encoding, and
otherwise some variant of JSON that uses the server encoding as its
native format?.  It seems to me that that would be a heck of a lot
simpler and more reliable, and I'm not sure it's any less useful in
practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Heikki Linnakangas

On 18.07.2011 18:32, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

On Mon, Jul 18, 2011 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:

A wakeup once every half hour would surely not be an issue from a power
consumption standpoint.  However, I'm not sure I understand here: aren't
we trying to remove the timeouts completely?



Well, in the case of the AV launcher, the issue is that the main loop
is timed by definition, cf. autovacuum_naptime, and the WaitLatch()
interface is apparently designed so that we can't sleep longer than 35
minutes.


Hmm.  Well, it's not too late to rethink the WaitLatch API, if we think
that that might be a significant limitation.


Right, we can easily change the timeout argument to be in milliseconds 
instead of microseconds.
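The roughly 35-minute ceiling mentioned upthread follows from a signed
32-bit timeout counted in microseconds; a quick sketch of the arithmetic
(assuming an int-width timeout argument) shows what switching the unit to
milliseconds buys:

```python
# A signed 32-bit count of microseconds caps out just under 36 minutes;
# counting milliseconds instead stretches the same range to ~24.8 days.
INT_MAX = 2**31 - 1

def max_wait_seconds(units_per_second):
    """Longest representable wait for a 32-bit count at a given resolution."""
    return INT_MAX / units_per_second

usec_limit_minutes = max_wait_seconds(1_000_000) / 60   # ~35.8 minutes
msec_limit_days = max_wait_seconds(1_000) / 86_400      # ~24.8 days
```

That comfortably covers autovacuum_naptime-scale sleeps without widening the
datatype, at the cost of sub-millisecond resolution.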


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Josh Berkus
Tom,

 No, I don't.  You're adding complication to solve a problem that doesn't
 need to be solved.  The standard says to return the name of the
 constraint for a constraint-violation failure.  It does not say anything
 about naming the associated column(s).  COLUMN_NAME is only supposed to
 be defined for certain kinds of errors, and this isn't one of them.

Are we talking about FK constraints here, or CHECK constraints?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] per-column generic option

2011-07-18 Thread Robert Haas
On Mon, Jul 11, 2011 at 12:11 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of dom jul 10 21:21:19 -0400 2011:
 On Jul 9, 2011, at 10:49 PM, Alvaro Herrera alvhe...@commandprompt.com 
 wrote:
  In short: in my opinion, attoptions and attfdwoptions need to be one
  thing and the same.

 I feel the opposite. In particular, what happens when a future release of 
 PostgreSQL adds an attoption that happens to have the same name as 
 somebody's per-column FDW option?  Something breaks, that's what...

 Hmm, if you follow my proposal above, that wouldn't actually happen,
 because the core options do not apply to foreign columns.

Well, not at the moment.  But I think it's altogether likely that we
might want them to in the future.  The foreign data wrapper support we
have right now is basically a stub until we get around to improving
it, so we don't (for example) analyze foreign tables, which means that
n_distinct is not relevant.  But that's something we presumably want
to change at some point.  Eventually, I would anticipate that we'll
have quite a few more column options and most will apply to both
tables and foreign tables, so I'm not keen to bake in something that
makes that potentially problematic.  I think we should understand
attoptions as things that modify the behavior of PostgreSQL, while
attfdw/genoptions are there solely for the foreign data wrapper to
use.  An extra nullable field in pg_attribute isn't costing us
anything non-trivial, and the syntactic and definitional clarity seems
entirely worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 18.07.2011 18:32, Tom Lane wrote:
 Hmm.  Well, it's not too late to rethink the WaitLatch API, if we think
 that that might be a significant limitation.

 Right, we can easily change the timeout argument to be in milliseconds 
 instead of microseconds.

On the whole I'd be more worried about giving up the shorter waits than
the longer ones --- it's not too hard to imagine using submillisecond
timeouts in the future, as machines get faster.  If we really wanted to
fix this, I think we need to move to a wider datatype.

regards, tom lane



Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 15, 2011 at 3:56 PM, Joey Adams joeyadams3.14...@gmail.com 
 wrote:
 I'm having a really hard time figuring out how the JSON module should
 handle non-Unicode character sets.

 But, again, why not just forget about transcoding and define it as
 JSON, if you happen to be using utf-8 as the server encoding, and
 otherwise some variant of JSON that uses the server encoding as its
 native format?.  It seems to me that that would be a heck of a lot
 simpler and more reliable, and I'm not sure it's any less useful in
 practice.

Right offhand, the only argument I can see against this is that we might
someday want to pass JSON datums directly to third-party loadable
libraries that are picky about UTF8-ness.  But we could just document
and/or enforce that such libraries can only be used in UTF8-encoded
databases.

BTW, could the \u problem be finessed by leaving such escapes in
source form?

regards, tom lane



Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Tom,
 No, I don't.  You're adding complication to solve a problem that doesn't
 need to be solved.  The standard says to return the name of the
 constraint for a constraint-violation failure.  It does not say anything
 about naming the associated column(s).  COLUMN_NAME is only supposed to
 be defined for certain kinds of errors, and this isn't one of them.

 Are we talking about FK constraints here, or CHECK constraints?

Either one.  They both have the potential to reference more than one
column, so if the committee had meant errors to try to identify the
referenced columns, they'd have put something other than COLUMN_NAME
into the standard.  They didn't.

regards, tom lane



Re: [HACKERS] per-column generic option

2011-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... I think we should understand
 attoptions as things that modify the behavior of PostgreSQL, while
 attfdw/genoptions are there solely for the foreign data wrapper to
 use.  An extra nullable field in pg_attribute isn't costing us
 anything non-trivial, and the syntactic and definitional clarity seems
 entirely worth it.

+1.  We paid the price of allowing nullable columns in pg_attribute long
ago.  One more isn't going to cost anything, especially since virtually
every row in that catalog already contains at least one null.

I'm not too thrilled with the terminology of generic options, though.
I think this should be understood as specifically FDW-owned options.
If the column isn't reserved for the use of the FDW, then you get right
back into the problem of who's allowed to use it and what if there's a
collision.

regards, tom lane



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread k...@rice.edu
On Mon, Jul 18, 2011 at 03:12:20PM -0400, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  On 18.07.2011 18:32, Tom Lane wrote:
  Hmm.  Well, it's not too late to rethink the WaitLatch API, if we think
  that that might be a significant limitation.
 
  Right, we can easily change the timeout argument to be in milliseconds 
  instead of microseconds.
 
 On the whole I'd be more worried about giving up the shorter waits than
 the longer ones --- it's not too hard to imagine using submillisecond
 timeouts in the future, as machines get faster.  If we really wanted to
 fix this, I think we need to move to a wider datatype.
 
   regards, tom lane
 

You could also tag the high bit to allow you to encode larger ranges
by having microseconds for the values with the high bit = 0 and use
milliseconds for the values with the high bit = 1. Then you could
have the best of both without punching up the width of the datatype.
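Ken's high-bit tagging idea could look roughly like this. It is only a
sketch of the encoding, not proposed code; note that precision silently
drops to milliseconds once the flag bit is set.

```python
# Reserve the top bit of a 32-bit timeout word as a unit flag: clear means
# the low 31 bits count microseconds, set means they count milliseconds.

UNIT_FLAG = 1 << 31
PAYLOAD = UNIT_FLAG - 1             # low 31 bits

def encode_timeout(microseconds):
    if microseconds <= PAYLOAD:
        return microseconds                   # high bit 0: microseconds
    ms = microseconds // 1000                 # lossy: rounds down to ms
    if ms > PAYLOAD:
        raise OverflowError("timeout too large even in milliseconds")
    return UNIT_FLAG | ms                     # high bit 1: milliseconds

def decode_timeout_us(word):
    if word & UNIT_FLAG:
        return (word & PAYLOAD) * 1000
    return word
```

Small timeouts round-trip exactly, while values past the 31-bit microsecond
range survive at millisecond granularity, which is Robert's complaint about
notational complexity in a nutshell.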

Regards,
Ken



Re: [HACKERS] per-column generic option

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 3:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... I think we should understand
 attoptions as things that modify the behavior of PostgreSQL, while
 attfdw/genoptions are there solely for the foreign data wrapper to
 use.  An extra nullable field in pg_attribute isn't costing us
 anything non-trivial, and the syntactic and definitional clarity seems
 entirely worth it.

 +1.  We paid the price of allowing nullable columns in pg_attribute long
 ago.  One more isn't going to cost anything, especially since virtually
 every row in that catalog already contains at least one null.

 I'm not too thrilled with the terminology of generic options, though.
 I think this should be understood as specifically FDW-owned options.
 If the column isn't reserved for the use of the FDW, then you get right
 back into the problem of who's allowed to use it and what if there's a
 collision.

I concur.  The SQL/MED standard is welcome to refer to them as generic
options, but at least FTTB they are going to be entirely for FDWs in
our implementation, and naming them that way is therefore a Good
Thing.  If the SQL committee decides to use them in other places and
we choose to support that in some future release for some
as-yet-unclear purpose, well, it won't be the first time we've
modified the system catalog schema.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Pavel Stehule
2011/7/18 Tom Lane t...@sss.pgh.pa.us:
 Josh Berkus j...@agliodbs.com writes:
 Tom,
 No, I don't.  You're adding complication to solve a problem that doesn't
 need to be solved.  The standard says to return the name of the
 constraint for a constraint-violation failure.  It does not say anything
 about naming the associated column(s).  COLUMN_NAME is only supposed to
 be defined for certain kinds of errors, and this isn't one of them.

 Are we talking about FK constraints here, or CHECK constraints?

 Either one.  They both have the potential to reference more than one
 column, so if the committee had meant errors to try to identify the
 referenced columns, they'd have put something other than COLUMN_NAME
 into the standard.  They didn't.

Personally, I see the COLUMN_NAME field making sense only in relation to
check constraints - for any other constraint, deriving a COLUMN_NAME would
require parsing the constraint rule, and I don't believe the standard
intends that. A column check constraint is attached explicitly to one
column - but this relation should not have to be derived from semantics.

We can check the DB2 implementation.

Regards
Pavel


                        regards, tom lane





Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 3:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 15, 2011 at 3:56 PM, Joey Adams joeyadams3.14...@gmail.com 
 wrote:
 I'm having a really hard time figuring out how the JSON module should
 handle non-Unicode character sets.

 But, again, why not just forget about transcoding and define it as
 JSON, if you happen to be using utf-8 as the server encoding, and
 otherwise some variant of JSON that uses the server encoding as its
 native format?.  It seems to me that that would be a heck of a lot
 simpler and more reliable, and I'm not sure it's any less useful in
 practice.

 Right offhand, the only argument I can see against this is that we might
 someday want to pass JSON datums directly to third-party loadable
 libraries that are picky about UTF8-ness.  But we could just document
 and/or enforce that such libraries can only be used in UTF8-encoded
 databases.

Right.  Or, if someone someday implements a feature to allow multiple
server encodings within the same database, then we can tell people to
use it if they want JSON to work in a spec-canonical way.

 BTW, could the \u problem be finessed by leaving such escapes in
 source form?

Joey seems to want to canonicalize the input, which argues against
that, and to detect invalid surrogate pairs, which likewise argues
against that.  You could argue that's overkill, but it seems to have
at least some merit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 3:28 PM, k...@rice.edu k...@rice.edu wrote:
 On Mon, Jul 18, 2011 at 03:12:20PM -0400, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  On 18.07.2011 18:32, Tom Lane wrote:
  Hmm.  Well, it's not too late to rethink the WaitLatch API, if we think
  that that might be a significant limitation.

  Right, we can easily change the timeout argument to be in milliseconds
  instead of microseconds.

 On the whole I'd be more worried about giving up the shorter waits than
 the longer ones --- it's not too hard to imagine using submillisecond
 timeouts in the future, as machines get faster.  If we really wanted to
 fix this, I think we need to move to a wider datatype.

                       regards, tom lane


 You could also tag the high bit to allow you to encode larger ranges
 by having microseconds for the values with the high bit = 0 and
 milliseconds for the values with the high bit = 1. Then you could
 have the best of both without punching up the width of the datatype.

Considering that we're just talking about a function call here, and
not something that ever goes out to disk, that seems like entirely too
much notational complexity.
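
For illustration, the tagging scheme the quoted message suggests could look like the sketch below. This is a model only, not code from the PostgreSQL tree; the 32-bit width and all names are assumptions:

```python
# Sketch of the quoted high-bit idea: pack a timeout into one fixed-width
# integer, with the high bit selecting the unit.  Illustrative only.

HIGH_BIT = 1 << 31          # assume a 32-bit unsigned field
MAX_VALUE = HIGH_BIT - 1    # payload range for either unit

def encode_timeout(micros):
    """Encode a timeout given in microseconds; fall back to millisecond
    resolution (high bit set) when the value does not fit."""
    if micros <= MAX_VALUE:
        return micros                       # microsecond range, exact
    millis = micros // 1000
    if millis > MAX_VALUE:
        raise ValueError("timeout too large even in milliseconds")
    return HIGH_BIT | millis                # millisecond range, tagged

def decode_timeout(encoded):
    """Return the timeout in microseconds."""
    if encoded & HIGH_BIT:
        return (encoded & MAX_VALUE) * 1000
    return encoded

print(decode_timeout(encode_timeout(500)))            # 500
print(decode_timeout(encode_timeout(3_600_000_000)))  # 3600000000
```

The cost, as noted above, is extra notational complexity at every call site, compared with simply widening the datatype.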

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 2011/7/18 Tom Lane t...@sss.pgh.pa.us:
 Are we talking about FK constraints here, or CHECK constraints?

 Either one.  They both have the potential to reference more than one
 column, so if the committee had meant errors to try to identify the
 referenced columns, they'd have put something other than COLUMN_NAME
 into the standard.  They didn't.

 Personally, I see a sense for the COLUMN_NAME field only in relation to
 CHECK constraints - for any other constraint, using COLUMN_NAME would
 be based on parsing a constraint rule - and I don't believe the
 standard is based on that.

Read the standard.  COLUMN_NAME is defined for use only in
syntax_error_or_access_rule_violation errors that relate to a specific
column.  In fact, the spec is written as (SQL:2008 23.1 GR 4-h-ii):

If the syntax error or access rule violation was for an inaccessible
column, then the value of COLUMN_NAME is the column name of that
column. Otherwise, the value of COLUMN_NAME is a zero-length string.

which suggests that it might be meant *only* for use with
INSUFFICIENT_PRIVILEGE errors that are thrown due to a column ACL.
We can probably extend that to some other syntax errors, like unknown
column or wrong datatype or what have you, but there is nothing here to
suggest that we have to force the issue for errors that don't naturally
relate to exactly one column.  And CHECK constraints don't.  Consider
CHECK (f1 < f2).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Robert Haas
On Fri, Jul 15, 2011 at 5:05 PM, Josh Berkus j...@agliodbs.com wrote:
 As you can probably tell, we are not ready to end the commitfest.  (I
 think Robert gave me this CF to show why even talking about a one-week
 mini-fest is a fantasy.  If so, successful.).

Heh, heh.  Well, it wasn't that premeditated, but I'm always glad to
reach a meeting of the minds.  :-)

 These are complex and need review by advanced hackers.  They are also
 often interdependent.  As such, I'd like to automatically move his
 patches to the next CF and ask that hackers who are engaged in working
 on them continue to do so between CFs.

There are only two patches left and I think we really ought to try to
take a crack at doing something with them.  Yeb is working on the
userspace access vector cache patch, which I think is going to drag on
longer than we want to keep the CommitFest open, but I'm OK with giving
it a day or two to shake out.  Aside from the benefit of reviewing the
actual patch, I see that Yeb is pushing on KaiGai to do further work
on the documentation, which I think is of great value.

I will review the other patch - on shared object labels - RSN.

 PATCHES WHICH I MOVED TO THE NEXT CF 9-2011:

 * Collect frequency statistics and selectivity estimation for arrays
 * Allow multiple Postgres clusters running on the same machine to
 distinguish themselves in the event log
 * lazy vxid locks

I'm not clear on your criteria for moving these patches, as they seem
to be in somewhat different states.  The first one is WIP, and it
doesn't really matter whether you move it to the next CommitFest or
just mark it returned with feedback.  There is active development work
going on there, so it's not going to get committed at this point.  But
the other two are basically just things we didn't get around to
reviewing.

With respect to the lazy vxid lock patch, it looks to me like Jeff has
done a bit of review, and I think the real question here is whether or
not we want to go ahead and make that change (with some stylistic and
cosmetic corrections) or wait until we have a more complex solution to
the spinlock contention problems mapped out.  On a pgbench run with 8
clients on a 32-core machine, I see about a 2% speedup from that patch
on pgbench -S, and it grows to 8% at 32 clients.  At 80 clients (but
still just a 32-core box), the results bounce around more, but taking
the median of three five-minute runs, it's an 11% improvement.  To me,
that's enough to make it worth applying, especially considering that
what is 11% on today's master is, in raw TPS, equivalent to maybe 30%
of yesterday's master (prior to the fast relation lock patch being
applied).  More, it seems pretty clear that this is the conceptually
right thing to do, even if it's going to require some work elsewhere
to file down all the rough edges thus exposed.  If someone objects to
that, then OK, we should talk about that: but so far I don't think
anyone has expressed strong opposition: in which case I'd like to fix
it up and get it in.

With respect to the event-log patch, MauMau has resubmitted that to
the next CommitFest, so that's fine as far as it goes.  But it might
make sense to see if we can twist Magnus's arm into re-reviewing it
now while it's fresh in his mind, rather than waiting another two or
three months.

 PATCHES WHICH HAVE BEEN UPDATED AND NEED FURTHER REVIEW:

 * Cascade Replication

No update.

 PATCHES STILL UNDER ACTIVE DISCUSSION:

 * Add ability to constrain backend temporary file space

Now committed, by Tom.

 PATCHES I CAN'T FIGURE OUT THE STATUS OF:

 * pg_comments system view

I reviewed this and marked it Returned with Feedback.

 * Bugfix for XPATH() if expression returns a scalar value

No update.

 So, down to a fairly limited list, except that we need a lot of committing.

We're down to three patches that fall into this category: two XML
patches by Florian, which have been somewhat derailed by an argument
about whether values of type xml should, in fact, be required to be
valid xml (I think you know my vote on this one...); and an
error-reporting patch that Tom weighed in on over the weekend.  This
last suffers from the issue that it's not quite clear whether Tom is
going to do the work or whether he's expecting the submitter to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Pavel Stehule
2011/7/18 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 2011/7/18 Tom Lane t...@sss.pgh.pa.us:
 Are we talking about FK constraints here, or CHECK constraints?

 Either one.  They both have the potential to reference more than one
 column, so if the committee had meant errors to try to identify the
 referenced columns, they'd have put something other than COLUMN_NAME
 into the standard.  They didn't.

 Personally, I see a sense for the COLUMN_NAME field only in relation to
 CHECK constraints - for any other constraint, using COLUMN_NAME would
 be based on parsing a constraint rule - and I don't believe the
 standard is based on that.

 Read the standard.  COLUMN_NAME is defined for use only in
 syntax_error_or_access_rule_violation errors that relate to a specific
 column.  In fact, the spec is written as (SQL:2008 23.1 GR 4-h-ii):

        If the syntax error or access rule violation was for an inaccessible
        column, then the value of COLUMN_NAME is the column name of that
        column. Otherwise, the value of COLUMN_NAME is a zero-length string.

 which suggests that it might be meant *only* for use with
 INSUFFICIENT_PRIVILEGE errors that are thrown due to a column ACL.
 We can probably extend that to some other syntax errors, like unknown
 column or wrong datatype or what have you, but there is nothing here to
 suggest that we have to force the issue for errors that don't naturally
 relate to exactly one column.  And CHECK constraints don't.  Consider
 CHECK (f1 < f2).


OK, this is relatively clean, but

so, for example, NOT NULL or DOMAIN constraints don't affect
COLUMN_NAME? These constraints have no name.

regards

Pavel

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Simon Riggs
On Fri, Jul 15, 2011 at 10:05 PM, Josh Berkus j...@agliodbs.com wrote:

 PATCHES WHICH HAVE BEEN UPDATED AND NEED FURTHER REVIEW:

 * Cascade Replication

Sorry, too busy reviewing to take note of this. I expect to commit
when it's tested and ready.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 We're down to three patches that fall into this category: two XML
 patches by Florian, which have been somewhat derailed by an argument
 about whether values of type xml should, in fact, be required to be
 valid xml (I think you know my vote on this one...);

Yeah, it's not clear to me either which of those are still reasonable
candidates for committing as-is.

 and an
 error-reporting patch that Tom weighed in on over the weekend.  This
 last suffers from the issue that it's not quite clear whether Tom is
 going to do the work or whether he's expecting the submitter to do it.

If you mean the business about allowing GUCs in postgresql.conf to be
applied even if there are semantic errors elsewhere, I'm just as happy
to let Alexey or Florian have a go at it first, if they want.  The real
question at the moment is do we have consensus about changing that?
Because if we do, the submitted patch is certainly not something to
commit as-is, and should be marked Returned With Feedback.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-18 Thread Kohei KaiGai
Yeb, Thanks for your volunteering.

2011/7/18 Yeb Havinga yebhavi...@gmail.com:
 Hello KaiGai-san,

 I've been preparing to review this patch by reading both pgsql-hackers
 history on sepgsql, and also the RHEL 6 guide on SELinux this weekend, today
 I installed GIT HEAD with --with-selinux on Scientific Linux 6, developer
 installation, so far almost everything looking good.

Scientific Linux 6 is not suitable, because its libselinux version is
a bit older than this patch expects (libselinux-2.0.99 or later).
My recommendation is Fedora 15 instead.

 1) the line with for DBNAME in ... do postgres --single etc, lacks a -D
 argument and hence gives the error:
 postgres does not know where to find the server configuration file.

OK. I intended users to adjust the paths themselves (including the -D
option), but an explicit -D /path/to/database seems more helpful as
an example.
I'll submit a patch in a separate thread.

 2) there is a dependency to objects outside of the postgresql installation
 tree in /etc/selinux/targeted/contexts/sepgsql_contexts, and that file has
 an error that is thrown when contrib/sepgsql is executed:
 /etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid object
 type db_blobs
 (same for db_language)
 I found your fix for the error on a forum on oss.tresys.com, but IMHO either
 the contrib/sepgsql should mention that the dependency exists and it might
 contain errors for (older) reference policies, or it should include a
 bugfree reference policy for sepgsql to replace the system installed
 refpolicy with (and mention that in the install documentation).

It is not an error, but just a notification to inform users that the
sepgsql_contexts file contains invalid lines. It is harmless, so we can
ignore them. I don't think sepgsql.sgml should mention this noise,
because it comes purely from problems in libselinux and refpolicy;
these are external packages from PostgreSQL's point of view.

 3) sepgsql is currently a bit hard to find in the documentation.
 www.postgresql.org website search doesn't find sepgsql and selinux only
 refers to an old PostgreSQL redhat bug in 2005 on bugzilla.redhat.com. I had
 to manually remember it was a contrib module. Also sepgsql isn't linked to
 at the SECURITY LABEL page. At the moment I'm unsure if I have seen all
 sepgsql related sgml-based documentation.

Improving the documentation is an open issue.
wiki.postgresql.org might be an appropriate place.

The reason why SECURITY LABEL does not point to sepgsql.sgml is
that it is general-purpose infrastructure for all upcoming label-based
security mechanisms, not only sepgsql.
That was our consensus during v9.1 development.

 After fixing the refpolicy I proceeded with the contrib/sepgsql manual, with
 the goal to get something easy done, like create a top secret table like
 'thisyearsbonusses', and a single user 'boss' and configure sepgsql in such
 a way that only the boss can access the top secret table. I've read the the
 contrib documentation, browsed links on the bottom of the page but until now
 I don't even have a clue how to proceed. Until I do so, I don't feel it's
 appropriate for me to review the avc patch.

At least, you don't need to fix anything in the policy stuff.

The point of this patch is to replace the existing mechanism (which
always asks the in-kernel SELinux via a system-call invocation) with a
smart caching mechanism (which needs a minimal number of system-call
invocations), without any user-visible changes.
So, it is not necessary to define a new policy for testing.

 Would you be willing to help me getting a bit started? Specific questions
 are:

 1) The contrib doc under DML permissions talks about 'db_table:select' etc?
 What are these things? They are not labels since I do not see them listed in
 the output of 'select distinct label from pg_seclabel'.

A security label plays a role like user-id or ownership/object-ACL in
the default database access controls: SELinux checks the relationship
between the user's label and the label of the target object. If that
relationship allows particular actions such as 'select' or 'update',
those actions are permitted when the user requests them.
In a similar way, 'db_table:select' is a type of action ('select' on a
table object), not an identifier of a user or object.
SELinux defines the set of allowed actions (such as 'db_table:select')
between a particular pair of security labels (such as 'staff_t' and
'sepgsql_table_t').
pg_seclabel holds only the security label of the object being referenced,
so look at /selinux/class/db_*/perms for the list of permissions defined
in the security policy (though only a limited number of them are in use now).

 2) The regression test label.sql introduces labels with types
 sepgsql_trusted_proc_exec_t, sepgsql_ro_table_t. My question is: where are
 these defined? What is their meaning? Can I define my own?

The system's default security policy (selinux-policy package) defines all the
necessary labels, and access 

Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 4:21 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 3) sepgsql is currently a bit hard to find in the documentation.
 www.postgresql.org website search doesn't find sepgsql and selinux only
 refers to an old PostgreSQL redhat bug in 2005 on bugzilla.redhat.com. I had
 to manually remember it was a contrib module. Also sepgsql isn't linked to
 at the SECURITY LABEL page. At the moment I'm unsure if I have seen all
 sepgsql related sgml-based documentation.

 Improving the documentation is an open issue.
 wiki.postgresql.org might be an appropriate place.

 The reason why SECURITY LABEL does not point to sepgsql.sgml is
 that it is general-purpose infrastructure for all upcoming label-based
 security mechanisms, not only sepgsql.
 That was our consensus during v9.1 development.

Actually, I think it's that way mostly because we committed the
SECURITY LABEL stuff first.  I'd be in favor of adding some kind of
cross-link.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 so, for example, NOT NULL or DOMAIN constraints don't affect
 COLUMN_NAME? These constraints have no name.

Well, the executor's NOT NULL tests could certainly be extended to emit
COLUMN_NAME --- I don't see any logical or implementation problem with
that, even if it seems to be outside the scope of what the standard says
to use the field for.  But let's not get into modifying the system
catalogs to produce error fields that are not required by the standard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-18 Thread Kohei KaiGai
2011/7/18 Robert Haas robertmh...@gmail.com:
 On Mon, Jul 18, 2011 at 4:21 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 3) sepgsql is currently a bit hard to find in the documentation.
 www.postgresql.org website search doesn't find sepgsql and selinux only
 refers to an old PostgreSQL redhat bug in 2005 on bugzilla.redhat.com. I had
 to manually remember it was a contrib module. Also sepgsql isn't linked to
 at the SECURITY LABEL page. At the moment I'm unsure if I have seen all
 sepgsql related sgml-based documentation.

 Improving the documentation is an open issue.
 wiki.postgresql.org might be an appropriate place.

 The reason why SECURITY LABEL does not point to sepgsql.sgml is
 that it is general-purpose infrastructure for all upcoming label-based
 security mechanisms, not only sepgsql.
 That was our consensus during v9.1 development.

 Actually, I think it's that way mostly because we committed the
 SECURITY LABEL stuff first.  I'd be in favor of adding some kind of
 cross-link.

Hmm. OK, I'll submit a patch to update this documentation stuff tomorrow,
including an explicit -D option as Yeb mentioned.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Josh Berkus
Tom,

 Either one.  They both have the potential to reference more than one
 column, so if the committee had meant errors to try to identify the
 referenced columns, they'd have put something other than COLUMN_NAME
 into the standard.  They didn't.

I'm less concerned about the standard here and more concerned about what
helps our users.  Having column names for an FK error is *extremely*
useful for troubleshooting, particularly if the database has been
upgraded from the 7.4 days and has non-useful FK names like $3.

I agree that column names for CHECK constraints is a bit of a tar baby,
since check constraints can be on complex permutations of columns.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Josh Berkus
Simon,

 * Cascade Replication
 
 Sorry, too busy reviewing to take note of this. I expect to commit
 when it's tested and ready.

So, would that be this commitfest, or next commitfest?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Josh Berkus
Robert,

 * Collect frequency statistics and selectivity estimation for arrays
 * Allow multiple Postgres clusters running on the same machine to
 distinguish themselves in the event log
 * lazy vxid locks
 
 I'm not clear on your criteria for moving these patches, as they seem
 to be in somewhat different states.

For the array statistics patch, it took me until the last week of the CF to find
a reviewer. That reviewer found a number of issues, and the submitter
submitted an updated patch.  It looks quite likely that the patch will
be updated further in the next couple of weeks, but not immediately.

For the eventlog, MauMau had submitted an updated patch, but all of our
Windows hackers had let me know that they were unavailable for the next
couple weeks for review or commit.

For lazy VXID locks, I had the impression that we had an ongoing
discussion about whether we were going to apply it or not, and you were
on vacation for the last week of the CF.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote:
 
 I'm less concerned about the standard here and more concerned
 about what helps our users.  Having column names for an FK error
 is *extremely* useful for troubleshooting, particularly if the
 database has been upgraded from the 7.4 days and has non-useful FK
 names like $3.
 
If it gives a FK constraint name, isn't there a way to get from that
to the columns used by the constraint?  If we want to support
something non-standard, we can always tell them to look at the text
of the error detail, right?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 4:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 and an
 error-reporting patch that Tom weighed in on over the weekend.  This
 last suffers from the issue that it's not quite clear whether Tom is
 going to do the work or whether he's expecting the submitter to do it.

 If you mean the business about allowing GUCs in postgresql.conf to be
 applied even if there are semantic errors elsewhere, I'm just as happy
 to let Alexey or Florian have a go at it first, if they want.  The real
 question at the moment is do we have consensus about changing that?
 Because if we do, the submitted patch is certainly not something to
 commit as-is, and should be marked Returned With Feedback.

I'm not totally convinced.  The proposed patch is pretty small, and
seems to stand on its own two feet.  I don't hear anyone objecting to
your proposed plan, but OTOH it doesn't strike me as such a good plan
that we should reject all other improvements in the meantime.  Maybe
I'm missing something...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-18 Thread Joey Adams
On Mon, Jul 18, 2011 at 3:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, could the \u problem be finessed by leaving such escapes in
 source form?

Yes, it could.  However, it doesn't solve the problem of comparison
(needed for member lookup), which requires canonicalizing the strings
to be compared.

Here's a Unicode handling policy I came up with.  It is guaranteed to
be lossless as long as the client and database encodings are the same.

---

On input (json_in), if the text is valid JSON, it is condensed:

 * Whitespace characters surrounding tokens are removed.
 * Long escapes (like \u0022) are converted to short escapes (like \")
where possible.
 * Unnecessary escapes of ASCII characters (e.g. \u0061 and even
\u007F) are converted to their respective characters.
 * Escapes of non-ASCII characters (e.g. \u0080, \u266B, \uD834\uDD1E)
are converted to their respective characters, but only if the database
encoding is UTF-8.
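
As a rough illustration of this input-side condensation (this is not PostgreSQL code; Python's json module stands in for the proposed json_in behavior, assuming a UTF-8 database encoding):

```python
# Illustrative only: Python's json module happens to perform the same kind
# of condensation described above for json_in in a UTF-8 database.
import json

def condense(json_text):
    # Parsing resolves long escapes (\u0022 -> ", \u0061 -> a) and surrogate
    # pairs (\uD834\uDD1E -> one character); re-serializing with no
    # inter-token whitespace and ensure_ascii=False emits non-ASCII
    # characters as themselves rather than as escapes.
    return json.dumps(json.loads(json_text),
                      ensure_ascii=False, separators=(',', ':'))

print(condense('{ "k" : "\\u0061\\u266B" }'))  # {"k":"a♫"}
```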

On output (json_out), non-ASCII characters are converted to \u
escapes, unless one or more of these very likely circumstances hold:

 * The client encoding and database encoding are the same.  No
conversion is performed, so escaping characters will not prevent any
conversion errors.
 * The client encoding is UTF-8.  Escaping is not necessary because
the client can encode all Unicode codepoints.
 * The client encoding and/or database encoding is SQL_ASCII.
SQL_ASCII tells PostgreSQL to shirk transcoding in favor of speed.

When a JSON-encoded string is unwrapped using from_json (e.g.
from_json($$ \u00A1Hola! $$)), escapes will be converted to the
characters they represent.  If any escapes cannot be represented in
the database encoding, an error will be raised.  Note that:

 * If the database encoding is UTF-8, conversion will never fail.
 * If the database encoding is SQL_ASCII, conversion will fail if any
escapes of non-ASCII characters are present.

---

However, I'm having a really hard time figuring out how comparison
would work in this framework.  Here are a few options:

 1. Convert the strings to UTF-8, convert the escapes to characters,
and compare the strings.
 2. Convert the escapes to the database encoding, then compare the strings.
 3. If either string contains escapes of non-ASCII characters, do 1.
Otherwise, do 2.
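
Option 1 can be sketched as follows (again using Python's json module as a stand-in for the decoding step; not PostgreSQL code):

```python
# Sketch of comparison option 1: decode both JSON string literals to a
# common Unicode form (resolving any escapes), then compare the results.
import json

def json_string_equal(a, b):
    return json.loads(a) == json.loads(b)

print(json_string_equal('"abc\\u266b"', '"abc♫"'))  # True
```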

Number 1 seems the most sane to me, but could lead to rare errors.

Number 3 could produce confusing results.  If character set X has
three different representations of one Unicode codepoint, then we
could have scenarios like this (simplified):

 abc♫ != aaa♫

but:

 abc\u266b == aaa♫

I suppose a simple solution would be to convert all escapes and
outright ban escapes of characters not in the database encoding.  This
would have the nice property that all strings can be unescaped
server-side.  Problem is, what if a browser or other program produces,
say, \u00A0 (NO-BREAK SPACE), and tries to insert it into a database
where the encoding lacks this character?

On the other hand, converting all JSON to UTF-8 would be simpler to
implement.  It would probably be more intuitive, too, given that the
JSON RFC says, "JSON text SHALL be encoded in Unicode."

In any case, the documentation should say "use UTF-8 for best
results", as there seems to be no entirely satisfactory way to handle
JSON in other database encodings.

- Joey

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Add temp_file_limit GUC parameter to constrain temporary file sp

2011-07-18 Thread Mark Kirkwood

On 19/07/11 02:36, Tom Lane wrote:

Bruce Momjian br...@momjian.us writes:

Tom Lane wrote:

Huh?  Seems like a waste of text to me.  What else would happen?

Well, if we exceed work_mem, we spill to temp disk.  If we exceed temp
disk, we error out.  Those different behaviors don't seem obvious to me.

[ shrug... ]  Feel free to change it.




No objections from me - can't see any harm in making it very clear what 
happens when the limit is exceeded :-)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storing TZ along timestamps

2011-07-18 Thread Jim Nasby
On Jul 18, 2011, at 12:29 AM, Robert Haas wrote:
 On Fri, Jul 8, 2011 at 11:13 AM, Stuart Bishop stu...@stuartbishop.net wrote:
 On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby j...@nasby.net wrote:
 On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
 On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby j...@nasby.net wrote:
 
 I'm torn between whether the type should store the original time or the 
 original time converted to GMT.
 
 This is the wrong way to think about it. We *never* store time
 converted to GMT.  When we want to represent a point in time we
 represent it as seconds since the epoch.
 Right. Sorry, my bad.
 
 The question here is how to represent more complex concepts than
 simply points in time. I think the two concepts under discussion are
 a) a composite type representing a point in time and a timezone it
 should be interpreted in for operations and display and b) the
 original input provided which is a text string with the constraint
 that it's a valid input which can be interpreted as a point in time.
 
 My fear with A is that something could change that would make it impossible 
 to actually get back to the time that was originally entered. For example, 
 a new version of the timezone database could change something. Though, that 
 problem also exists for timestamptz today, so presumably if it was much of 
 an issue we'd have gotten complaints by now.
 
 The common problem is daylight savings time being declared or
 cancelled. This happens numerous times throughout the year, often with
 short notice.
 
 If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that
 to keep meaning 6pm Fiji time no matter what decisions the Fijian
 government makes over the next two years, you need to store the
 wallclock (local) time and the timezone. The wallclock time remains
 fixed, but the conversion to UTC may float.
 
 If you are storing an point in time that remains stable no matter
 future political decisions, you store UTC time and an offset. The
 conversion to wallclock time may float, and your 6pm Fiji time meeting
 might change to 5pm or 7pm depending on the political edicts.
 
 This is a pretty good point.  You would want the first of these
 (probably) for the time a meeting is scheduled to start, and the
 second for the time of a meteor shower (or some other natural event
 that doesn't care what the politicians decide).
 
 I feel like the second of these is pretty well handled by our existing
 timestamptz data type.  Granted, you can't store the intended display
 time zone, but putting it in a separate column is not really a
 problem: at least, it has the right semantics.  So maybe the first is
 the one we should be aiming at.  If so, storing a counter and a time
 zone is the wrong approach: you need to record something like year,
 month, day, hour, minute, second, tzname.

Right; you need a timestamp and you need to know what timezone that timestamp 
was entered in. That means you can always convert that time to whatever 
timezone you'd like (like timestamptz), but you also know what time was 
originally entered, and what timezone it was entered in. Technically you can do 
that with a separate field, but that seems really ugly to me.

So what we're proposing is a new data type that stores a timestamp as well as 
the timezone that that time was originally entered in. We can't just store a 3 
letter timezone abbreviation, because the mapping from 3 letter TZs to actual 
TZs is not fixed (see the timezone_abbreviations GUC). I believe the best way 
to handle this is a system table that stores the name of every timezone that 
the database has ever loaded from the timezone data files, along with an OID. 
That means that the storage for this is just a timestamp and an OID.
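
A rough sketch of that catalog idea (plain Python, with invented names;
OIDs become simple integers here), just to show the intended storage shape:

```python
# Toy model of the proposed system table: every timezone name the database
# has ever loaded gets a stable numeric id (standing in for an OID), so a
# value of the new type is stored as just (timestamp, zone_id).
class ZoneCatalog:
    def __init__(self):
        self._by_name = {}
        self._by_id = []

    def intern(self, name):
        # Return the existing id for a zone name, assigning one on first use.
        if name not in self._by_name:
            self._by_name[name] = len(self._by_id)
            self._by_id.append(name)
        return self._by_name[name]

    def name(self, zone_id):
        # Recover the full zone name from the stored id.
        return self._by_id[zone_id]

catalog = ZoneCatalog()
stored = ("2014-07-03 18:00:00", catalog.intern("Pacific/Fiji"))
assert catalog.intern("Pacific/Fiji") == stored[1]   # stable across re-entry
assert catalog.name(stored[1]) == "Pacific/Fiji"
```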
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storing TZ along timestamps

2011-07-18 Thread Josh Berkus
Jim,

 Right; you need a timestamp and you need to know what timezone that timestamp 
 was entered in. That means you can always convert that time to whatever 
 timezone you'd like (like timestamptz), but you also know what time was 
 originally entered, and what timezone it was entered in. Technically you can 
 do that with a separate field, but that seems really ugly to me.

I disagree.  It's a good mapping of the actual data.

The timestamp and the timezone in which that timestamp was entered are
two separate pieces of data and *ought* to be in two separate fields.
For one thing, the question of "what timezone was this entered in?" is an
application-specific question, since you have three different potential
timezones:

* the actual client timezone
* the actual server timezone
* the application timezone if the application has configurable timezones

In a builtin data type, which of those three would you pick?  Only the
application knows.

Additionally, if you have your timestamp-with-original-timezone data
type, then you're going to need to recode every single
timestamp-handling function and operator to handle the new type.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Florian Pflug
On Jul18, 2011, at 22:19 , Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 We're down to three patches that fall into this category: two XML
 patches by Florian, which have been somewhat derailed by an argument
 about whether values of type xml should, in fact, be required to be
 valid xml (I think you know my vote on this one...);
 
 Yeah, it's not clear to me either which of those are still reasonable
 candidates for committing as-is.

There are actually three XML-related patches from me in the queue.
I'll try to give an overview of their states - as perceived by me,
of course. If people want to comment on this, I suggest doing that
either on the existing threads for these patches or on new ones,
but not on this one - lest confusion ensue.

* Bugfix for XPATH() if text or attribute nodes are selected
  https://commitfest.postgresql.org/action/patch_view?id=580

Makes XPATH() return valid XML if text or attribute are selected.

I'm not aware of any issues with that one, other than Radoslaw's
unhappiness about the change of behaviour. Since the current behaviour
is very clearly broken (as in dump, reload, ka-Woom), I'm not prepared
to accept that as a show-stopper.

The question about whether values of type XML should or should
not in fact be valid XML is a red herring. That question was long
ago settled by the XML type's input function, which has a pretty
clear and consistent idea about what constitutes a valid value of type
XML. The patch simply makes XPATH() abide by those rules, instead of
making up rules of its own, pretty much willy-nilly.
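
The escaping in question is the standard XML one; a minimal Python
illustration (not the patch itself) of why a selected text node must be
escaped before it can be a valid value of type XML:

```python
from xml.sax.saxutils import escape

# A text() node selected by XPath may contain characters that count as
# markup when re-emitted. Returning the content verbatim produces a value
# the XML input function would reject; escaping restores well-formedness.
text_node = 'a < b & c'          # raw contents of a selected text() node
escaped = escape(text_node)      # what a valid XML value must contain

assert escaped == 'a &lt; b &amp; c'
```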

* Bugfix for XPATH() if expression returns a scalar value
  https://commitfest.postgresql.org/action/patch_view?id=565

Makes XPATH() support XPath expressions which compute a scalar value,
not a node set (i.e. apply a top-level function such as name()).
Currently, we return an empty array in that case.

Peter Eisentraut initially seemed to prefer creating separate functions
for that (XPATH_STRING, ...). I argued against that, on the grounds that
that'd make it impossible to evaluate user-supplied XPath expressions (since
you wouldn't know which function to use). I haven't heard back from him
after that argument. Radoslaw liked the patch, but wanted the quoting
removed - same theme as with the other patch.

* XML error handling improvement to fix XPATH bug
  https://commitfest.postgresql.org/action/patch_view?id=579

Like that first patch, it fixes a bug where XPATH() returns invalid
XML. The cause is completely different though. Here, libxml is (partially)
at fault - its parsing methods always return a document instance if
a document is *structurally* valid, even if it contains semantic errors
like invalid namespace references. But it isn't fully prepared to
actually handle these inconsistent documents - for example, when asked
to render a namespace URI as text, it unconditionally assumes it doesn't
have to escape it, because it may not contain special characters anyway.
Which, of course, isn't necessarily true for structurally valid but
semantically invalid documents...

Fixing that wasn't possible without a major rewrite of the XML support's
error handling - one must use the modern version of libxml's error-handling
infrastructure to actually be able to detect these semantic errors reliably
and distinguish them from mere warnings.

Noah Misch has reviewed that patch pretty extensively (Thanks again, Noah!),
and I've resolved all his concerns regarding the implementation. I haven't
seen a single argument *against* applying this so far.

best regards,
Florian Pflug





Re: [HACKERS] proposal: a validator for configuration files

2011-07-18 Thread Josh Berkus
Tom, Florian,

 On the downside, the current behaviour prevents problems if someone changes
 two interrelated GUCs, but makes a mistake at one of them. For example,
 someone might drastically lower bgwriter_delay but might botch the matching
 adjustment of bgwriter_lru_maxpages.
 
 That's a fair point, but the current behavior only saves you if the
 botch is such that the new value is detectably invalid, as opposed to
 say just a factor of 100 off from what you meant.  Not sure that that's
 all that helpful.

Hmmm.  As someone who often deploys pg.conf changes as part of a
production code rollout, I actually like the atomic nature of updating
postgresql.conf -- that is, all your changes succeed, or they all fail.

If we add this feature, I'd want there to be an option which allows
getting the current all-or-none behavior.
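
The two policies can be sketched like this (toy Python with a made-up
validator, purely to illustrate the all-or-none semantics; nothing here is
the actual GUC machinery):

```python
def validate(name, value):
    # Toy rule standing in for GUC validation: every value must parse
    # as a positive integer.
    try:
        return int(value) > 0
    except ValueError:
        return False

def apply_atomic(current, new_settings):
    # All-or-none: if any setting fails validation, apply nothing.
    if all(validate(k, v) for k, v in new_settings.items()):
        current.update({k: int(v) for k, v in new_settings.items()})
        return True
    return False

conf = {"bgwriter_delay": 200}
# Drastically lowering bgwriter_delay but botching the matching setting:
changed = apply_atomic(conf, {"bgwriter_delay": "10",
                              "bgwriter_lru_maxpages": "oops"})
assert changed is False
assert conf["bgwriter_delay"] == 200   # unchanged: the whole file was rejected
```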

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Florian Pflug
On Jul18, 2011, at 22:19 , Tom Lane wrote:
 and an
 error-reporting patch that Tom weighed in on over the weekend.  This
 last suffers from the issue that it's not quite clear whether Tom is
 going to do the work or whether he's expecting the submitter to do it.
 
 If you mean the business about allowing GUCs in postgresql.conf to be
 applied even if there are semantic errors elsewhere, I'm just as happy
 to let Alexey or Florian have a go at it first, if they want.  The real
 question at the moment is do we have consensus about changing that?
 Because if we do, the submitted patch is certainly not something to
 commit as-is, and should be marked Returned With Feedback.

Just to avoid false expectations here: I'd be happy to review further
versions of this patch, but I won't write them.

best regards,
Florian Pflug




Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Josh Berkus j...@agliodbs.com wrote:
 I'm less concerned about the standard here and more concerned
 about what helps our users.  Having column names for an FK error
 is *extremely* useful for troubleshooting, particularly if the
 database has been upgraded from the 7.4 days and has non-useful FK
 names like $3.
 
 If it gives a FK constraint name, isn't there a way to get from that
 to the columns used by the constraint?  If we want to support
 something non-standard, we can always tell them to look at the text
 of the error detail, right?

Yes.  This is entirely *not* about friendliness to human users; they're
going to read the existing primary/detail/hint fields, and probably
aren't even going to see these new error fields by default.  What the
new fields are meant for is allowing client programs to do something
useful without parsing the text of the human-oriented fields ... for
instance, identify which FK constraint got violated.  Somebody who's
intending to use this functionality would presumably take care to give
his constraints names that were helpful for his purposes.  Moreover,
if he's hoping to use that client code against more than one database,
what he's going to want is SQL-standard functionality, not more nor less.

As for the "my constraints have names like $3" argument, maybe an ALTER
CONSTRAINT RENAME command would be the most helpful answer.

regards, tom lane



Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-18 Thread Florian Pflug
On Jul19, 2011, at 00:17 , Joey Adams wrote:
 I suppose a simple solution would be to convert all escapes and
 outright ban escapes of characters not in the database encoding.

+1. Making JSON work like TEXT when it comes to encoding issues
makes this all much simpler conceptually. It also avoids all kinds
of weird issues if you extract textual values from a JSON document
server-side.

If we really need more flexibility than that, we should look at
ways to allow different columns to have different encodings. Doing
that just for JSON seems wrong - especially because it doesn't really
reduce the complexity of the problem, as your example shows. The
essential problem here is, AFAICS, that there's really no sane way to
compare strings in two different encodings, unless both encode a
subset of unicode only.

 This would have the nice property that all strings can be unescaped
 server-side.  Problem is, what if a browser or other program produces,
 say, \u00A0 (NO-BREAK SPACE), and tries to insert it into a database
 where the encoding lacks this character?

They'll get an error - just as if they had tried to store that same
character in a TEXT column.
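
A small Python illustration of that behaviour (ASCII standing in for a
hypothetical server encoding that lacks the character; this mirrors the
"ban escapes of characters not in the database encoding" proposal, not any
existing PostgreSQL code):

```python
import json

# Convert the \uXXXX escape to the real character...
decoded = json.loads('"\\u00a0"')     # NO-BREAK SPACE
assert decoded == '\u00a0'

def representable(s, server_encoding):
    # ...then check whether the server encoding can represent it,
    # exactly as storing it in a TEXT column would.
    try:
        s.encode(server_encoding)
        return True
    except UnicodeEncodeError:
        return False

assert representable(decoded, 'latin-1')    # Latin-1 has U+00A0
assert not representable(decoded, 'ascii')  # would raise an error, like TEXT
```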

 On the other hand, converting all JSON to UTF-8 would be simpler to
 implement.  It would probably be more intuitive, too, given that the
 JSON RFC says, "JSON text SHALL be encoded in Unicode."

Yet, the only reason I'm aware of for some people not to use UTF-8
as the server encoding is that it's pretty inefficient storage-wise for
some scripts (AFAIR some Japanese scripts are an example, but I don't
remember the details). By making JSON store UTF-8 on-disk always, the
JSON type gets less appealing to those people.

best regards,
Florian Pflug





Re: [HACKERS] storing TZ along timestamps

2011-07-18 Thread Alvaro Herrera
Excerpts from Josh Berkus's message of lun jul 18 18:37:15 -0400 2011:

 The timestamp and the timezone in which that timestamp was entered are
 two separate pieces of data and *ought* to be in two separate fields.
 For one thing, the question of what timezone was this entered in is an
 application-specific question, since you have three different potential
 timezones:
 
 * the actual client timezone
 * the actual server timezone
 * the application timezone if the application has configurable timezones
 
 In a builtin data type, which of those three would you pick?  Only the
 application knows.

I think this whole discussion is built on the assumption that the client
timezone and the application timezone are one and the same, and
the server timezone is not relevant at all.  If the app TZ is not the
client TZ, then the app will need to be fixed.

 Additionally, if you have your timestamp-with-original-timezone data
 type, then you're going to need to recode every single
 timestamp-handling function and operator to handle the new type.

I have my doubts about that, and I hope not.  These details haven't been
discussed at all; I only started this thread to get community approval
on cataloguing the TZs.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] proposal: new contrib module plpgsql's embeded sql validator

2011-07-18 Thread David Fetter
On Mon, Jul 18, 2011 at 02:05:42PM -0400, Alvaro Herrera wrote:
 Excerpts from Jim Nasby's message of dom jul 17 16:31:45 -0400 2011:
 
  On a somewhat related note, I'd also really like to have the
  ability to parse things like .sql files externally, to do things
  like LINT checking.
 
 We talked about this during PGCon.  The idea that seemed to have
 consensus there was to export the parser similarly to how we build
 the ecpg parser, that is, a set of perl scripts which take our
 gram.y as input and modify it to emit something different.

That solves the non-customized part of the problem, which is worth
solving.  A complete parser for a given DB would need catalog access,
i.e. a live connection to that DB.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] patch for 9.2: enhanced errors

2011-07-18 Thread Alvaro Herrera
Excerpts from Pavel Stehule's message of lun jul 18 16:02:43 -0400 2011:
 2011/7/18 Tom Lane t...@sss.pgh.pa.us:

  which suggests that it might be meant *only* for use with
  INSUFFICIENT_PRIVILEGE errors that are thrown due to a column ACL.
  We can probably extend that to some other syntax errors, like unknown
  column or wrong datatype or what have you, but there is nothing here to
  suggest that we have to force the issue for errors that don't naturally
  relate to exactly one column.  And CHECK constraints don't.  Consider
  CHECK (f1  f2).
 
 ok, this is relatively clean, but
 
 so for example, NULL or DOMAIN constraints don't affect a
 COLUMN_NAME? These constraints have no names.

I dunno about domains, but NOT NULL constraints definitely have names
according to the standard (and will have them in PG soon enough).

Hmm, domain constraints are CHECK or NOT NULL, and both of them have or
will have names.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] proposal: new contrib module plpgsql's embeded sql validator

2011-07-18 Thread Tatsuo Ishii
 We talked about this during PGCon.  The idea that seemed to have
 consensus there was to export the parser similarly to how we build
 the ecpg parser, that is, a set of perl scripts which take our
 gram.y as input and modify it to emit something different.
 
 That solves the non-customized part of the problem, which is worth
 solving.  

In addition to this, parsing SQL may need tree walker functions and
some support functions (e.g. memory management). These are the source files
from pgpool-II's parser directory.

copyfuncs.c  kwlist.h nodes.c   pg_list.h  pool_string.h  value.c
gram.c   kwlookup.c   nodes.h   pg_wchar.h primnodes.hvalue.h
gram.h   list.c   outfuncs.cpool_memory.c  scan.c wchar.c
gramparse.h  makefuncs.c  parsenodes.h  pool_memory.h  scanner.h
keywords.c   makefuncs.h  parser.c  pool_parser.h  scansup.c
keywords.h   memnodes.h   parser.h  pool_string.c  scansup.h

 A complete parser for a given DB would need catalog access,
 i.e. a live connection to that DB.

Yes, pgpool-II does some of this (for example, to know whether a particular
table is a temporary table, to expand * to real column lists, and so on).

Just F.Y.I.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] patch: Allow \dd to show constraint comments

2011-07-18 Thread Josh Kupershmidt
On Sun, Jul 17, 2011 at 11:25 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jul 2, 2011 at 8:37 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 On Sun, Jun 19, 2011 at 11:59 AM, Josh Kupershmidt schmi...@gmail.com 
 wrote:
 It seems funny to have is_system = true unconditionally for any object
 type.  Why'd you do it that way?  Or maybe I should ask, why true
 rather than false?

Thanks for the review!

Well, I was hoping to go by the existing psql backslash commands'
notions about what qualifies as system and what doesn't; that worked
OK for commands which supported the 'S' modifier, but not all do. For
objects like tablespaces, where you must be a superuser to create one,
it seems like they should all be considered is_system = true, on the
vague premise that superuser = system. Other objects like roles are
admittedly murkier, and I didn't find any precedent inside describe.c
to help make such distinctions.

But this is probably all a moot point, see below.

 I had to tweak the pg_proc.h entry for pg_opfamily_is_visible to play
 nice with the recent transform function change to that file.

 It seems like we ought to have this function for symmetry regardless
 of what happens to the rest of this, so I extracted and committed this
 part.

Good idea.

 Issues still to be resolved:

 1.) For now, I'm just ignoring the issue of visibility checks; I
 didn't see a simple way to support these checks \dd was doing:

    processSQLNamePattern(pset.db, &buf, pattern, true, false,
                          "n.nspname", "p.proname", NULL,
                          "pg_catalog.pg_function_is_visible(p.oid)");

 I'm a bit leery of adding an is_visible column into pg_comments, but
 I'm not sure I have a feasible workaround if we do want to keep this
 visibility check. Maybe a big CASE statement for the last argument of
 processSQLNamePattern() would work...

 Yeah... or we could add a function pg_object_is_visible(classoid,
 objectoid) that would dispatch to the relevant visibility testing
 function based on object type.  Not sure if that's too much of a
 kludge, but it wouldn't be useful only here: you could probably use it
 in combination with pg_locks, for example.

Something like that might work, though an easy escape is just leaving
the query style of \dd as it was originally (i.e. querying the various
catalogs directly, and not using pg_comments): discussed a bit in the
recent pg_comments thread w/ Josh Berkus.

 The real problem with the is_system column is that it seems to be
 entirely targeted around what psql happens to feel like doing.  I'm
 pretty sure we'll regret it if we choose to go that route.

Yeah, it did turn out more messy than I had hoped, and I'm not sure
it'd be possible to iron out the semantics of is_system in a way that
leaves everyone happy. Probably best to just rip it out if \dd won't
need it.

I'll try to send an updated patch by this weekend.

Josh



Re: [HACKERS] patch: Allow \dd to show constraint comments

2011-07-18 Thread Robert Haas
On Mon, Jul 18, 2011 at 10:57 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 Well, I was hoping to go by the existing psql backslash commands'
 notions about what qualifies as system and what doesn't; that worked
 OK for commands which supported the 'S' modifier, but not all do. For
 objects like tablespaces, where you must be a superuser to create one,
 it seems like they should all be considered is_system = true, on the
 vague premise that superuser = system. Other objects like roles are
 admittedly murkier, and I didn't find any precedent inside describe.c
 to help make such distinctions.

I think that the idea is that system objects are the ones that the
user probably doesn't care to see most of the time - i.e. the ones
that are built in.  But...

 But this is probably all a moot point, see below.

...yes.

 Issues still to be resolved:

 1.) For now, I'm just ignoring the issue of visibility checks; I
 didn't see a simple way to support these checks \dd was doing:

    processSQLNamePattern(pset.db, &buf, pattern, true, false,
                          "n.nspname", "p.proname", NULL,
                          "pg_catalog.pg_function_is_visible(p.oid)");

 I'm a bit leery of adding an is_visible column into pg_comments, but
 I'm not sure I have a feasible workaround if we do want to keep this
 visibility check. Maybe a big CASE statement for the last argument of
 processSQLNamePattern() would work...

 Yeah... or we could add a function pg_object_is_visible(classoid,
 objectoid) that would dispatch to the relevant visibility testing
 function based on object type.  Not sure if that's too much of a
 kludge, but it wouldn't be useful only here: you could probably use it
 in combination with pg_locks, for example.

 Something like that might work, though an easy escape is just leaving
 the query style of \dd as it was originally (i.e. querying the various
 catalogs directly, and not using pg_comments): discussed a bit in the
 recent pg_comments thread w/ Josh Berkus.

That's another approach.  It seems a bit lame, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_upgrade and log file output on Windows

2011-07-18 Thread Bruce Momjian
Has anyone successfully used pg_upgrade 9.0 with -l (log) on Windows?

I received a private email bug report that pg_upgrade 9.0 does not work
with the -l/log option on Windows.  The error is:

Analyzing all rows in the new cluster
c:/MinGW/msys/1.0/home/edb/inst/bin/vacuumdb --port 55445 --username 
edb --all --analyze
>> c:/MinGW/msys/1.0/home/edb/auxschedule/test.log 2>&1
The process cannot access the file because it is being used by another
process.

What has me confused is this same code exists in pg_migrator, which was
fixed to work with -l on Windows by Hiroshi Saito with this change:

/*
 * On Win32, we can't send both server output and pg_ctl output
 * to the same file because we get the error:
 * "The process cannot access the file because it is being used by
 * another process."
 * so we have to send pg_ctl output to 'nul'.
 */
sprintf(cmd, SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
        "-o \"-p %d -c autovacuum=off -c "
        "autovacuum_freeze_max_age=2000000000\" "
        "start >> \"%s\" 2>&1" SYSTEMQUOTE,
        bindir, ctx->logfile, datadir, port,
#ifndef WIN32
        ctx->logfile);
#else
        DEVNULL);
#endif

The fix was not to use the same log file and output file for pg_ctl. 
But as you can see, the pg_ctl and vacuumdb code is unchanged:

prep_status(ctx, "Analyzing all rows in the new cluster");
exec_prog(ctx, true,
          SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
          "--all --analyze >> \"%s\" 2>&1" SYSTEMQUOTE,
          ctx->new.bindir, ctx->new.port, ctx->user, ctx->logfile);
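
As a rough illustration of Hiroshi's workaround (a toy Python sketch, not
pg_upgrade's actual C code): the launcher's own output must not go to the
log file that pg_ctl already holds open on Windows, so it is sent to the
null device ('nul') there instead.

```python
import os
import sys

def output_sink(logfile):
    # Where to redirect the child command's stdout/stderr: sharing the
    # log file with pg_ctl is unsafe on Windows while pg_ctl has it open,
    # so use the null device there and the real log file elsewhere.
    return os.devnull if sys.platform == "win32" else logfile

# e.g. subprocess.call(cmd, stdout=open(output_sink("upgrade.log"), "a"),
#                      stderr=subprocess.STDOUT)
```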

I can't figure out if there is something odd about this user's setup or
if there is a bug in pg_upgrade with -l on Windows.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE

2011-07-18 Thread Peter Eisentraut
On mån, 2011-07-18 at 11:05 -0400, Robert Haas wrote:
 On Thu, Jul 7, 2011 at 3:25 PM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Jul 7, 2011 at 3:21 PM, Noah Misch n...@2ndquadrant.com wrote:
  On Thu, Jul 07, 2011 at 03:06:46PM -0400, Robert Haas wrote:
  On Thu, Jul 7, 2011 at 2:55 PM, Noah Misch n...@2ndquadrant.com wrote:
   CheckIndexCompatible() calls ComputeIndexAttrs() to resolve the new 
   operator
   classes, collations and exclusion operators for each index column.  It 
   then
   checks those against the existing values for the same.  I figured that 
   was
   obvious enough, but do you want a new version noting that?
 
  I guess one question I had was... are we depending on the fact that
  ComputeIndexAttrs() performs a bunch of internal sanity checks?  Or
  are we just expecting those to always pass, and we're going to examine
  the outputs after the fact?
 
  Those checks can fail; consider an explicit operator class or collation 
  that
  does not support the destination type.  At that stage, we neither rely on 
  those
  checks nor mind if they do fire.  If we somehow miss the problem at that 
  stage,
  DefineIndex() will detect it later.  Likewise, if we hit an error in
  CheckIndexCompatible(), we would also hit it later in DefineIndex().
 
  OK.
 
 Committed with minor comment and documentation changes.

Please review and fix this compiler warning:

indexcmds.c: In function ‘CheckIndexCompatible’:
indexcmds.c:126:15: warning: variable ‘amoptions’ set but not used 
[-Wunused-but-set-variable]





Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Jeff Davis
On Mon, 2011-07-18 at 15:59 -0400, Robert Haas wrote:
 On a pgbench run with 8
 clients on a 32-core machine, I see about a 2% speedup from that patch
 on pgbench -S, and it grows to 8% at 32 clients.  At 80 clients (but
 still just a 32-core box), the results bounce around more, but taking
 the median of three five-minute runs, it's an 11% improvement.  To me,
 that's enough to make it worth applying, especially considering that
 what is 11% on today's master is, in raw TPS, equivalent to maybe 30%
 of yesterday's master (prior to the fast relation lock patch being
 applied).  More, it seems pretty clear that this is the conceptually
 right thing to do, even if it's going to require some work elsewhere
 to file down all the rough edges thus exposed.  If someone objects to
 that, then OK, we should talk about that: but so far I don't think
 anyone has expressed strong opposition: in which case I'd like to fix
 it up and get it in.

Agreed. I certainly like the concept of the lazy vxid patch.

Regards,
Jeff Davis




Re: [HACKERS] pg_upgrade and log file output on Windows

2011-07-18 Thread Andrew Dunstan
On Mon, July 18, 2011 11:28 pm, Bruce Momjian wrote:
 Has anyone successfully used pg_upgrade 9.0 with -l (log) on Windows?

 I received a private email bug report that pg_upgrade 9.0 does not work
 with the -l/log option on Windows.  The error is:

   Analyzing all rows in the new cluster
   c:/MinGW/msys/1.0/home/edb/inst/bin/vacuumdb --port 55445 --username
 edb --all --analyze
 >> c:/MinGW/msys/1.0/home/edb/auxschedule/test.log 2>&1
   The process cannot access the file because it is being used by another
   process.

 What has me confused is this same code exists in pg_migrator, which was
 fixed to work with -l on Windows by Hiroshi Saito with this change:

  /*
   * On Win32, we can't send both server output and pg_ctl output
   * to the same file because we get the error:
   * "The process cannot access the file because it is being used by
   * another process."
   * so we have to send pg_ctl output to 'nul'.
   */
  sprintf(cmd, SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
          "-o \"-p %d -c autovacuum=off -c "
          "autovacuum_freeze_max_age=2000000000\" "
          "start >> \"%s\" 2>&1" SYSTEMQUOTE,
          bindir, ctx->logfile, datadir, port,
  #ifndef WIN32
          ctx->logfile);
  #else
          DEVNULL);
  #endif

 The fix was not to use the same log file and output file for pg_ctl.
 But as you can see, the pg_ctl and vacuumdb code is unchanged:

  prep_status(ctx, "Analyzing all rows in the new cluster");
  exec_prog(ctx, true,
            SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
            "--all --analyze >> \"%s\" 2>&1" SYSTEMQUOTE,
            ctx->new.bindir, ctx->new.port, ctx->user, ctx->logfile);

 I can't figure out if there is something odd about this user's setup or
 if there is a bug in pg_upgrade with -l on Windows.



The Windows file system seems to have some asynchronicity regarding what
files are locked. For that reason, the buildfarm code has long had a
couple of sleep(5) calls where it calls pg_ctl. You might benefit from
doing something similar.

cheers

andrew.




Re: [HACKERS] proposal: a validator for configuration files

2011-07-18 Thread Peter Eisentraut
On sön, 2011-07-17 at 00:59 -0400, Tom Lane wrote:
 Well, we *do* have a C API for that, of a sort.  The problem is, what
 do you do in processes that have not loaded the relevant extension?

Those processes that have the extension loaded check the parameter
settings in their namespace; those that don't ignore them.


