Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 We could fix the direct symptom by inserting UnlockReleaseBuffer()
 in front of the continue, but AFAICS that just makes this into a
 busy-waiting equivalent of waiting unconditionally, so I don't really
 see why we should not revert the above fragment altogether.  However,
 I suppose Robert had something more intelligent in mind than a tight
 loop when the buffer can't be exclusively locked, so maybe there is
 some other change that should be made here instead.

Skipping the block completely isn't feasible. The only options are to
wait or to do out-of-order cleaning.

The conditional behaviour in lazy_scan_heap() is much more important
than it is here, so just waiting for the lock wouldn't be too bad.
Out-of-order cleaning could be very expensive in terms of I/O and
could make things less robust. So I'd say let's just wait normally
for the lock.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CLOG contention

2012-01-06 Thread Simon Riggs
On Thu, Jan 5, 2012 at 10:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 5, 2012 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would be in favor of that, or perhaps some other formula (eg, maybe
 the minimum should be less than 8 for when you've got very little shmem).

 I have some results that show that, under the right set of
 circumstances, 8->32 is a win, and I can quantify by how much it wins.
  I don't have any data at all to quantify the cost of dropping the
 minimum from 8->6, or from 8->4, and therefore I'm reluctant to do it.
  My guess is that it's a bad idea, anyway.  Even on a system where
 shared_buffers is just 8MB, we have 1024 regular buffers and 8 CLOG
 buffers.  If we reduce the number of CLOG buffers from 8 to 4 (i.e. by
 50%), we can increase the number of regular buffers from 1024 to 1028
 (i.e. by 0.5%).  Maybe you can find a case where that comes out to a
 win, but you might have to look pretty hard.

 I think you're rejecting the concept too easily.  A setup with very
 little shmem is only going to be suitable for low-velocity systems that
 are not pushing too many transactions through per second, so it's not
 likely to need so many CLOG buffers.  And frankly I'm not that concerned
 about what the performance is like: I'm more concerned about whether
 PG will start up at all without modifying the system shmem limits,
 on systems with legacy values for SHMMAX etc.  Shaving a few
 single-purpose buffers to make back what we spent on SSI, for example,
 seems like a good idea to me.

Having 32 clog buffers is important at the high end. I don't think
that other complexities should mask that truth and lead to us not
doing anything on this topic for this release.

Please can we just make it user configurable? Prepared transactions
require config, the lock table size is configurable also, so having SSI
and clog require config is not too much of a stretch. We can then
discuss intelligent autotuning behaviour when we have more time and
more evidence.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 1:10 AM, Stephen Frost sfr...@snowman.net wrote:
 * Simon Riggs (si...@2ndquadrant.com) wrote:
 I discover that non-all-zeroes holes are fairly common, just not very 
 frequent.

 Curious, might be interesting to find out why.

 That may or may not be a problem, but not something to be dealt with
 here and now.

 But I agree that it's not the job of this patch/effort.  It sounds like
 we have clear indication, however, that those areas, as they are not
 necessarily all zeros, should be included in the checksum.

Disagree. Full page writes ignore the hole, so it's appropriate to do
so here also.
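
To illustrate, ignoring the hole means checksumming around it, roughly
like this (a sketch with assumed helpers checksum_init()/checksum_chunk(),
not code from the patch):

    uint16
    page_checksum_skipping_hole(Page page)
    {
        PageHeader  phdr = (PageHeader) page;
        uint32      cksum = checksum_init();

        /* header and line pointers: from page start up to pd_lower */
        cksum = checksum_chunk(cksum, (char *) page, phdr->pd_lower);
        /* tuple data: from pd_upper to the end of the block */
        cksum = checksum_chunk(cksum, (char *) page + phdr->pd_upper,
                               BLCKSZ - phdr->pd_upper);
        return (uint16) cksum;
    }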

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Andres Freund
On Friday, January 06, 2012 11:30:53 AM Simon Riggs wrote:
 On Fri, Jan 6, 2012 at 1:10 AM, Stephen Frost sfr...@snowman.net wrote:
  * Simon Riggs (si...@2ndquadrant.com) wrote:
  I discover that non-all-zeroes holes are fairly common, just not very
  frequent.
  
  Curious, might be interesting to find out why.
  
  That may or may not be a problem, but not something to be dealt with
  here and now.
  
  But I agree that it's not the job of this patch/effort.  It sounds like
  we have clear indication, however, that those areas, as they are not
  necessarily all zeros, should be included in the checksum.
 
  Disagree. Full page writes ignore the hole, so it's appropriate to do
  so here also.
Well, ignoring them in FPWs has clear space benefits. Ignoring them while
checksumming doesn't have much of a benefit.

Andres



Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-06 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 23:19, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 I think link(2) would create race conditions of its own.  I'd be
 inclined to suggest that maybe we should just special-case a zero length
 postmaster.pid file as meaning "okay to proceed".  In general, garbage

 That's pretty much what I meant - but with a warning message.

 Actually ... wait a minute.  If we allow that, don't we create a race
 condition between two postmasters starting at almost the same instant?
 The second one could see the lock file when the first has created but
 not yet filled it.

Good point, yeah, it should do that. But I still think it's rare
enough that just special-casing the error message should be enough...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Unlogged tables and BufferSync()

2012-01-06 Thread Heikki Linnakangas

In BufferSync(), we have this:


/*
 * Unless this is a shutdown checkpoint, we write only permanent, dirty
 * buffers.  But at shutdown time, we write all dirty buffers.
 */
if (!(flags & CHECKPOINT_IS_SHUTDOWN))
	flags |= BM_PERMANENT;


That seems bogus to me. We're mixing CHECKPOINT_* flags and buffer BM_*
flags in the same variable. Furthermore, we only use that flags variable
to pass it down to CheckpointWriteDelay(), which only pays attention to
the CHECKPOINT_IMMEDIATE flag. The intention was probably to do mask |=
BM_PERMANENT?
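
For reference, the presumably intended fragment would look something
like this, with the BM_* bit going into the separate mask variable that
is actually compared against each buffer header's flags (an assumed
shape, not a committed fix):

    int     mask = BM_DIRTY;    /* bits a buffer must have to be written */

    /*
     * Unless this is a shutdown checkpoint, we write only permanent, dirty
     * buffers.  But at shutdown time, we write all dirty buffers.
     */
    if (!(flags & CHECKPOINT_IS_SHUTDOWN))
        mask |= BM_PERMANENT;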


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Unlogged tables and BufferSync()

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 7:52 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 In BufferSync(), we have this:

        /*
         * Unless this is a shutdown checkpoint, we write only permanent, dirty
         * buffers.  But at shutdown time, we write all dirty buffers.
         */
        if (!(flags & CHECKPOINT_IS_SHUTDOWN))
                flags |= BM_PERMANENT;


 That seems bogus to me. We're mixing CHECKPOINT_* flags and buffer BM_*
 flags in the same variable. Furthermore, we only use that flags variable to
 pass it down to CheckpointWriteDelay(), which only pays attention to the
 CHECKPOINT_IMMEDIATE flag. The intention was probably to do mask |=
 BM_PERMANENT?

Ouch.  Yeah, that's a bug, will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-06 Thread Michael Beattie
On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander mag...@hagander.net wrote:

 On Thu, Jan 5, 2012 at 23:19, Tom Lane t...@sss.pgh.pa.us wrote:
  Magnus Hagander mag...@hagander.net writes:
  On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
  I think link(2) would create race conditions of its own.  I'd be
  inclined to suggest that maybe we should just special-case a zero length
  postmaster.pid file as meaning "okay to proceed".  In general, garbage
 
  That's pretty much what I meant - but with a warning message.
 
  Actually ... wait a minute.  If we allow that, don't we create a race
  condition between two postmasters starting at almost the same instant?
  The second one could see the lock file when the first has created but
  not yet filled it.

 Good point, yeah, it should do that. But I still think it's rare
 enough that just special-casing the error message should be enough...


so just something like

struct stat st;
off_t size;

stat(filename, &st);
size = st.st_size;
if (size == 0)
    elog(ERROR, "lock file \"%s\" has 0 length.", filename);

somewhere in CreateLockFile in miscinit.c?


Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-06 Thread Andrew Dunstan



On 01/05/2012 10:59 PM, Alex Hunsaker wrote:

After further digging I found it chokes on any non scalar (IOW any
reference). I attached a simple c program that I tested with 5.8.9,
5.10.1, 5.12.4 and 5.14.2 (for those who did not know about it,
perlbrew made testing across all those perls relatively painless).

PFA a patch that copies if it's readonly and it's not a scalar.

I didn't bother adding regression tests-- should I have?


[redirecting to -hackers]


I have several questions.

1. How much are we actually saving here? newSVsv() ought to be pretty 
cheap, no? I imagine it's pretty heavily used inside the interpreter.


2. Unless I'm insufficiently caffeinated right now, there's something 
wrong with this logic:


!   if (SvREADONLY(sv) &&
!       (type != SVt_IV ||
!       type != SVt_NV ||
!       type != SVt_PV))

3. The above is in any case almost certainly insufficient, because in my tests 
a typeglob didn't trigger SvREADONLY(), but did cause a crash.


And yes, we should possibly add a regression test or two. Of course, we can't 
use the cause of the original complaint ($^V) in them, though.
 


cheers

andrew







Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Robert Haas
On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I suppose Robert had something more intelligent in mind than a tight
 loop when the buffer can't be exclusively locked, so maybe there is
 some other change that should be made here instead.

My intention was to skip the tuple, but I failed to notice the unusual
way in which this loop iterates.  How about something like the
attached?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fix-vacuum-loop.patch
Description: Binary data



Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 2:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I suppose Robert had something more intelligent in mind than a tight
 loop when the buffer can't be exclusively locked, so maybe there is
 some other change that should be made here instead.

 My intention was to skip the tuple, but I failed to notice the unusual
 way in which this loop iterates.  How about something like the
 attached?

It solves the waiting issue, but leaves unused tuples in the heap that
previously would have been removed.

I don't think that is a solution.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 9:53 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 6, 2012 at 2:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I suppose Robert had something more intelligent in mind than a tight
 loop when the buffer can't be exclusively locked, so maybe there is
 some other change that should be made here instead.

 My intention was to skip the tuple, but I failed to notice the unusual
 way in which this loop iterates.  How about something like the
 attached?

 It solves the waiting issue, but leaves unused tuples in the heap that
 previously would have been removed.

 I don't think that is a solution.

Uh, we discussed this before the patch was committed, and you agreed
it made sense to do that in the second heap scan just as we do it in
the first heap scan.  If you now see a problem with that, that's fine,
but please explain what the problem is, rather than just saying it's
not acceptable for the patch to do the thing that the patch was
designed to do.

Actually, on further review, I do see another problem: we need to pass
the scan_all flag down to lazy_vacuum_heap, and only do this
conditionally if that flag is not set.  Otherwise we might skip a page
but still advance relfrozenxid, which would definitely be bad.  But
that looks easy to fix, and I don't see any other reason why this
would be either unsafe or undesirable.
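
In code form, the skip logic being described might look something like
this (a sketch using names from this thread, not the final patch):

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk,
                             RBM_NORMAL, vac_strategy);
    if (!scan_all && !ConditionalLockBufferForCleanup(buf))
    {
        /* NB: must still advance the tuple index past this page */
        ReleaseBuffer(buf);     /* drop the pin; leave page for next vacuum */
        continue;
    }
    if (scan_all)
        LockBufferForCleanup(buf);  /* must wait rather than skip */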

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Robert Haas
On Thu, Jan 5, 2012 at 5:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There is no compiler anywhere that implements "always inline", unless
 you are talking about a macro.  "inline" is a hint and nothing more,
 and if you think you can force it you are mistaken.  So this controversy
 is easily resolved: we do not need any such construct.

That's basically in direct contradiction to the experimental evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-06 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of Fri Jan 06 10:34:30 -0300 2012:

 And yes, we should possibly add a regression test or two. Of course, we can't 
 use the cause of the original complaint ($^V) in them, though.

Why not?  You obviously can't output it verbatim, but you could
compare it with a different function that returns $^V stringified by a
different mechanism.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] CLOG contention

2012-01-06 Thread Robert Haas
On Thu, Jan 5, 2012 at 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 5, 2012 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would be in favor of that, or perhaps some other formula (eg, maybe
 the minimum should be less than 8 for when you've got very little shmem).

 I have some results that show that, under the right set of
 circumstances, 8->32 is a win, and I can quantify by how much it wins.
  I don't have any data at all to quantify the cost of dropping the
 minimum from 8->6, or from 8->4, and therefore I'm reluctant to do it.
  My guess is that it's a bad idea, anyway.  Even on a system where
 shared_buffers is just 8MB, we have 1024 regular buffers and 8 CLOG
 buffers.  If we reduce the number of CLOG buffers from 8 to 4 (i.e. by
 50%), we can increase the number of regular buffers from 1024 to 1028
 (i.e. by 0.5%).  Maybe you can find a case where that comes out to a
 win, but you might have to look pretty hard.

 I think you're rejecting the concept too easily.  A setup with very
 little shmem is only going to be suitable for low-velocity systems that
 are not pushing too many transactions through per second, so it's not
 likely to need so many CLOG buffers.

Well, if you take the same workload and spread it out over a long
period of time, it will still have just as many CLOG misses or
shared_buffers misses as it had when you did it all at top speed.
Admittedly, you're unlikely to run into the situation where more
people want to do simultaneous CLOG reads than there are buffers,
but you'll still thrash the cache.

 And frankly I'm not that concerned
 about what the performance is like: I'm more concerned about whether
 PG will start up at all without modifying the system shmem limits,
 on systems with legacy values for SHMMAX etc.

After thinking about this a bit, I think the problem is that the
divisor we picked is still too high.  Suppose we set num_clog_buffers
= (shared_buffers / 4MB), with a minimum of 4 and maximum of 32.  That
way, pretty much anyone who bothers to set shared_buffers to a
non-default value will get 32 CLOG buffers, which should be fine, but
people who are in the 32MB-or-less range can ramp down lower than what
we've allowed in the past.  That seems like it might give us the best
of both worlds.
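
In code form the proposal is just a clamp (a sketch; the helper name is
made up, and with 8kB pages, 4MB of shared_buffers is 512 buffers):

    /* one CLOG buffer per 4MB of shared_buffers, clamped to [4, 32] */
    static int
    clog_buffer_count(void)
    {
        return Min(32, Max(4, NBuffers / 512));
    }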

 Shaving a few
 single-purpose buffers to make back what we spent on SSI, for example,
 seems like a good idea to me.

I think if we want to buy back that memory, the best way to do it
would be to add a GUC to disable SSI at startup time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CLOG contention

2012-01-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Please can we just make it user configurable?

Weren't you just complaining that *I* was overcomplicating things?
I see no evidence to justify inventing a user-visible GUC here.
We have rough consensus on both the need for and the shape of a formula,
with just minor discussion about the exact parameters to plug into it.
Punting the problem off to a GUC is not a better answer.

regards, tom lane



Re: [HACKERS] CLOG contention

2012-01-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 After thinking about this a bit, I think the problem is that the
 divisor we picked is still too high.  Suppose we set num_clog_buffers
 = (shared_buffers / 4MB), with a minimum of 4 and maximum of 32.

Works for me.

regards, tom lane



Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 3:28 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 6, 2012 at 9:53 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 6, 2012 at 2:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I suppose Robert had something more intelligent in mind than a tight
 loop when the buffer can't be exclusively locked, so maybe there is
 some other change that should be made here instead.

 My intention was to skip the tuple, but I failed to notice the unusual
 way in which this loop iterates.  How about something like the
 attached?

 It solves the waiting issue, but leaves unused tuples in the heap that
 previously would have been removed.

 I don't think that is a solution.

 Uh, we discussed this before the patch was committed, and you agreed
 it made sense to do that in the second heap scan just as we do it in
 the first heap scan.  If you now see a problem with that, that's fine,
 but please explain what the problem is, rather than just saying it's
 not acceptable for the patch to do the thing that the patch was
 designed to do.

My name is on that patch, so of course I accept responsibility for the
earlier decision making.

I *have* explained what is wrong. It leaves unused tuples in the heap
that would previously have been removed.

More simply: lazy_vacuum_page() does some work and we can't skip that
work just because we don't get the lock. Elsewhere in the patch we
skipped getting the lock when there was no work to be done. In
lazy_vacuum_heap() we only visit pages that need further work, hence
every page is important.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-06 Thread Andrew Dunstan



On 01/06/2012 10:49 AM, Alvaro Herrera wrote:

Excerpts from Andrew Dunstan's message of Fri Jan 06 10:34:30 -0300 2012:


And yes, we should possibly add a regression test or two. Of course, we can't 
use the cause of the original complaint ($^V) in them, though.

Why not?  You obviously can't output it verbatim, but you could
compare it with a different function that returns $^V stringified by a
different mechanism.



not sure exactly how to do that in a way that exercises this code, but
feel free to suggest something ;-)


cheers

andrew



Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 10:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I *have* explained what is wrong. It leaves unused tuples in the heap
 that would previously have been removed.

 More simply: lazy_vacuum_page() does some work and we can't skip that
 work just because we don't get the lock. Elsewhere in the patch we
 skipped getting the lock when there was no work to be done.

That is not true.  We skip vacuumable pages in the first heap pass
even if they contain dead tuples, unless scan_all is set or we can get
the buffer lock without waiting.

 In
 lazy_vacuum_heap() we only visit pages that need further work, hence
 every page is important.

If the system were to crash after the first heap pass and the index
vacuuming, but before the second heap pass, nothing bad would happen.
The next vacuum would collect those same dead tuples, scan the indexes
for them (and not find them, since we already removed them), and then
remove them from the heap.

We already made a decision that it's a good idea to skip vacuuming a
page on which we can't immediately obtain a cleanup lock, because
waiting for the cleanup lock can cause the autovacuum worker to get
stuck indefinitely, starving the table and in some cases the entire
cluster of adequate vacuuming.  That logic is just as valid in the
second heap pass as it is in the first one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CLOG contention

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  Please can we just make it user configurable?

 Weren't you just complaining that *I* was overcomplicating things?
 I see no evidence to justify inventing a user-visible GUC here.
 We have rough consensus on both the need for and the shape of a formula,
 with just minor discussion about the exact parameters to plug into it.
 Punting the problem off to a GUC is not a better answer.

As long as we get 32 buffers on big systems, I have no complaint.

I'm sorry if I moaned at you personally.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Peter Geoghegan
On 5 January 2012 20:23, Robert Haas robertmh...@gmail.com wrote:
 I don't have a problem with the idea of a pg_always_inline, but I'm
 wondering what sort of fallback mechanism you propose.  It seems to me
 that if the performance improvement is conditioned on inlining be done
 and we're not confident that we can force the compiler to inline, we
 ought to fall back to not making the specialization available, rather
 than to making something available that may not work well.  Of course
 if it's only a question of a smaller performance gain, rather than an
 actual loss, then that's not as much of an issue...

pg_always_inline is more or less a neat adjunct to what I've already
done. You can take it or leave it, but it would be irrational to
dismiss it out of hand, because it is demonstrably effective in this
case. Using non-portable things like function attributes is fairly
well precedented - we use __attribute__((format(...))) frequently. We
don't need a fallback mechanism other than #else #define
pg_always_inline inline #endif. I think these facilities are
available to well over 99% of our users, who use GCC, a GCC-compatible
compiler, or MSVC builds.
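
A sketch of that fallback, assuming GCC's always_inline attribute and
MSVC's __forceinline as the force-inline spellings:

    /* Force inlining where the compiler supports it; degrade gracefully. */
    #if defined(__GNUC__)
    #define pg_always_inline inline __attribute__((always_inline))
    #elif defined(_MSC_VER)
    #define pg_always_inline __forceinline
    #else
    #define pg_always_inline inline
    #endif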

I have basically no sympathy for anyone who uses a compiler that can't
inline functions. Do such people actually exist?

 ISTM that protecting
 against that outside the qsort implementation (but only for index
 sorts) is wrong-headed.

 Assuming you mean that qsort_arg() will protect against this stuff so
 that callers don't need to do it separately, +1.

That's exactly what I mean. Going quadratic in the face of a lot of
duplicates is something that, remarkably, was a problem well into the
1990s, apparently because some guy noticed it one day, though I think
that lots of popular implementations happened to be unaffected anyway.

As you know, all queries tested have lots and lots of duplicates (a
~1.5GB table that contains the same number of distinct elements as a
~10MB table once did), and removing the duplicate protection for
btrees, on top of everything else, sees the qsort do an awful lot
better than HEAD does with the pseudo-protection. As I've said, we
already lack any such protection for heap tuples. We are unaffected
now anyway, in particular, because we do this, as apparently
recommended by both Sedgewick and Knuth:

while (pb <= pc && (r = cmp(pb, a)) <= 0)
{
    if (r == 0)
    {
        swap(pa, pb);
        pa += es;
    }
    pb += es;
}

Note that the partition swap occurs *because* the pivot and element
are equal. You might imagine that this is superfluous. It turns out
that it protects against the duplicates, resulting in sub-partitions
about the same size (though it occurs to me that it does so at the
expense of precluding parallelism). In short, Sedgewick would approve
of our qsort implementation.

As for the compare-a-tuple-to-itself thing, that's just silly, and
was probably only ever seen in some homespun implementation at some
point a relatively long time ago, but I've asserted against it anyway.

 I can't find any description of how this actually works... obviously,
 we'd like to find a close-to-median element, but how do we actually do
 that cheaply?

Uh, it works whatever way you want it to work. The implementation has
to figure out how to get the median ahead of time.

 I doubt that this would be worthwhile -- the pivot that we pick at the
 toplevel doesn't really matter much; even if it's bad, we can recover
 just fine if the pivot we pick one level down is good, or the level
 after that.  We only really have a problem if we keep systematically
 picking bad pivots every time.  And if we do have that problem,
 improving the toplevel choice of pivot is only going to help modestly,
 because we'll still be O(n^2) on each partition.

 Can I get a straw poll on how much of a problem worst-case performance
 of qsort is believed to be?

 I'd be reluctant to remove any of the protections we currently have,
 because I bet they were all put in to fix problems that people hit in
 the field.  But I don't know that we need more.  Of course,
 consolidating them into qsort() itself rather than its callers
 probably makes sense, as noted above.

I was merely proposing something that would complement the med3 method
and our existing protections.

 In a perfect world, if it were somehow possible to know the perfect
 pivots ahead of time from a histogram or something, we'd have a quick
 sort variant with worst-case performance of O(n log(n)). That, or the
 limited approximation that I've sketched would perhaps be worthwhile,
 even if it was subject to a number of caveats. Wikipedia claims that
 the worst case for quick sort is O(n log(n)) with the refinements
 recommended by Sedgewick's 1978 paper, but this seems like nonsense to
 me - the med3 technique is a good heuristic in 

Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Tom Lane
I started to wonder how likely it would be that some other process would
sit on a buffer pin for so long as to allow 134217727 iterations of
ReadBufferExtended, even given the slowdowns associated with
CLOBBER_CACHE_ALWAYS.  This led to some fruitless searching for possible
deadlock conditions, but eventually I realized that there's a much
simpler explanation: if PrivateRefCount > 1 then
ConditionalLockBufferForCleanup always fails.  This means that once
ConditionalLockBufferForCleanup has failed once, the currently committed
code in lazy_vacuum_heap is guaranteed to loop until it gets a failure
in enlarging the ResourceOwner buffer-reference array.  Which in turn
means that neither of you ever actually exercised the skip case, or
you'd have noticed the problem.  Nor are the current regression tests
well designed to exercise the case.  (There might well be failures of
this type happening from time to time in autovacuum, but we'd not see
any reported failure in the buildfarm, unless we went trawling in
postmaster logs.)
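
In outline, the check in question amounts to this (a simplified sketch
of ConditionalLockBufferForCleanup(), following the local pin
accounting in the current bufmgr code):

    /*
     * A buffer our own backend has pinned more than once can never be
     * cleanup-locked, so the function fails immediately; the retry loop
     * can therefore spin without ever succeeding.
     */
    if (PrivateRefCount[buffer - 1] != 1)
        return false;   /* extra local pins make a cleanup lock impossible */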

So at this point I've got serious doubts as to the quality of testing of
that whole patch, not just this part.

regards, tom lane



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 12:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 As you know, all queries tested have lots and lots of duplicates (a
 ~1.5GB table that contains the same number of distinct elements as a
 ~10MB table once did), and removing the duplicate protection for
 btrees, on top of everything else, sees the qsort do an awful lot
 better than HEAD does with the pseudo-protection.

Does that also win vs. unpatched master?  If so we may as well pull
that part out and commit it separately.

 I can't find any description of how this actually works... obviously,
 we'd like to find a close-to-median element, but how do we actually do
 that cheaply?

 Uh, it works whatever way you want it to work. The implementation has
 to figure out how to get the median ahead of time.

Oh.  I'd be inclined to think that would be pretty inefficient, unless
we only did it for very large partitions or when we observed that some
less costly strategy was tanking.

 I gather from a quick read that this is supposed to be especially good
 when the data is already somewhat sorted.  Would there be any merit in
 trying to guess when that's likely to be the case (e.g. based on the
 correlation statistic)?   That seems like a stretch, but OTOH a GUC
 doesn't feel great either: what is the poor user to do with a query
 that does 2 sorts, one of which is faster with QS and the other of
 which is faster with Timsort?  Ugh.

 I'd imagined that it might work a bit like default_statistics_target.
 Of course, I was just thinking out loud. Also, we do a very good job
 on *perfectly* pre-sorted input because we check for pre-sorted input.

We occasionally have people who want to do SELECT * FROM foo WHERE a >
100 AND a < 200 ORDER BY a, b where foo has an index on (a) but not
(a, b).  This tends to suck, because we fail to realize that we can
batch the sort.  We either seqscan and filter the table then sort the
results, or else we scan the index on (a) and then ignore the fact
that we have data which is already partially sorted.  It's
particularly annoying if a is mostly unique and needs b only
occasionally to break ties.  It would be nice to improve this, but it
may be more of a planner/executor problem that something we can fix
directly inside the sort implementation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I started to wonder how likely it would be that some other process would
 sit on a buffer pin for so long as to allow 134217727 iterations of
 ReadBufferExtended, even given the slowdowns associated with
 CLOBBER_CACHE_ALWAYS.  This led to some fruitless searching for possible
 deadlock conditions, but eventually I realized that there's a much
 simpler explanation: if PrivateRefCount > 1 then
 ConditionalLockBufferForCleanup always fails.  This means that once
 ConditionalLockBufferForCleanup has failed once, the currently committed
 code in lazy_vacuum_heap is guaranteed to loop until it gets a failure
 in enlarging the ResourceOwner buffer-reference array.  Which in turn
 means that neither of you ever actually exercised the skip case, or
 you'd have noticed the problem.  Nor are the current regression tests
 well designed to exercise the case.  (There might well be failures of
 this type happening from time to time in autovacuum, but we'd not see
 any reported failure in the buildfarm, unless we went trawling in
 postmaster logs.)

 So at this point I've got serious doubts as to the quality of testing of
 that whole patch, not just this part.

I tested the case where we skip a block during the first pass, but I
admit that I punted on testing the case where we skip a block during
the second pass, because I couldn't think of a good way to exercise
it.  Any suggestions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pgsphere

2012-01-06 Thread Dave Cramer
I've been asked by someone to support pgsphere.

It would appear that the two project owners are MIA. If anyone knows
differently, can they let me know?

Does anyone have any objection to me taking over the project?


Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So at this point I've got serious doubts as to the quality of testing of
 that whole patch, not just this part.

 I tested the case where we skip a block during the first pass, but I
 admit that I punted on testing the case where we skip a block during
 the second pass, because I couldn't think of a good way to exercise
 it.  Any suggestions?

Hack ConditionalLockBufferForCleanup to have a 50% probability of
failure regardless of anything else, for instance via

static int ctr = 0;

if ((++ctr) % 2)
    return false;

regards, tom lane



Re: [HACKERS] pgsphere

2012-01-06 Thread Andrew Dunstan



On 01/06/2012 12:32 PM, Dave Cramer wrote:

I've been asked by someone to support pgsphere.

It would appear that the two project owners are MIA. If anyone knows
differently, can they let me know?

Does anyone have any objection to me taking over the project?



One of the owners is Teodor, who is a core committer ... I hope he's not 
MIA.


cheers

andrew





Re: [HACKERS] Poorly thought out code in vacuum

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So at this point I've got serious doubts as to the quality of testing of
 that whole patch, not just this part.

 I tested the case where we skip a block during the first pass, but I
 admit that I punted on testing the case where we skip a block during
 the second pass, because I couldn't think of a good way to exercise
 it.  Any suggestions?

 Hack ConditionalLockBufferForCleanup to have a 50% probability of
 failure regardless of anything else, for instance via

        static int ctr = 0;

        if ((++ctr) % 2)
                return false;

Oh, that's brilliant.  OK, I'll go try that.

Note to self: Try to remember to take that hack back out before committing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Simon Riggs
On Thu, Jan 5, 2012 at 3:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 4, 2012 at 1:35 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:

 Sure.  I just think you are there already except for what I got into.

 New version attached, with your suggested changes included. Hole check
 code is there as well, but ifdef'd out since it isn't a valid check in
 all cases.

Following private discussions, Kevin showed me the patch at v3 was
inadequate because it didn't cater for torn pages as a result of hint
bit writes.

The following patch (v4) introduces a new WAL record type that writes
backup blocks for the first hint on a block in any checkpoint that has
not previously been changed. IMHO this fixes the torn page problem
correctly, though at some additional loss of performance but not the
total catastrophe some people had imagined. Specifically we don't need
to log anywhere near 100% of hint bit settings, much more like 20-30%
(estimated not measured).

Dan Scales also observed some cases where private writes aren't
checksummed, e.g. index build. Those suggestions are still outstanding
from me, but the v4 is worthy of more serious review.

I'll now turn my attention more fully onto the DW patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..9b367a3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1701,6 +1701,48 @@ SET ENABLE_SEQSCAN TO OFF;
   /listitem
  /varlistentry
 
+ <varlistentry id="guc-page-checksums" xreflabel="page_checksums">
+  <indexterm>
+   <primary><varname>page_checksums</> configuration parameter</primary>
+  </indexterm>
+  <term><varname>page_checksums</varname> (<type>boolean</type>)</term>
+  <listitem>
+   <para>
+    When this parameter is on, the <productname>PostgreSQL</> server
+    calculates checksums when it writes main database pages to disk,
+    flagging the page as checksum protected.  When this parameter is off,
+    no checksum is written, only a standard watermark in the page header.
+    The database may thus contain a mix of pages with checksums and pages
+    without checksums.
+   </para>
+
+   <para>
+    When pages are read into shared buffers any page flagged with a
+    checksum has the checksum re-calculated and compared against the
+    stored value to provide greatly improved validation of page contents.
+   </para>
+
+   <para>
+    Writes via temp_buffers are not checksummed.
+   </para>
+
+   <para>
+    Turning this parameter off speeds normal operation, but
+    might allow data corruption to go unnoticed. The checksum uses
+    16-bit checksums, using the fast Fletcher 16 algorithm. With this
+    parameter enabled there is still a non-zero probability that an error
+    could go undetected, as well as a non-zero probability of false
+    positives.
+   </para>
+
+   <para>
+    This parameter can only be set in the <filename>postgresql.conf</>
+    file or on the server command line.
+    The default is <literal>off</>.
+   </para>
+  </listitem>
+ </varlistentry>
+
  <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
   <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
   <indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e65962..c9538d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -708,6 +708,7 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
 	bool		updrqst;
 	bool		doPageWrites;
	bool		isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
+	bool		IsHint = (rmid == RM_SMGR_ID && info == XLOG_SMGR_HINT);
 
 	/* cross-check on whether we should be here or not */
 	if (!XLogInsertAllowed())
@@ -975,6 +976,18 @@ begin:;
 	}
 
 	/*
+	 * If this is a hint record and we don't need a backup block then
+	 * we have no more work to do and can exit quickly without inserting
+	 * a WAL record at all. In that case return InvalidXLogRecPtr.
+	 */
+	if (IsHint && !(info & XLR_BKP_BLOCK_MASK))
+	{
+		LWLockRelease(WALInsertLock);
+		END_CRIT_SECTION();
+		return InvalidXLogRecPtr;
+	}
+
+	/*
 	 * If there isn't enough space on the current XLOG page for a record
 	 * header, advance to the next page (leaving the unused space as zeroes).
 	 */
@@ -3670,6 +3683,13 @@ RestoreBkpBlocks(XLogRecPtr lsn, XLogRecord *record, bool cleanup)
    BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
 		}
 
+		/*
+		 * Any checksum set on this page will be invalid. We don't need
+		 * to reset it here since it will be reset before being written
+		 * but it seems worth doing this for general sanity and hygiene.
+		 */
+		PageSetPageSizeAndVersion(page, BLCKSZ, PG_PAGE_LAYOUT_VERSION);
+
 		PageSetLSN(page, lsn);
 		PageSetTLI(page, ThisTimeLineID);
 		MarkBufferDirty(buffer);
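
For reference, textbook Fletcher-16, which the documentation text above
names, computes two running 8-bit sums modulo 255. This is an
illustration of the algorithm only, not the checksum code in the patch:

    #include <stdint.h>
    #include <stddef.h>

    static uint16_t
    fletcher16(const uint8_t *data, size_t len)
    {
        uint32_t    sum1 = 0;
        uint32_t    sum2 = 0;
        size_t      i;

        for (i = 0; i < len; i++)
        {
            sum1 = (sum1 + data[i]) % 255;
            sum2 = (sum2 + sum1) % 255;
        }
        return (uint16_t) ((sum2 << 8) | sum1);
    }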

Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Peter Geoghegan
On 6 January 2012 17:29, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 6, 2012 at 12:10 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 As you know, all queries tested have lots and lots of duplicates (a
 ~1.5GB table that contains the same number of distinct elements as a
 ~10MB table once did), and removing the duplicate protection for
 btrees, on top of everything else, sees the qsort do an awful lot
  better than HEAD does with the pseudo-protection.

 Does that also win vs. unpatched master?  If so we may as well pull
 that part out and commit it separately.

I didn't bother isolating that, because it doesn't really make sense
to (not having it is probably only of particular value when doing what
I'm doing anyway, but who knows). Go ahead and commit something to
remove that code (get it in both comparetup_index_btree and
comparetup_index_hash), as well as the tuple1 != tuple2 test now if
you like. It's patently clear that it is unnecessary, and so doesn't
have to be justified as a performance enhancement - it's a simple case
of refactoring for clarity. As I've said, we don't do this for heap
tuples and we've heard no complaints in all that time. It was added in
commit fbac1272b89b547dbaacd78bbe8da68e5493cbda, presumably when
problems with system qsorts came to light.

 I'd imagined that it might work a bit like default_statistics_target.
 Of course, I was just thinking out loud. Also, we do a very good job
 on *perfectly* pre-sorted input because we check for pre-sorted input.

 We occasionally have people who want to do SELECT * FROM foo WHERE a >
 100 AND a < 200 ORDER BY a, b where foo has an index on (a) but not
 (a, b).  This tends to suck, because we fail to realize that we can
 batch the sort.  We either seqscan and filter the table then sort the
 results, or else we scan the index on (a) and then ignore the fact
 that we have data which is already partially sorted.  It's
 particularly annoying if a is mostly unique and needs b only
 occasionally to break ties.  It would be nice to improve this, but it
 may be more of a planner/executor problem that something we can fix
 directly inside the sort implementation.

That sounds like an interesting problem. Something to look into for
9.3, perhaps.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 I didn't bother isolating that, because it doesn't really make sense
 to (not having it is probably only of particular value when doing what
 I'm doing anyway, but who knows). Go ahead and commit something to
 remove that code (get it in both comparetup_index_btree and
 comparetup_index_hash), as well as the tuple1 != tuple2 test now if
 you like. It's patently clear that it is unnecessary, and so doesn't
 have to be justified as a performance enhancement - it's a simple case
 of refactoring for clarity. As I've said, we don't do this for heap
 tuples and we've heard no complaints in all that time. It was added in
 commit fbac1272b89b547dbaacd78bbe8da68e5493cbda, presumably when
 problems with system qsorts came to light.

Actually, I'm going to object to reverting that commit, as I believe
that having equal-keyed index entries in physical table order may offer
some performance benefits at access time.  If you don't like the
comment, we can change that.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-06 Thread Alex Hunsaker
On Fri, Jan 6, 2012 at 06:34, Andrew Dunstan and...@dunslane.net wrote:

 PFA that copies if its readonly and its not a scalar.

 I didn't bother adding regression tests-- should I have?

 I have several questions.

 1. How much are we actually saving here? newSVsv() ought to be pretty cheap,
 no? I imagine it's pretty heavily used inside the interpreter.

It will incur an extra copy for every return value from plperl and
every value passed to a spi function (and possibly other areas I
forgot about). Personally I think avoiding yet another copy of the
return value is worth it.

 2. Unless I'm insufficiently caffeinated right now, there's something wrong
 with this logic:

 !       if (SvREADONLY(sv) &&
 !                       (type != SVt_IV ||
 !                       type != SVt_NV ||
 !                       type != SVt_PV))

Oh my... I dunno exactly what I was smoking last night, but it's a good
thing I didn't share :-). Uh, so my test program was also completely
wrong; I'll have to redo it all. I've narrowed it down to:
if ((type == SVt_PVGV || SvREADONLY(sv)))
{
    if (type != SVt_PV &&
        type != SVt_NV)
    {
        sv = newSVsv(sv);
    }
}

One odd thing is we have to copy SVt_IV (plain numbers) as apparently
that is shared with refs (my $str = 's'; my $ref = \$str;).

Given this, #3 and the other unknowns we might run into in the future,
I think the if ((SvREADONLY(sv) || type == SVt_PVGV)) proposed upthread
makes the most sense.

 3. The above is in any case almost certainly insufficient, because in my
 tests a typeglob didn't trigger SvREADONLY(), but did cause a crash.

Hrm the glob I was testing was *STDIN. It failed to fail in my test
program because I was not testing *STDIN directly but instead had
@test = (*STDIN); Ugh. Playing with it a bit more it seems only
*STDIN, *STDERR and *STDOUT have problems. For example this seems to
work fine for me:
do LANGUAGE plperlu $$ open(my $fh, '>', '/tmp/t.tmp'); elog(NOTICE, $fh) $$;

 And yes, we should possibly add a regression test or two. Of course, we
 can't use the cause of the original complaint ($^V) in them, though.

I poked around the perl source looking for some candidates; elog(INFO,
${^TAINT}) works fine on 5.8.9 and 5.14.2. I thought we could get away
with elog(INFO, *STDIN) but 5.8.9 says:
NOTICE:
While other versions of perl (5.14) say:
NOTICE:  *main::STDIN



Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-06 Thread Andrew Dunstan



On 01/06/2012 02:02 PM, Alex Hunsaker wrote:

3. The above is in any case almost certainly insufficient, because in my
tests a typeglob didn't trigger SvREADONLY(), but did cause a crash.

Hrm the glob I was testing was *STDIN. It failed to fail in my test
program because I was not testing *STDIN directly but instead had
@test = (*STDIN); Ugh. Playing with it a bit more it seems only
*STDIN, *STDERR and *STDOUT have problems. For example this seems to
work fine for me:
do LANGUAGE plperlu $$ open(my $fh, '>', '/tmp/t.tmp'); elog(NOTICE, $fh) $$;




$fh isn't a typeglob here, AIUI. But it's definitely not just *STDIN and 
friends. Try:


do LANGUAGE plperlu $$ $foo=1; elog(NOTICE, *foo) $$;


cheers

andrew



Re: [HACKERS] WIP(!) Double Writes

2012-01-06 Thread David Fetter
On Wed, Jan 04, 2012 at 10:19:16PM -0800, David Fetter wrote:
 Folks,
 
 Please find attached two patches, each under the PostgreSQL license,
 one which implements page checksums vs. REL9_0_STABLE, the other which
 depends on the first (i.e. requires that it be applied first) and
 implements double writes.  They're vs.  REL9_0_STABLE because they're
 extracted from vPostgres 1.0, a proprietary product currently based on
 PostgreSQL 9.0.
 
 I had wanted the first patch set to be:
 
 - Against git head, and
 - Based on feedback from Simon's patch.

Simon's now given some feedback during a fruitful discussion, and has
sent an updated checksum patch which will be the basis for the
double-write stuff Dan's working on.

Stay tuned!

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Peter Geoghegan
On 6 January 2012 18:45, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan pe...@2ndquadrant.com writes:
 I didn't bother isolating that, because it doesn't really make sense
 to (not having it is probably only of particular value when doing what
 I'm doing anyway, but who knows). Go ahead and commit something to
 remove that code (get it in both comparetup_index_btree and
 comparetup_index_hash), as well as the tuple1 != tuple2 test now if
 you like. It's patently clear that it is unnecessary, and so doesn't
 have to be justified as a performance enhancement - it's a simple case
 of refactoring for clarity. As I've said, we don't do this for heap
 tuples and we've heard no complaints in all that time. It was added in
 commit fbac1272b89b547dbaacd78bbe8da68e5493cbda, presumably when
 problems with system qsorts came to light.

 Actually, I'm going to object to reverting that commit, as I believe
 that having equal-keyed index entries in physical table order may offer
 some performance benefits at access time.  If you don't like the
 comment, we can change that.

Please explain your position. When is this supposed to be useful? Even
if you can present an argument for keeping it, it has to be weighed
against the fact that people don't tend to have much use for indexes
with lots of duplicates anyway. Prior to 2004, we did not do this - it
was justified as insurance against bad qsort implementations only.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Andres Freund
On Friday, January 06, 2012 07:26:14 PM Simon Riggs wrote:
 The following patch (v4) introduces a new WAL record type that writes
 backup blocks for the first hint on a block in any checkpoint that has
 not previously been changed. IMHO this fixes the torn page problem
 correctly, though at some additional loss of performance but not the
 total catastrophe some people had imagined. Specifically we don't need
 to log anywhere near 100% of hint bit settings, much more like 20-30%
 (estimated not measured).
Well, it will hurt in exactly those cases where hint bits already hurt
hugely, which is for example when accessing huge amounts of data the
first time after loading it. Which is not exactly uncommon...

Andres



Re: [HACKERS] CLOG contention

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 11:05 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 After thinking about this a bit, I think the problem is that the
 divisor we picked is still too high.  Suppose we set num_clog_buffers
 = (shared_buffers / 4MB), with a minimum of 4 and maximum of 32.

 Works for me.

Done.  I tested this on my MacBook Pro and I see no statistically
significant difference from the change on a couple of small pgbench
tests.  Hopefully that means this is good on large boxes and at worst
harmless on small ones.

As far as I can see, the trade-off is this: If you increase the number
of CLOG buffers, then your CLOG miss rate will go down.  On the other
hand, the cost of looking up a CLOG buffer will go up.  At some point,
the reduction in the miss rate will not be enough to pay for a longer
linear search - which also means holding CLogControlLock.  I think
it'd probably be worthwhile to think about looking for something
slightly smarter than a linear search at some point, and maybe also
looking for a way to partition the locking better.  But, this at least
picks the available low-hanging fruit, which is a good place to
start.
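
For reference, the committed rule amounts to something like this (a
sketch only, assuming shared_buffers is counted in 8kB blocks; the
actual source may differ in detail):

#include <stdio.h>

#define BLCKSZ 8192

/* One CLOG buffer per 4MB of shared_buffers, clamped to [4, 32]. */
static int
clog_buffers(long shared_buffers_blocks)
{
	long	n = shared_buffers_blocks * BLCKSZ / (4L * 1024 * 1024);

	if (n < 4)
		n = 4;
	if (n > 32)
		n = 32;
	return (int) n;
}

int
main(void)
{
	/* 8MB of shared_buffers -> 4 buffers; 128MB or more -> 32 */
	printf("%d %d\n", clog_buffers(1024), clog_buffers(16384));
	return 0;
}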

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Heikki Linnakangas

On 06.01.2012 20:26, Simon Riggs wrote:

The following patch (v4) introduces a new WAL record type that writes
backup blocks for the first hint on a block in any checkpoint that has
not previously been changed. IMHO this fixes the torn page problem
correctly, though at some additional loss of performance but not the
total catastrophe some people had imagined. Specifically we don't need
to log anywhere near 100% of hint bit settings, much more like 20-30%
(estimated not measured).


How's that going to work during recovery? Like in hot standby.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Andres Freund
On Friday, January 06, 2012 08:45:45 PM Heikki Linnakangas wrote:
 On 06.01.2012 20:26, Simon Riggs wrote:
  The following patch (v4) introduces a new WAL record type that writes
  backup blocks for the first hint on a block in any checkpoint that has
  not previously been changed. IMHO this fixes the torn page problem
  correctly, though at some additional loss of performance but not the
  total catastrophe some people had imagined. Specifically we don't need
  to log anywhere near 100% of hint bit settings, much more like 20-30%
  (estimated not measured).
 
 How's that going to work during recovery? Like in hot standby.
How's recovery a problem? Unless I miss something, that doesn't actually 
introduce a new possibility to transport hint bits to the standby (think 
fpw's). A new transport will obviously increase traffic but ...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 2:35 PM, Andres Freund and...@anarazel.de wrote:
 On Friday, January 06, 2012 07:26:14 PM Simon Riggs wrote:
 The following patch (v4) introduces a new WAL record type that writes
 backup blocks for the first hint on a block in any checkpoint that has
 not previously been changed. IMHO this fixes the torn page problem
 correctly, though at some additional loss of performance but not the
 total catastrophe some people had imagined. Specifically we don't need
 to log anywhere near 100% of hint bit settings, much more like 20-30%
 (estimated not measured).
 Well, but it will hurt in those cases where hint bits already hurt hugely.
 Which is for example when accessing huge amounts of data the first time after
 loading it. Which is not exactly uncommon...

No, but I'd rather see us commit a checksum patch that sometimes
carries a major performance penalty and then work to reduce that
penalty later than never commit anything at all.  I think it's
unrealistic to assume we're going to get this perfectly right in one
try.  I am not sure whether this particular patch is close enough for
government work or still needs more hacking, and it may well be the
latter, but there is no use holding our breath and waiting for
absolute perfection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 2:48 PM, Andres Freund and...@anarazel.de wrote:
 On Friday, January 06, 2012 08:45:45 PM Heikki Linnakangas wrote:
 On 06.01.2012 20:26, Simon Riggs wrote:
  The following patch (v4) introduces a new WAL record type that writes
  backup blocks for the first hint on a block in any checkpoint that has
  not previously been changed. IMHO this fixes the torn page problem
  correctly, though at some additional loss of performance but not the
  total catastrophe some people had imagined. Specifically we don't need
  to log anywhere near 100% of hint bit settings, much more like 20-30%
  (estimated not measured).

 How's that going to work during recovery? Like in hot standby.
 How's recovery a problem? Unless I miss something, that doesn't actually
 introduce a new possibility to transport hint bits to the standby (think
 fpw's). A new transport will obviously increase traffic but ...

The standby can set hint bits locally that weren't set on the data it
received from the master.  This will require rechecksumming and
rewriting the page, but obviously we can't write the WAL records
needed to protect those writes during recovery.  So a crash could
create a torn page, invalidating the checksum.
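
To spell the sequence out: the standby sets a hint bit on page P and
recomputes P's checksum; the OS gets only part of P onto disk before a
crash; on restart, recovery replays the master's WAL, which contains no
record covering the hint-only write, so nothing ever rewrites P in
full; P is torn and fails checksum verification from then on.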

Ignoring checksum errors during Hot Standby operation doesn't fix it,
either, because eventually you might want to promote the standby, and
the checksum will still be invalid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Andres Freund
On Friday, January 06, 2012 08:53:38 PM Robert Haas wrote:
 On Fri, Jan 6, 2012 at 2:48 PM, Andres Freund and...@anarazel.de wrote:
  On Friday, January 06, 2012 08:45:45 PM Heikki Linnakangas wrote:
  On 06.01.2012 20:26, Simon Riggs wrote:
   The following patch (v4) introduces a new WAL record type that writes
   backup blocks for the first hint on a block in any checkpoint that has
   not previously been changed. IMHO this fixes the torn page problem
   correctly, though at some additional loss of performance but not the
   total catastrophe some people had imagined. Specifically we don't need
   to log anywhere near 100% of hint bit settings, much more like 20-30%
   (estimated not measured).
  
  How's that going to work during recovery? Like in hot standby.
  
  How's recovery a problem? Unless I miss something, that doesn't actually
  introduce a new possibility to transport hint bits to the standby (think
  fpw's). A new transport will obviously increase traffic but ...
 
 The standby can set hint bits locally that weren't set on the data it
 received from the master.  This will require rechecksumming and
 rewriting the page, but obviously we can't write the WAL records
 needed to protect those writes during recovery.  So a crash could
 create a torn page, invalidating the checksum.
Err. Stupid me, thanks.

 Ignoring checksum errors during Hot Standby operation doesn't fix it,
 either, because eventually you might want to promote the standby, and
 the checksum will still be invalid.
It's funny. I have the feeling we all are missing a very obvious brilliant 
solution to this...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2012-01-06 Thread Noah Misch
Corrections:

On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote:
 On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote:
  +  * We set s to be the estimated frequency of the K'th element in a natural
  +  * language's frequency table, where K is the target number of entries in
  +  * the MCELEM array. We assume that the distribution of element frequencies
  +  * follows Zipf's law with an exponent of 1.
  +  *
  +  * Assuming Zipfian distribution, the frequency of the K'th element is equal
  +  * to 1/(K * H(W)) where H(n) is 1/2 + 1/3 + ... + 1/n and W is the number of
  +  * elements in the language.  Putting W as one million, we get roughly
  +  * 0.07/K. This gives s = 0.07/K.  We set epsilon = s/10, which gives bucket
  +  * width w = K/0.007 and maximum expected hashtable size of about 1000 * K.
 
 These last two paragraphs, adapted from ts_typanalyze.c, assume natural
 language documents.  To what extent do these parameter choices remain sensible
 for arbitrary data such as users may place in arrays?  In any event, we need a
 different justification, even if it's just a hand-wavy justification.
 
 If I'm following this correctly, this choice of s makes the algorithm
 guaranteed to find only elements constituting >= 7% of the input elements.
 Incidentally, isn't that far too high for natural language documents?  If the
 English word "the" only constitutes 7% of typical documents, then this s
 value would permit us to discard practically every word; we'd be left with
 words read while filling the last bucket and words that happened to repeat
 exceedingly often in the column.  I haven't tried to make a test case to
 observe this problem; am I missing something?  (This question is largely
 orthogonal to your patch.)

No, we'll find elements of frequency at least 0.07/(default_statistics_target
* 10) -- in the default configuration, 0.007%.  Also, ts_typanalyze() counts
the number of documents that contain one or more instances of each lexeme,
ignoring the number of appearances within each document.  The word "the" may
constitute 7% of a typical document, but it will appear at least once in
nearly 100% of documents.  Therefore, this s value is adequate even for the
pathological case of each document containing just one lexeme.
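
To make the numbers concrete (assuming I have the constants right):
with the default default_statistics_target of 100 we have K = 10 * 100
= 1000 target MCELEM entries, so s = 0.07/K = 0.00007, i.e. the 0.007%
above; epsilon = s/10 = 0.000007; and the bucket width w = 1/epsilon is
about 142,857 elements, which is what the num_mcelem * 1000 / 7 line
quoted below computes.  The 0.07 itself is just 1/H(W) for W = one
million: H(10^6) is on the order of 14, and 1/14 is roughly 0.07.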

  +  *
  +  * Note: in the above discussion, s, epsilon, and f/N are in terms of a
  +  * element's frequency as a fraction of all elements seen in the input.
  +  * However, what we actually want to store in the finished pg_statistic
  +  * entry is each element's frequency as a fraction of all rows that it occurs
  +  * in. Elements might be repeated in the same array. Since operators
  +  * <@, &&, @> takes care only about element occurence itself and not about
  +  * occurence count, function takes additional care about uniqueness of
  +  * counting. Also we need to change the divisor from N to nonnull_cnt to get
  +  * the number we want.
 
 On the same tangent, why does ts_typanalyze() not deduplicate the same way?
 The @@ operator has the same property.

Answer: to_tsvector() will have already done so.

  +   /*
  +* We set bucket width equal to (num_mcelem + 10) / 0.007 as per the
  +* comment above.
  +*/
  +   bucket_width = num_mcelem * 1000 / 7;
 
 The addend mentioned is not present in the code or discussed in the comment
 above.  (I see the comment is copied verbatim from ts_typanalyze(), where the
 addend *is* present, though again the preceding comment says nothing of it.)

The addend rationale in fact does appear in the ts_typanalyze() comment.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 7:49 PM, Robert Haas robertmh...@gmail.com wrote:

 No, but I'd rather see us commit a checksum patch that sometimes
 carries a major performance penalty and then work to reduce that
 penalty later than never commit anything at all.  I think it's
 unrealistic to assume we're going to get this perfectly right in one
 try.  I am not sure whether this particular patch is close enough for
 government work or still needs more hacking, and it may well be the
 latter, but there is no use holding our breath and waiting for
 absolute perfection.

Well said. My view completely.

I do want jam tomorrow, I just want bread and butter today as well.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-06 Thread Tom Lane
Alex Hunsaker bada...@gmail.com writes:
 Oh my... I dunno exactly what I was smoking last night, but it's a good
 thing I didn't share :-). Uh so my test program was also completely
 wrong, I'll have to redo it all. I've narrowed it down to:
 if ((type == SVt_PVGV || SvREADONLY(sv)))
 {
 if (type != SVt_PV &&
     type != SVt_NV)
 {
 sv = newSVsv(sv);
 }
}

Has anyone tried looking at the source code for SvPVutf8 to see exactly
what cases it fails on?  The fact that there's an explicit croak() call
makes me think it might not be terribly hard to tell.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 6 January 2012 18:45, Tom Lane t...@sss.pgh.pa.us wrote:
 Actually, I'm going to object to reverting that commit, as I believe
 that having equal-keyed index entries in physical table order may offer
 some performance benefits at access time.  If you don't like the
 comment, we can change that.

 Please explain your position. When is this supposed to be useful?

When there are lots of duplicates of a particular indexed value, the
existing code will cause an indexscan to search them in physical order,
whereas if we remove the existing logic it'll be random --- in
particular, accesses to the same heap page can no longer be expected to
be clustered.
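
Concretely, the code in question breaks key ties on the heap position,
along these lines (my sketch, not the actual comparetup_index_btree):

#include <stdint.h>

typedef struct ToyItemPointer
{
	uint32_t	block;			/* heap block number */
	uint16_t	offset;			/* line pointer within the block */
} ToyItemPointer;

/* Called only when all key columns compare equal: ordering by heap
 * position lays equal-keyed entries down in physical table order. */
static int
tid_tiebreak(const ToyItemPointer *a, const ToyItemPointer *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}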

Admittedly, I don't have any numbers quantifying just how useful that
might be, but on the other hand you've not presented any evidence
justifying removing the behavior either.  If we believe your position
that indexes don't generally have lots of duplicates, then the code in
question will seldom be reached and therefore there would be no
performance benefit in removing it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 3:47 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Since we ignore hints during HS anyway,

No, we don't.  We need to ignore visibility map bits, but we need not
and do not ignore HEAP_XMIN_COMMITTED, HEAP_XMAX_COMMITTED, etc.

 not setting them seems OK if
 checksums defined. Or we can recommend that you don't use checksums on
 a standby. Whichever fits.

 Writing pages during recovery doesn't need WAL. If we crash, we replay
 using the already generated WAL.

Which is all fine, except when you start making changes that are not
WAL-logged.  Then, you have the same torn page problem that exists
when you it in normal running.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 4:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Admittedly, I don't have any numbers quantifying just how useful that
 might be, but on the other hand you've not presented any evidence
 justifying removing the behavior either.  If we believe your position
 that indexes don't generally have lots of duplicates, then the code in
 question will seldom be reached and therefore there would be no
 performance benefit in removing it.

Obviously, many indexes are unique and thus won't have duplicates at
all.  But if someone creates an index and doesn't make it unique, odds
are very high that it has some duplicates.  Not sure how many we
typically expect to see, but more than zero...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-06 Thread Martijn van Oosterhout
On Tue, Jan 03, 2012 at 09:11:17PM -0500, Andrew Dunstan wrote:
 Yeah, I'm with you and Pavel. Here's my quick perl one-liner to
 produce a set of SPI_* constants for pl/perl. I'm looking at the
 best way to include this in the bootstrap code.
 
perl -ne 'BEGIN { print "use constant\n{\n"; } END { print "};\n"; }
print "\t$1 => $2,\n" if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;' \
src/include/executor/spi.h
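
For what it's worth, run against spi.h that emits a block along these
lines (only the first few entries shown; values from my tree, so check
yours):

use constant
{
	SPI_ERROR_CONNECT => -1,
	SPI_ERROR_COPY => -2,
	...
	SPI_OK_CONNECT => 1,
	SPI_OK_FINISH => 2,
	...
};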

Well, there's also h2ph, but that may be overkill...

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Merlin Moncure
On Fri, Jan 6, 2012 at 2:03 PM, Andres Freund and...@anarazel.de wrote:
 The standby can set hint bits locally that weren't set on the data it
 received from the master.  This will require rechecksumming and
 rewriting the page, but obviously we can't write the WAL records
 needed to protect those writes during recovery.  So a crash could
 create a torn page, invalidating the checksum.
 Err. Stupid me, thanks.

 Ignoring checksum errors during Hot Standby operation doesn't fix it,
 either, because eventually you might want to promote the standby, and
 the checksum will still be invalid.
 It's funny. I have the feeling we all are missing a very obvious brilliant
 solution to this...

Like getting rid of hint bits?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LWLOCK_STATS

2012-01-06 Thread Robert Haas
It's been a while since I did any testing with LWLOCK_STATS defined,
so I thought it might be about time to do that again and see how
things look.  Here's how they looked back in July:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php

Here are the results from a test I ran today on latest sources, again
on Nate Boley's machine.  Five-minute pgbench run, scale factor 100,
permanent tables, my usual config settings.  Somewhat depressingly,
virtually all of the interesting activity still centers around the
same three locks that were problematic back then, which means that -
although overall performance has improved quite a bit - we've not
really delivered any knockout punches.  Here are the perpetrators:

lwlock 4: shacq 26160717 exacq 2690379 blk 1129763
lwlock 11: shacq 97074202 exacq 2699639 blk 1482737
lwlock 7: shacq 0 exacq 16522284 blk 2926957
grand total: shacq 225919534 exacq 77179954 blk 6218570

There is some change in how the contention is distributed.  Taking the
number of times a request for each lock blocked as a percentage of the
total number of lock acquisitions that blocked, we get this:

WALInsertLock - July 2011: 35%, January 2012: 47%
CLogControlLock - July 2011: 23%, January 2012: 24%
ProcArrayLock - July 2011: 32%, January 2012: 18%
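
(Those January figures follow from the counters above, which also pins
down the lock names: lwlock 7 has no shared acquisitions at all, as
you'd expect of WALInsertLock, and 2926957/6218570 is the 47%; lwlock
11 at 1482737/6218570 is the 24% for CLogControlLock; lwlock 4 at
1129763/6218570 is the 18% for ProcArrayLock.)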

Since there's been some change to the test configuration over the last
six months, this has to be taken with a grain of salt, but in broad
strokes it makes sense given what's been done - ProcArrayLock
contention is down significantly (due to Pavan's patch, and followup
tweaks), and the other locks are under correspondingly more pressure.
We've done enough work on CLogControlLock (today's change, and Simon's
prior patch to wake up the WAL writer more aggressively and thus allow
hint bits to be set sooner) to allow it to keep pace, so it's only up
slightly, but we haven't done anything about WALInsertLock and it's
therefore grown from just over a third of the blocks to almost half.

The top locks in terms of number of shared acquisitions are
CLogControlLock, which accounts for 47% of the shared lock
acquisitions in the system all by itself, followed by ProcArrayLock,
which accounts for another 12%.  The buffer mapping locks make up
another 23% in total, with the busiest one having about 3.5x the
traffic of the least busy one. Even when these shared acquisitions are
mostly uncontended at the lwlock level, the spinlock can still be a
contention point, and thus these are possible future targets for
further reducing our synchronization overhead despite the fact that
(on this test) there's not much blocking on, say, the buffer mapping
locks.

Note that this fits in shared buffers; on a larger test case, there
would be much more blocking on the buffer mapping locks (and
presumably BufFreelistLock would be a big problem, too).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Intermittent regression test failures from index-only plan changes

2012-01-06 Thread Tom Lane
Another regression test failure that I've been seeing lately is a change
from index-only scan to seqscan in create_index, as for instance here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguardt=2012-01-02%2023%3A05%3A02
I've managed to duplicate and debug this one too.  What I find is
that the planner switches from preferring index-only scan to preferring
seqscan because normally it sees the table as being fully all-visible
after the immediately preceding VACUUM, but in the failure cases the
table is seen as having no all-visible pages.  That inflates the
estimated cost of an index-only scan by enough to make it look like a
loser.  The create_index test is usually running by itself at this point
(since the concurrently started create_view test is much shorter), but
I had autovacuum logging on and found that there was a concurrent
auto-ANALYZE on another table when the manual VACUUM was running.
So failure to set the all-visible flags is expected given that context.

This is a bit troublesome because, AFAICS, this means that every single
one of the fifty-odd regression test cases that expect to see an Index
Only Scan plan might transiently fail, if there happens to be a
background auto-ANALYZE running at just the moment that the previous
vacuum would've otherwise set the all-visible bits.  It might be that
all the other ones are safe in practice for various reasons, but even
if they are there's no guarantee that new regression tests added in
future will be reliable.

Background auto-VACUUMs shouldn't cause this problem because they don't
take snapshots, and ideally it'd be nice if auto-ANALYZE couldn't create
the issue either, but ANALYZE does need a snapshot so it's hard to see
how to avoid having it trip the all-visible logic.  Anybody have any
ideas?

If nothing else comes to mind I'll probably just remove the
sometimes-failing EXPLAIN test case, but I'm worried about the prospects
of future failures of the same ilk.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Simon Riggs
On Fri, Jan 6, 2012 at 7:53 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 6, 2012 at 2:48 PM, Andres Freund and...@anarazel.de wrote:
 On Friday, January 06, 2012 08:45:45 PM Heikki Linnakangas wrote:
 On 06.01.2012 20:26, Simon Riggs wrote:
  The following patch (v4) introduces a new WAL record type that writes
  backup blocks for the first hint on a block in any checkpoint that has
  not previously been changed. IMHO this fixes the torn page problem
  correctly, though at some additional loss of performance but not the
  total catastrophe some people had imagined. Specifically we don't need
  to log anywhere near 100% of hint bit settings, much more like 20-30%
  (estimated not measured).

 How's that going to work during recovery? Like in hot standby.
 How's recovery a problem? Unless I miss something, that doesn't actually
 introduce a new possibility to transport hint bits to the standby (think
 fpw's). A new transport will obviously increase traffic but ...

 The standby can set hint bits locally that weren't set on the data it
 received from the master.  This will require rechecksumming and
 rewriting the page, but obviously we can't write the WAL records
 needed to protect those writes during recovery.  So a crash could
 create a torn page, invalidating the checksum.

 Ignoring checksum errors during Hot Standby operation doesn't fix it,
 either, because eventually you might want to promote the standby, and
 the checksum will still be invalid.

Since we ignore hints during HS anyway, not setting them seems OK if
checksums defined. Or we can recommend that you don't use checksums on
a standby. Whichever fits.

Writing pages during recovery doesn't need WAL. If we crash, we replay
using the already generated WAL.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Aidan Van Dyk
On Fri, Jan 6, 2012 at 5:17 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Jan 6, 2012 at 2:03 PM, Andres Freund and...@anarazel.de wrote:
 The standby can set hint bits locally that weren't set on the data it
 received from the master.  This will require rechecksumming and
 rewriting the page, but obviously we can't write the WAL records
 needed to protect those writes during recovery.  So a crash could
 create a torn page, invalidating the checksum.
 Err. Stupid me, thanks.

 Ignoring checksum errors during Hot Standby operation doesn't fix it,
 either, because eventually you might want to promote the standby, and
 the checksum will still be invalid.
  It's funny. I have the feeling we all are missing a very obvious brilliant
 solution to this...

 Like getting rid of hint bits?

Or even just not bothering to consider them as making buffers dirty,
so the only writes are already protected by the double-write (WAL, or
if they get some DW outside of WAL).

I think I've said it before, but I'm guessing OLTP-style databases
rarely have pages written that are dirty but not covered by real
changes (so they have the FPW anyway), and OLAP types generally freeze
after loads to avoid the hint-bit-write penalty too...

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 5:17 PM, Merlin Moncure mmonc...@gmail.com wrote:
 It's funny. I have the feeling we all are missing a very obvious brilliant
 solution to this...

 Like getting rid of hint bits?

No.

First, that solution has been proposed before, and it crashed and
burned before.  You may as well propose getting rid of VACUUM.  In
each case, the supposed problem is in fact a cure for a much more
severe problem that would quickly make you long for the days when you
had the first one.  I think you (or someone else?) had a fairly
well-developed patch for blunting the impact of hint bit setting, and
that we might be able to do if you (or someone) wants to finish it.
But getting rid of them altogether isn't likely, not because
PostgreSQL developers are an intractable herd of mule-headed idiots
(although I have been known to be all of those things on occasion) but
because the performance characteristics demonstrably suck, as most
recently demonstrated by commit
4de82f7d7c50a81ec8e70e2cb0ab413ab9134c0b, which significantly improved
performance with synchronous_commit=off by shaving an average of about
one-tenth of a second off the time before hint bits could be set on
any given tuple.

Second, even if you did it, it wouldn't fix this problem, because hint
bits aren't the only changes we make without WAL logging.  In
particular, when an index scan runs across a tuple that turns out to
be dead, we kill the index tuple so that future scans don't need to
revisit it.  I haven't actually done any testing to measure how
significant this is, but I wouldn't bet on it not mattering.

I think it would be wonderful to come up with a design that either
blunted the effect of hint bits or eliminated them altogether, but the
idea that there's an obvious way forward there that we've simply
refused to pursue is, AFAICT, flat wrong.  Let's not get sidetracked
into talking about things that aren't going to change in time for 9.2
(and probably not 9.3, either, given that no one seems to be putting
any time into it) and aren't even feasible solutions to the problem at
hand (checksums) anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Aidan Van Dyk
On Fri, Jan 6, 2012 at 6:48 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 I think I've said it before, but I'm guessing OLTP-style databases
 rarely have pages written that are dirty but not covered by real
 changes (so they have the FPW anyway), and OLAP types generally freeze
 after loads to avoid the hint-bit-write penalty too...

But ya, again, I've never measured ;-)


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] patch: fix SSI finished list corruption

2012-01-06 Thread Dan Ports
There's a corner case in the SSI cleanup code that isn't handled
correctly. It can arise when running workloads that are comprised
mostly (but not 100%) of READ ONLY transactions, and can corrupt the
finished SERIALIZABLEXACT list, potentially causing a segfault. The
attached patch fixes it.

Specifically, when the only remaining active transactions are READ
ONLY, we do a partial cleanup of committed transactions because
certain types of conflicts aren't possible anymore. For committed r/w
transactions, we release the SIREAD locks but keep the
SERIALIZABLEXACT. However, for committed r/o transactions, we can go
further and release the SERIALIZABLEXACT too. The problem was with the
latter case: we were returning the SERIALIZABLEXACT to the free list
without removing it from the finished list.

The only real change in the patch is the SHMQueueDelete line, but I
also reworked some of the surrounding code to make it obvious that r/o
and r/w transactions are handled differently -- the existing code felt
a bit too clever.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/
From 39f8462332f998d7363058adabac412c7654befe Mon Sep 17 00:00:00 2001
From: Dan R. K. Ports d...@csail.mit.edu
Date: Thu, 29 Dec 2011 23:11:35 -0500
Subject: [PATCH] Read-only SERIALIZABLEXACTs are completely released when
 doing partial cleanup, so remove them from the finished
 list. This prevents the finished list from being corrupted.
 Also make it more clear that read-only transactions are
 treated differently here.

---
 src/backend/storage/lmgr/predicate.c |   26 +++---
 1 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index aefa698..c0b3cb4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3531,10 +3531,29 @@ ClearOldPredicateLocks(void)
 		else if (finishedSxact->commitSeqNo > PredXact->HavePartialClearedThrough
 				 && finishedSxact->commitSeqNo <= PredXact->CanPartialClearThrough)
 		{
+			/*
+			 * Any active transactions that took their snapshot before
+			 * this transaction committed are read-only, so we can
+			 * clear part of its state.
+			 */
 			LWLockRelease(SerializableXactHashLock);
-			ReleaseOneSerializableXact(finishedSxact,
-									   !SxactIsReadOnly(finishedSxact),
-									   false);
+
+			if (SxactIsReadOnly(finishedSxact))
+			{
+				/* A read-only transaction can be removed entirely */
+				SHMQueueDelete(&(finishedSxact->finishedLink));
+				ReleaseOneSerializableXact(finishedSxact, false, false);
+			}
+			else
+			{
+				/*
+				 * A read-write transaction can only be partially
+				 * cleared. We need to keep the SERIALIZABLEXACT but
+				 * can release the SIREAD locks and conflicts in.
+				 */
+				ReleaseOneSerializableXact(finishedSxact, true, false);
+			}
+
 			PredXact->HavePartialClearedThrough = finishedSxact->commitSeqNo;
 			LWLockAcquire(SerializableXactHashLock, LW_SHARED);
 		}
@@ -3640,6 +3659,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 
 	Assert(sxact != NULL);
 	Assert(SxactIsRolledBack(sxact) || SxactIsCommitted(sxact));
+	Assert(partial || !SxactIsOnFinishedList(sxact));
 	Assert(LWLockHeldByMe(SerializableFinishedListLock));
 
 	/*
-- 
1.7.8.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families

2012-01-06 Thread Noah Misch
In 367bc426a1c22b9f6badb06cd41fc438fd034639, I introduced a
CheckIndexCompatible() that approves btree and hash indexes having changed to
a different operator class within the same operator family.  To make that
valid, I also tightened the operator family contracts for those access methods
to address casts.  However, I've found two retained formal hazards.  They're
boring and perhaps unlikely to harm real users, but I'd rather nail them down
just in case.


First, opclasses like array_ops have polymorphic opcintype.  Members of such
operator classes could, in general, change behavior arbitrarily in response to
get_fn_expr_argtype().  No core polymorphic operator family member does this.
Nor could they, for no core index access method sets fn_expr.  In the absence
of responses to my previous inquiry[1] on the topic, I'm taking the position
that the lack of fn_expr in these calls is an isolated implementation detail
that could change at any time.  Therefore, we should only preserve an index of
polymorphic operator class when the old and new opcintype match exactly.  This
patch's test suite addition illustrates one ALTER TABLE ALTER TYPE affected by
this new restriction: a conversion from an array type to a domain over that
array type will now require an index rebuild.


Second, as a thought experiment, consider a database with these three types:
1. int4, the core type.
2. int4neg, stores to disk as the negation of its logical value.  Includes a
complete set of operators in the integer_ops btree family, but defines no
casts to other integer_ops-represented types.
3. int4other, stores to disk like int4.  No operators.  Has an implicit binary
coercion cast to int4 and an explicit binary coercion cast to int4neg.

Suppose a table in this database has a column of type int4other with an index
of default operator class.  By virtue of the implicit binary coercion and lack
of other candidates, the index will use int4_ops.  Now, change the type of
that column from int4other to int4neg.  The operator class changes to
int4neg_ops, still within the integer_ops family, and we do not rebuild the
index.  However, the logical sign of each value just flipped, making the index
invalid.  Where did CheckIndexCompatible() miss?  An operator family only
promises cast-compatibility when there exists an implicit or binary coercion
cast between the types in question.  There's no int4->int4neg cast here, not
even a multiple-step pathway.  CheckIndexCompatible() assumes that our ability
to perform a no-rewrite ALTER TABLE ALTER TYPE implies the existence of such a
cast.  It does imply that for the before-and-after _table column types_, but
it's the before-and-after _opcintype_ that matter here.

I think we could close this hazard by having CheckIndexCompatible() test for
an actual implicit or binary coercion cast between the opcintype of the old
and new operator classes.  However, I'm not confident to say that no similar
problem would remain.  Therefore, I propose ceasing to optimize intra-family
index operator transitions and simply requiring exact matches of operator
classes and exclusion operators.  This does not remove any actual optimization
for changes among core types, and it will be simpler to validate.  (I designed
the current rules under the misapprehension that varchar_ops was the default
operator class for varchar columns.  However, varchar_ops is a copy of
text_ops apart from the name and being non-default.  We ship no operator with
a varchar operand, relying instead on binary coercion to text.)

This patch does not, however, remove the new terms from the operator family
contracts.  While core PostgreSQL will no longer depend on them, they may
again prove handy for future optimizations like this.  I remain of the opinion
that they're already widely (perhaps even universally) obeyed.


This patch conflicts trivially with my patch Avoid FK validations for
no-rewrite ALTER TABLE ALTER TYPE in that they both add test cases to the
same location in alter_table.sql.  They're otherwise independent, albeit
reflecting parallel principles.

Thanks,
nm

[1] 
http://archives.postgresql.org/message-id/20111229211711.ga8...@tornado.leadboat.com
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 712b0b0..1bf1de5 100644
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***
*** 23,28 
--- 23,29 
  #include catalog/pg_opclass.h
  #include catalog/pg_opfamily.h
  #include catalog/pg_tablespace.h
+ #include catalog/pg_type.h
  #include commands/dbcommands.h
  #include commands/defrem.h
  #include commands/tablecmds.h
***
*** 52,57 
--- 53,59 
  /* non-export function prototypes */
  static void CheckPredicate(Expr *predicate);
  static void ComputeIndexAttrs(IndexInfo *indexInfo,
+ Oid *typeOidP,
  Oid *collationOidP,
  Oid *classOidP,
 

Re: [HACKERS] LWLOCK_STATS

2012-01-06 Thread Jeff Janes
On Fri, Jan 6, 2012 at 2:24 PM, Robert Haas robertmh...@gmail.com wrote:
 It's been a while since I did any testing with LWLOCK_STATS defined,
 so I thought it might be about time to do that again and see how
 things look.  Here's how they looked back in July:

 http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php

 Here are the results from a test I ran today on latest sources, again
 on Nate Boley's machine.  Five-minute pgbench run, scale factor 100,
 permanent tables, my usual config settings.

What was the tps and/or number of transactions?

I assume this was -c80 -j40?

Also, do you know what percent of CPU time was spent idle during the test?

If very little time is spent sleeping on lwlocks (i.e. CPU time
near 100%), it doesn't matter much how that waiting is distributed.

Also, was there a big difference in tps between LWLOCK_STATS defined
and not defined (i.e. the overhead of doing the accounting)?

 Somewhat depressingly,
 virtually all of the interesting activity still centers around the
 same three locks that were problematic back then, which means that -
 although overall performance has improved quite a bit - we've not
 really delivered any knockout punches.  Here are the perpetrators:

I don't think that is depressing at all.  Certain locks needs to exist
to protect certain things, and a benchmark which tests those things is
going to take those locks rather than some other set of locks.  X
times faster is still X times faster, even if the bottleneck hasn't
moved to some other part of the code.


 but we haven't done anything about WALInsertLock and it's
 therefore grown from just over a third of the blocks to almost half.

But not all blocks are for the same length of time.  Do we know how
much time is spent blocking?  I've seen some code around that tries to
instrument that, but on my machine at the time it added a lot of
overhead so it couldn't be used effectively.  I can try to dig it up
and update it to git-head if you want to try it and aren't already
aware of it.  (My code was based on something I found somewhere in
this list.)

Also, I assume this is without the recent "Moving more work outside
WALInsertLock" patch applied?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLOCK_STATS

2012-01-06 Thread Robert Haas
On Fri, Jan 6, 2012 at 9:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jan 6, 2012 at 2:24 PM, Robert Haas robertmh...@gmail.com wrote:
 It's been a while since I did any testing with LWLOCK_STATS defined,
 so I thought it might be about time to do that again and see how
 things look.  Here's how they looked back in July:

 http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php

 Here are the results from a test I ran today on latest sources, again
 on Nate Boley's machine.  Five-minute pgbench run, scale factor 100,
 permanent tables, my usual config settings.

 What was the tps and/or number of transactions?

 I assume this was -c80 -j40?

Sorry.  -c 32 -j 32.  tps was 9-10k, but I don't take it too seriously
with LWLOCK_STATS defined, because it has some performance impact.

 Also, do you know what percent of CPU time was spent idle during the test?

Sorry, no.

 If very little time is spent sleeping on lwlocks (i.e. CPU time
 near 100%), it doesn't matter much how that waiting is distributed.

Well, there is clearly a pretty big impact, because unlogged
tables are much faster than regular tables.  See for example:

http://archives.postgresql.org/pgsql-hackers/2011-12/msg00095.php

...where the comparable result on slightly older sources are:

8 CLOG buffers, permanent tables: tps = 10025.079556 (including
connections establishing)
32 CLOG buffers, permanent tables: tps = 11247.358762 (including
connections establishing)
8 CLOG buffers, unlogged tables: tps = 16999.301828 (including
connections establishing)
32 CLOG buffers, unlogged tables: tps = 19653.023856 (including
connections establishing)

As of today, you get 32 CLOG buffers without patching the source code.
 That test was also done before commits
d573e239f03506920938bf0be56c868d9c3416da and
0d76b60db4684d3487223b003833828fe9655fe2, which further optimized
ProcArrayLock.

 Also, was there a big difference in tps between LWLOCK_STATS defined
 and not defined (i.e. the overhead of doing the accounting)?

Yes, see notes above.

 Somewhat depressingly,
 virtually all of the interesting activity still centers around the
 same three locks that were problematic back then, which means that -
 although overall performance has improved quite a bit - we've not
 really delivered any knockout punches.  Here are the perpetrators:

 I don't think that is depressing at all.  Certain locks needs to exist
 to protect certain things, and a benchmark which tests those things is
 going to take those locks rather than some other set of locks.  X
  times faster is still X times faster, even if the bottleneck hasn't
  moved to some other part of the code.

True.  What I don't like is: I think we've really only pushed the
bottleneck out a few cores.  Throw a 64-core machine at it and we're
going to have all these same problems over again.  I'd like to find
solutions that change the dynamic in a more fundamental way, so that
we buy a little more.  But I'm not going to complain too much; the
performance gains we've gotten with these techniques are obviously
quite substantial, even though they're not a total solution.

 but we haven't done anything about WALInsertLock and it's
 therefore grown from just over a third of the blocks to almost half.

 But not all blocks are for the same length of time.  Do we know how
 much time is spent blocking?  I've seen some code around that tries to
 instrument that, but on my machine of the time it added a lot of
 overhead so it couldn't be used effectively.  I can try to dig it up
 and update it to git-head if you want to try it and aren't already
 aware of it.  (My code was based on something I found somewhere in
 this list.)

I haven't tried it for reasons of overhead, but I'm aware of the problem.

  Also, I assume this is without the recent "Moving more work outside
  WALInsertLock" patch applied?

Right.  If we can get that done for 9.2, we'll be cooking with gas -
on my tests that was a big improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Fix Leaky View Problem

2012-01-06 Thread Robert Haas
On Tue, Jan 3, 2012 at 12:11 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2011/12/23 Robert Haas robertmh...@gmail.com:
 On Fri, Dec 23, 2011 at 5:56 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I'd like the regression test on select_view test being committed also
 to detect unexpected changed in the future. How about it?

 Can you resend that as a separate patch?  I remember there were some
 things I didn't like about it, but I don't remember what they were at
 the moment...

 Sorry for this late response.

 The attached one is patch of the regression test that checks scenario
 of malicious function with/without security_barrier option.

 I guess you concerned about that expected/select_views_1.out is
 patched, not expected/select_views.out.
 I'm not sure the reason why regression test script tries to make diff
 between results/select_views and expected/select_views_1.out.

select_views.out and select_views_1.out are alternate expected output
files.  The regression tests pass if the actual output matches either
one.  Thus, you have to patch both.

BTW, can you also resubmit the leakproof stuff as a separate patch for
the last CF?  Want to make sure we get that into 9.2, if at all
possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions and 9.2

2012-01-06 Thread Robert Haas
On Fri, Dec 23, 2011 at 5:45 AM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Dec 21, 2011 at 5:46 AM, Robert Haas robertmh...@gmail.com wrote:
 Assuming the command in
 question can be stuffed inside a function, the most you're gaining is
 a little notational convenience

 I can answer that one (why a full-blown mechanism for a notational 
 convenience).

 It has occurred to me to use this mechanism to support extensions, but
 I find the prospect of having to teach people special operators to
 understand how to use the standard extension feature an unstomachable
 prospect.  The single biggest problem is that pg_restore will not know
 to call this function rather than run CREATE EXTENSION, and then two
 databases, prepared exactly the same, cannot be pg_dump-ed and restored
 in a reasonable way.  If there's a definable whitelist, then this
 vital functionality will work.

 There are other sicknesses imposed if one has to hack up how to
 delegate extension creation to non-superusers:

 * The postgres manual, wiki, and ecosystem of recipes on the web and
 internet at large basically are not going to work without
 modification.  Death by a thousand cuts.

 * \df in psql displays some new operators that you aren't familiar
 with.  Also, putting aside that your pg_dump/pg_restore generated SQL will
 not work, they will look funny.  This is an eyesore.

 * If one tells someone else "yeah, the system supports extensions",
 they're going to type CREATE EXTENSION.  And then it's not going to
 work, and then they're either going to give up, yak shave for a few
 hours figuring out what they were supposed to do for their provider
 or organization, or maybe worst of all hack their way around
 functionality the extension could have provided in a cleaner way had
 it just worked nice and tidy to begin with.

 I find this functionality basically vital because it greatly decreases
 the friction between people, organizations, and software in the domain
 of deploying, reasoning, and communicating about the installation and
 status of Postgres extensions in a database.

OK, I'll buy that.  I think we need to consider the design of the
mechanism carefully, though, or we'll end up with something messy and
inconvenient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] broken link to PostgreSQL 9.1 repository for Fedora 14

2012-01-06 Thread Pavel Stehule
Hello

There is broken link on http://yum.postgresql.org/repopackages.php page

PostgreSQL 9.1 - Fedora 14 -
http://yum.postgresql.org/9.1/fedora/fedora-14-i386/pgdg-fedora-9.1-2.noarch.rpm
- 404 - Not Found

Regards

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLOCK_STATS

2012-01-06 Thread Heikki Linnakangas

On 07.01.2012 00:24, Robert Haas wrote:

It's been a while since I did any testing with LWLOCK_STATS defined,
so I thought it might be about time to do that again and see how
things look.  Here's how they looked back in July:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php


Interesting.

A couple of weeks ago I wrote a little patch that's similar to 
LWLOCK_STATS, but it prints out % of wallclock time that is spent 
acquiring, releasing, or waiting for a lock. I find that more useful 
than the counters.


Here's the patch, I hope it's useful to others. It uses timer_create() 
and timer_settime(), so it probably won't work on all platforms, and 
requires linking with -lrt.
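
The mechanism is easy to demonstrate in isolation. A minimal,
self-contained example of the timer_create()/timer_settime() pattern
(my illustration, not the patch; on Linux, compile with -lrt):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks = 0;

static void
on_tick(int signo)
{
	(void) signo;
	ticks++;
}

int
main(void)
{
	struct sigaction sa;
	struct itimerspec its;
	timer_t		timer;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_tick;
	sigaction(SIGALRM, &sa, NULL);

	/* NULL sigevent means the timer delivers SIGALRM by default */
	timer_create(CLOCK_MONOTONIC, NULL, &timer);

	memset(&its, 0, sizeof(its));
	its.it_value.tv_nsec = 1000000;		/* first expiry after 1ms */
	its.it_interval.tv_nsec = 1000000;	/* then every 1ms */
	timer_settime(timer, 0, &its, NULL);

	while (ticks < 1000)				/* each signal wakes pause() */
		pause();

	printf("%d ticks\n", (int) ticks);
	return 0;
}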


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers