Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Simon Riggs
On 27 June 2012 23:01, Robert Haas  wrote:

> 1. Are there any call sites from which this should be disabled, either
> because the optimization won't help, or because the caller is holding
> a lock that we don't want them sitting on for a moment longer than
> necessary?  I think the worst case is that we're doing something like
> splitting the root page of a btree, and now nobody can walk the btree
> until the flush finishes, and here we are doing an "unnecessary"
> sleep.  I am not sure where I can construct a workload where this is a
> real as opposed to a theoretical problem, though.  So maybe we should
> just not worry about it.  It suffices for this to be an improvement
> over the status quo; it doesn't have to be perfect.  Thoughts?

Let's put this in now, without too much fiddling. There is already a
GUC to disable it, so measurements can be made to see if this causes
any problems.

If there are problems, we fix them. We can't second guess everything.

> 2. Should we rename the GUCs, since this patch will cause them to
> control WAL flush in general, as opposed to commit specifically?
> Peter Geoghegan and Simon were arguing that we should retitle it to
> group_commit_delay rather than just commit_delay, but that doesn't
> seem to be much of an improvement in describing what the new behavior
> will actually be, and I am doubtful that it is worth creating a naming
> incompatibility with previous releases for a cosmetic change.  I
> suggested wal_flush_delay, but there's no consensus on that.
> Opinions?

Again, leave the naming of that for later. The idea of a rename came
from yourself, IIRC.

> The XLByteLE test four lines from the bottom should happen before we
> consider whether to sleep, because it's possible that we'll discover
> that our flush request has already been satisfied and exit without
> doing anything, in which case the sleep is a waste.  The attached
> version adjusts the logic accordingly.

I thought there already was a test like that earlier in the flush.

I agree there needs to be one.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Daniel Farina
On Wed, Jun 27, 2012 at 7:00 PM, Josh Berkus  wrote:
> I've seen this at two sites now, and my conclusion is that a single
> autovacuum_max_workers isn't sufficient to cover the case of
> wraparound vacuum.  Nor can we just single-thread the wraparound vacuum
> (i.e. just one worker) since that would hurt users who have thousands of
> small tables.

I have also witnessed very unfortunate, un-smooth performance behavior
around wraparound time. A bit of adaptive response, scaling allowed
autovacuum throughput to the number of pages requiring wraparound
vacuuming, would be one load off my mind.  Getting gradually slower,
with some way to know that autovacuum has decided it should work
harder and harder, is better than the brick wall that can sneak up
currently.

Count me as appreciative for improvements in this area.

-- 
fdr



Re: [HACKERS] pg_signal_backend() asymmetry

2012-06-28 Thread Daniel Farina
On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt  wrote:
> Hi all,
>
> I have one nitpick related to the recent changes for
> pg_cancel_backend() and pg_terminate_backend(). If you use these
> functions as an unprivileged user, and try to signal a nonexistent
> PID, you get:

I think the goal there is to avoid leakage of the knowledge or
non-knowledge of a given PID existing once it is deemed out of
Postgres' control.  Although I don't have a specific attack vector in
mind for when one knows a PID exists a-priori, it does seem like an
unnecessary admission on behalf of other programs.

Also, in pg_cancel_backend et al, PID really means "database session",
but as-is the marrying of PID and session is one of convenience, so I
think any message that communicates more than "that database session
does not exist" is superfluous anyhow.  Perhaps there is a better
wording for the time being that doesn't implicate the existence or
non-existence of the PID?

-- 
fdr



[HACKERS] Patch: Fix for a small tipo (space lost)

2012-06-28 Thread Alexander Lakhin

Fix for a small tipo (space lost)
From be61035d21512324aafd41074511625d97d17256 Mon Sep 17 00:00:00 2001
From: Alexander Lakhin 
Date: Thu, 28 Jun 2012 12:10:25 +0400
Subject: Fix for a small tipo (space lost).

---
 src/backend/utils/misc/guc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 965d325..33b58de 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2843,7 +2843,7 @@ static struct config_string ConfigureNamesString[] =
 
 	{
 		{"event_source", PGC_POSTMASTER, LOGGING_WHERE,
-			gettext_noop("Sets the application name used to identify"
+			gettext_noop("Sets the application name used to identify "
 		 "PostgreSQL messages in the event log."),
 			NULL
 		},
-- 
1.7.9.5




Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Magnus Hagander
On Thu, Jun 28, 2012 at 7:00 AM, Robert Haas  wrote:
> On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane  wrote:
>>>> Would Posix shmem help with that at all?  Why did you choose not to
>>>> use the Posix API, anyway?
>>
>>> It seemed more complicated.  If we use the POSIX API, we've got to
>>> have code to find a non-colliding name for the shm, and we've got to
>>> arrange to clean it up at process exit.  Anonymous shm doesn't require
>>> a name and goes away automatically when it's no longer in use.
>>
>> I see.  Those are pretty good reasons ...
>
> So, should we do it this way?
>
> I did a little research and discovered that Linux 2.3.51 (released
> 3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS.
> That combination is documented to work beginning in Linux 2.4.0.  How
> worried should we be about people trying to run PostgreSQL 9.3 on
> pre-2.4 kernels?  If we want to worry about it, we could try mapping a
> one-page shared MAP_SHARED|MAP_ANONYMOUS segment first.  If that
> works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS
> facility and try to allocate the whole segment plus a minimal sysv
> shm.  If the single page allocation fails with EINVAL, we could fall
> back to allocating the entire segment as sysv shm.

Do we really need a runtime check for that? Isn't a configure check
enough? If they *do* deploy postgresql 9.3 on something that old,
they're building from source anyway...


> A related question is - if we do this - should we enable it only on
> ports where we've verified that it works, or should we just turn it on
> everywhere and fix breakage if/when it's reported?  I lean toward the
> latter.

Depends on the amount of expected breakage, but I'd lean towards the
latter as well.


> If we find that there are platforms where (a) mmap is not supported or
> (b) MAP_SHARED|MAP_ANON works but has the wrong semantics, we could
> either shut off this optimization on those platforms by fiat, or we
> could test not only that the call succeeds, but that it works
> properly: create a one-page mapping and fork a child process; in the
> child, write to the mapping and exit; in the parent, wait for the
> child to exit and then test that we can read back the correct
> contents.  This would protect against a hypothetical system where the
> flags are accepted but fail to produce the correct behavior.  I'm
> inclined to think this is over-engineering in the absence of evidence
> that there are platforms that work this way.

Could we actually turn *that* into a configure test, or will that be
too complex?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 7:05 AM, Magnus Hagander  wrote:
> Do we really need a runtime check for that? Isn't a configure check
> enough? If they *do* deploy postgresql 9.3 on something that old,
> they're building from source anyway...
[...]
>
> Could we actually turn *that* into a configure test, or will that be
> too complex?

I don't see why we *couldn't* make either of those things into a
configure test, but it seems more complicated than a runtime test and
less accurate, so I guess I'd be in favor of doing them at runtime or
not at all.

Actually, the try-a-one-page-mapping-and-see-if-you-get-EINVAL test is
so simple that I really can't see any reason not to insert that
defense.  The fork-and-check-whether-it-really-works test is probably
excess paranoia until we determine whether that's really a danger
anywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Fix for a small tipo (space lost)

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 4:44 AM, Alexander Lakhin  wrote:
> Fix for a small tipo (space lost)

Thanks!

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Covering Indexes

2012-06-28 Thread David E. Wheeler
Hackers,

Very interesting design document for SQLite 4:

  http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense 
to add covering index support, too, where additional, unindexed columns are 
stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David






Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 4:21 AM, Simon Riggs  wrote:
> Let's put this in now, without too much fiddling. There is already a
> GUC to disable it, so measurements can be made to see if this causes
> any problems.
>
> If there are problems, we fix them. We can't second guess everything.

Fair enough.

>> 2. Should we rename the GUCs, since this patch will cause them to
>> control WAL flush in general, as opposed to commit specifically?
>> Peter Geoghegan and Simon were arguing that we should retitle it to
>> group_commit_delay rather than just commit_delay, but that doesn't
>> seem to be much of an improvement in describing what the new behavior
>> will actually be, and I am doubtful that it is worth creating a naming
>> incompatibility with previous releases for a cosmetic change.  I
>> suggested wal_flush_delay, but there's no consensus on that.
>> Opinions?
>
> Again, leave the naming of that for later. The idea of a rename came
> from yourself, IIRC.

Deciding on a name is not such a hard thing that leaving it till later
solves any problem.  Let's just make a decision and be done with it.

>> The XLByteLE test four lines from the bottom should happen before we
>> consider whether to sleep, because it's possible that we'll discover
>> that our flush request has already been satisfied and exit without
>> doing anything, in which case the sleep is a waste.  The attached
>> version adjusts the logic accordingly.
>
> I thought there already was a test like that earlier in the flush.
>
> I agree there needs to be one.

There are several of them floating around; the point here is just to
make sure that the sleep is after all of them.
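[Editorial sketch: a toy model of the ordering being discussed; none of these names are the real xlog.c symbols. The point is simply that the already-flushed check runs before the commit_delay sleep, so a caller whose request was satisfied by someone else's flush exits without sleeping at all.]

```c
#include <stdint.h>

typedef uint64_t XLogPos;

static int  sleeps_taken = 0;

/*
 * Flush WAL up to "request".  Check whether the request is already
 * satisfied *before* considering the group-commit sleep, so that a
 * redundant caller does no work and takes no delay.
 */
static void
flush_up_to(XLogPos request, XLogPos *flushed, int commit_delay)
{
    if (request <= *flushed)
        return;                 /* request already satisfied: no sleep */
    if (commit_delay > 0)
        sleeps_taken++;         /* stand-in for pg_usleep(commit_delay) */
    *flushed = request;         /* stand-in for the actual WAL flush */
}
```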

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 2:02 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> It's just ridiculous to assert that it doesn't matter if all the
>> anti-wraparound vacuums start simultaneously.  It does matter.  For
>> one thing, once every single autovacuum worker is pinned down doing an
>> anti-wraparound vacuum of some table, then a table that needs an
>> ordinary vacuum may have to wait quite some time before a worker is
>> available.
>
> Well, that's a fair point, but I don't think it has anything to do with
> Josh's complaint --- which AFAICT is about imposed load, not about
> failure to vacuum things that need vacuumed.  Any scheme you care to
> design will sometimes be running max_workers workers at once, and if
> that's too much load there will be trouble.  I grant that there can be
> value in a more complex strategy for when to schedule vacuuming
> activities, but I don't think that it has a lot to do with solving the
> present complaint.

I think it's got everything to do with it.  Josh could fix his problem
by increasing the cost limit and/or reducing the cost delay, but if he
did that then his database would get bloated...

>> Parallelism is not free, ever, and particularly not here, where it has
>> the potential to yank the disk head around between five different
>> files, seeking like crazy, instead of a nice sequential I/O pattern on
>> each file in turn.
>
> Interesting point.  Maybe what's going on here is that
> autovac_balance_cost() is wrong to suppose that N workers can each have
> 1/N of the I/O bandwidth that we'd consider okay for a single worker to
> eat.  Maybe extra seek costs mean we have to derate that curve by some
> large factor.  1/(N^2), perhaps?  I bet the nature of the disk subsystem
> affects this a lot, though.

...and this would have the same effect.  Let's not assume that the
problem is that Josh doesn't know how to make autovacuum less
aggressive, because I'm pretty sure that ain't the issue.
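[Editorial sketch: Tom's derating idea in toy form. Rather than giving each of N workers 1/N of the single-worker I/O budget, divide by N^2 so total consumption shrinks as workers are added, compensating for the extra seeking that concurrent vacuums induce. The 1/(N^2) curve is an off-the-cuff guess in the thread, not a measured constant, and the function name is invented.]

```c
/*
 * Per-worker cost budget, derated by the square of the number of
 * concurrently running autovacuum workers.
 */
static int
derated_worker_budget(int single_worker_budget, int nworkers)
{
    if (nworkers < 1)
        nworkers = 1;
    return single_worker_budget / (nworkers * nworkers);
}
```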

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_signal_backend() asymmetry

2012-06-28 Thread Magnus Hagander
On Thu, Jun 28, 2012 at 10:36 AM, Daniel Farina  wrote:
> On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt  wrote:
>> Hi all,
>>
>> I have one nitpick related to the recent changes for
>> pg_cancel_backend() and pg_terminate_backend(). If you use these
>> functions as an unprivileged user, and try to signal a nonexistent
>> PID, you get:
>
> I think the goal there is to avoid leakage of the knowledge or
> non-knowledge of a given PID existing once it is deemed out of
> Postgres' control.  Although I don't have a specific attack vector in
> mind for when one knows a PID exists a-priori, it does seem like an
> unnecessary admission on behalf of other programs.

Well, there are two sides to it - one is the message, the other is if
it should be ERROR or WARNING. My reading is that Josh is suggesting
we make them WARNING in both cases, right?

It should be possible to make it a WARNING without leaking information
in the error message.


> Also, in pg_cancel_backend et al, PID really means "database session",
> but as-is the marrying of PID and session is one of convenience, so I
> think any message that communicates more than "that database session
> does not exist" is superfluous anyhow.  Perhaps there is a better
> wording for the time being that doesn't implicate the existence or
> non-existence of the PID?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Andreas Joseph Krogh

On 06/28/2012 02:16 PM, David E. Wheeler wrote:

> Hackers,
>
> Very interesting design document for SQLite 4:
>
>    http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
>  CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> This allows the following query to do an index-only scan:
>
>  SELECT c, d FROM table1 WHERE a=? AND b=?;
>
> Now that we have index-only scans in 9.2, I'm wondering if it would make sense
> to add covering index support, too, where additional, unindexed columns are
> stored alongside indexed columns.
>
> And I wonder if it would work well with expressions, too?
>
> David


This is analogous to SQL Server's "INCLUDE":

CREATE NONCLUSTERED INDEX my_idx
ON my_table (status)
INCLUDE (someColumn, otherColumn);

Which is useful, but bloats the index.

--
Andreas Joseph Krogh  - mob: +47 909 56 963
Senior Software Developer / CEO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Rob Wultsch
On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler  wrote:
> Hackers,
>
> Very interesting design document for SQLite 4:
>
>  http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
>    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> This allows the following query to do an index-only scan:
>
>    SELECT c, d FROM table1 WHERE a=? AND b=?;
>
> Now that we have index-only scans in 9.2, I'm wondering if it would make 
> sense to add covering index support, too, where additional, unindexed columns 
> are stored alongside indexed columns.
>
> And I wonder if it would work well with expressions, too?
>
> David

IIRC MS SQL also allows unindexed columns in the index.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Cédric Villemain
> >> Parallelism is not free, ever, and particularly not here, where it has
> >> the potential to yank the disk head around between five different
> >> files, seeking like crazy, instead of a nice sequential I/O pattern on
> >> each file in turn.
> > 
> > Interesting point.  Maybe what's going on here is that
> > autovac_balance_cost() is wrong to suppose that N workers can each have
> > 1/N of the I/O bandwidth that we'd consider okay for a single worker to
> > eat.  Maybe extra seek costs mean we have to derate that curve by some
> > large factor.  1/(N^2), perhaps?  I bet the nature of the disk subsystem
> > affects this a lot, though.
> 
> ...and this would have the same effect.  Let's not assume that the
> problem is that Josh doesn't know how to make autovacuum less
> aggressive, because I'm pretty sure that ain't the issue.

We may need reserved workers for system tables, at least, as a
protection in case all workers are locked up for hours walking 'log'
tables. In the meantime, the pg_type table can bloat a lot, for example.

It might be that limiting the number of workers in 'antiwraparound mode' to
(max_workers - round(max_workers/3)) is enough.
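[Editorial sketch: the cap suggested above in toy form. The rounding arithmetic is one reading of "round(max_workers/3)"; the function name is invented. The idea is to reserve roughly a third of the worker slots so they can never all be pinned down by anti-wraparound vacuums at once.]

```c
/*
 * Maximum number of autovacuum workers allowed to run in
 * anti-wraparound mode at the same time.
 */
static int
max_wraparound_workers(int max_workers)
{
    int     reserved = (max_workers + 1) / 3;   /* ~ round(n/3) */

    return max_workers - reserved;
}
```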

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] Event Triggers reduced, v1

2012-06-28 Thread Robert Haas
On Sun, Jun 24, 2012 at 5:46 PM, Dimitri Fontaine
 wrote:
> Here's an early revision 2 of the patch, not yet ready for commit, so
> including the PL stuff still. What's missing is mainly a cache reference
> leak to fix at DROP EVENT TRIGGER, but I failed to spot where it comes
> from.
>
> As I fixed about all the other comments I though I might as well send in
> the new version of the patch to get to another round of review.

Some of the pg_dump hunks fail to apply for me; I guess this needs to
be remerged.

Spurious hunk:

-query_hosts
+query_hosts

Spurious hunk:

- * need deep copies, so they should be copied via list_copy()
+ * need deep copies, so they should be copied via list_copy(const )

There are a few remaining references to EVTG_FIRED_BEFORE /
EVTG_FIRED_INSTEAD_OF which should be cleaned up.  I suggest writing
the grammar as CREATE EVENT TRIGGER name ON event_name EXECUTE...  On
a related note, evttype is still mentioned in the documentation, also.
 And CreateEventTrigStmt has a char timing field that can go.

Incidentally, why do we not support an argument list here, as with
ordinary triggers?

I think the below hunk should get removed.  Or else it should be
surrounded by #ifdef rather than commented out.

+   /*
+* Useful to raise WARNINGs for any DDL command not yet supported.
+*
+   elog(WARNING, "Command Tag:%s", tag);
+   elog(WARNING, "Note to String: %s", nodeToString(parsetree));
+*/
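[Editorial sketch: one way to act on the #ifdef suggestion, with invented names. EVENT_TRIGGER_DEBUG is a made-up symbol, and log_warning() stands in for elog(WARNING, ...) so the snippet is self-contained: the debugging output stays compilable but dead unless the symbol is defined, instead of being commented out.]

```c
#include <stdio.h>

static int  warnings_emitted = 0;

static void
log_warning(const char *msg)
{
    fprintf(stderr, "WARNING: %s\n", msg);
    warnings_emitted++;
}

static void
report_unsupported_command(const char *tag)
{
#ifdef EVENT_TRIGGER_DEBUG
    log_warning(tag);
#else
    (void) tag;                 /* debug output compiled out */
#endif
}
```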

Typo:

+ * where we probide object name and namespace separately and still want nice

Typo:

+ * the easier code makes up fot it big time.

psql is now very confused about what the slash command is.  It's
actually implemented as \dy, but the comments say \dev in several
places, and slashUsage() documents it as \dct.

InitializeEvtTriggerCommandCache still needs work.  First, it ensures
that CacheMemoryContext is non-NULL... after it's already stored the
value of CacheMemoryContext into the HASHCTL.  Obviously, the order
there needs to be fixed.  Also, I think you need to split this into
two functions.  There should be one function that gets called just
once at startup time to CacheRegisterSyscacheCallback(), because we
don't want to redo that every time the cache gets blown away.  Then
there should be another function that gets called when, and only when,
someone needs to use the cache.  That should create and populate the
hash table.

I think that all event triggers should fire in exactly alphabetical
order by name.  Now that we have the concept of multiple firing
points, there's really no reason to distinguish between any triggers
and triggers on specific commands.  Eliminating that distinction will
eliminate a bunch of complexity while making things *more* flexible
for the user, who will be able to get a command trigger for a specific
command to fire either before or after an ANY command trigger he has
also defined rather than having the system enforce policy on him.

plperl_validator still makes reference to CMDTRIGGER.

Let's simplify this down to an enum with just one element, since
that's all we're going to support for starters, and remove the related
parser support for everything but command_start:

+typedef enum TrigEvent
+{
+   E_CommandStart   = 1,
+   E_SecurityCheck  = 10,
+   E_ConsistencyCheck   = 15,
+   E_NameLookup = 20,
+   E_CommandEnd = 51
+} TrigEvent;

I think we should similarly rip out the vestigial support for
supplying schema name, object name, and object ID.  That's not going
to be possible at command_start anyway; we can include that stuff in
the patch that adds a later firing point (command end, or consistency
check, perhaps).

I think standard_ProcessUtility should have a shortcut that bypasses
setting up the event context if there are no event triggers at all in
the entire system, so that we do no extra work unless there's some
potential benefit.

It seems to me that we probably need a CommandCounterIncrement() after
firing each event trigger, unless that's happening under the covers
somewhere and I'm missing it.  A good test case would be to have two
event triggers.  Have the first one insert a row into the table and
check that the second one can see the row there.  I suggest adding
something like this to the regression tests.

Instead of having giant switch statements match E_WhatEver tags to
strings and visca versa, I think it would be much better to create an
array someplace that contains all the mappings.  Then you can write a
convenience function that scans the array for a string and returns the
E_WhatEver tag, and another convenience function that scans the array
for an E_WhatEver tag and returns the corresponding string.  Then all
the knowledge of how those things map onto each other is in ONE place,
which should make things a whole lot easier in terms of future code
maintenance, not to mention minimizing the chances of bugs of
oversight in the patch as it stands.  :-)


Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Jon Nelson
On Thu, Jun 28, 2012 at 6:05 AM, Magnus Hagander  wrote:
> On Thu, Jun 28, 2012 at 7:00 AM, Robert Haas  wrote:
>> On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane  wrote:
>>> Robert Haas  writes:
>>>> On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane  wrote:
>>>>> Would Posix shmem help with that at all?  Why did you choose not to
>>>>> use the Posix API, anyway?
>>>
>>>> It seemed more complicated.  If we use the POSIX API, we've got to
>>>> have code to find a non-colliding name for the shm, and we've got to
>>>> arrange to clean it up at process exit.  Anonymous shm doesn't require
>>>> a name and goes away automatically when it's no longer in use.
>>>
>>> I see.  Those are pretty good reasons ...
>>
>> So, should we do it this way?
>>
>> I did a little research and discovered that Linux 2.3.51 (released
>> 3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS.
>> That combination is documented to work beginning in Linux 2.4.0.  How
>> worried should we be about people trying to run PostgreSQL 9.3 on
>> pre-2.4 kernels?  If we want to worry about it, we could try mapping a
>> one-page shared MAP_SHARED|MAP_ANONYMOUS segment first.  If that
>> works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS
>> facility and try to allocate the whole segment plus a minimal sysv
>> shm.  If the single page allocation fails with EINVAL, we could fall
>> back to allocating the entire segment as sysv shm.

Why not just mmap /dev/zero (MAP_SHARED but not MAP_ANONYMOUS)?  I
seem to think that's what I did when I needed this functionality oh so
many moons ago.

-- 
Jon



Re: [HACKERS] pg_signal_backend() asymmetry

2012-06-28 Thread Noah Misch
On Thu, Jun 28, 2012 at 01:36:49AM -0700, Daniel Farina wrote:
> On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt  wrote:
> > I have one nitpick related to the recent changes for
> > pg_cancel_backend() and pg_terminate_backend(). If you use these
> > functions as an unprivileged user, and try to signal a nonexistent
> > PID, you get:
> 
> I think the goal there is to avoid leakage of the knowledge or
> non-knowledge of a given PID existing once it is deemed out of
> Postgres' control.  Although I don't have a specific attack vector in
> mind for when one knows a PID exists a-priori, it does seem like an
> unnecessary admission on behalf of other programs.

I think it was just an oversight.  I agree that these functions have no
business helping users probe for live non-PostgreSQL PIDs on the server, but
they don't do so and Josh's patch won't change that.  I recommend committing
the patch.  Users will be able to probe for live PostgreSQL PIDs, but
pg_stat_activity already provides those.

> Also, in pg_cancel_backend et al, PID really means "database session",
> but as-is the marrying of PID and session is one of convenience, so I
> think any message that communicates more than "that database session
> does not exist" is superfluous anyhow.  Perhaps there is a better
> wording for the time being that doesn't implicate the existence or
> non-existence of the PID?

Perhaps, though I'm not coming up with anything.  The message isn't wrong; the
value is a PID independent of whether some process has that PID.

Thanks,
nm
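[Editorial sketch: a toy model of the information-hiding property under discussion, not the real pg_signal_backend() code. An unprivileged caller gets the same answer whether the PID does not exist at all or belongs to a backend owned by someone else, so the result reveals nothing about processes outside the caller's control.]

```c
#include <stdbool.h>

typedef enum
{
    SIGNAL_SENT,
    SIGNAL_NOT_PERMITTED        /* one indistinguishable "no" answer */
} SignalResult;

/*
 * Decide the outcome for an unprivileged caller.  Both failure modes
 * collapse into the same result.
 */
static SignalResult
signal_backend_unprivileged(bool pid_exists, bool owned_by_caller)
{
    if (pid_exists && owned_by_caller)
        return SIGNAL_SENT;
    return SIGNAL_NOT_PERMITTED;
}
```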



Re: [HACKERS] [Review] Add SPI_gettypmod() to return a field's typemod from a TupleDesc

2012-06-28 Thread Robert Haas
On Mon, Jun 18, 2012 at 3:29 PM, Amit Kapila  wrote:
> [ review ]

Chetan, this patch is waiting for an update from you.  If you'd like
this to get committed this CommitFest, we'll need an updated patch
soon.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 28, 2012 at 2:02 AM, Tom Lane  wrote:
>> Well, that's a fair point, but I don't think it has anything to do with
>> Josh's complaint --- which AFAICT is about imposed load, not about
>> failure to vacuum things that need vacuumed.

> I think it's got everything to do with it.  Josh could fix his problem
> by increasing the cost limit and/or reducing the cost delay, but if he
> did that then his database would get bloated...

Josh hasn't actually explained what his problem is, nor what if any
adjustments he made to try to ameliorate it.  In the absence of data
I refuse to rule out misconfiguration.  But, again, to the extent that
he's given us any info at all, it seemed to be a complaint about
oversaturated I/O at max load, *not* about inability to complete
vacuuming tasks as needed.  You are inventing problem details to fit
your solution.

regards, tom lane



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 9:51 AM, Tom Lane  wrote:
>  You are inventing problem details to fit
> your solution.

Well, what I'm actually doing is assuming that Josh's customers have
the same problem that our customers do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 9:47 AM, Jon Nelson  wrote:
> Why not just mmap /dev/zero (MAP_SHARED but not MAP_ANONYMOUS)?  I
> seem to think that's what I did when I needed this functionality oh so
> many moons ago.

From the reading I've done on this topic, that seems to be a trick
invented on Solaris that is considered grotty and awful by everyone
else.  The thing is that you want the mapping to be shared with the
processes that inherit the mapping from you.  You do *NOT* want the
mapping to be shared with EVERYONE who has mapped that file for any
reason, which is the usual meaning of MAP_SHARED on a file.  Maybe
this happens to work correctly on some or all platforms, but I would
want to have some convincing evidence that it's more widely supported
(with the correct semantics) than MAP_ANON before relying on it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Jun 28, 2012 at 7:00 AM, Robert Haas  wrote:
>> A related question is - if we do this - should we enable it only on
>> ports where we've verified that it works, or should we just turn it on
>> everywhere and fix breakage if/when it's reported?  I lean toward the
>> latter.

> Depends on the amount of expected breakage, but I'd lean towards the
> latter as well.

If we don't turn it on, we won't find out whether it works.  I'd say try
it first and then back off if that proves necessary.  I'd just as soon
not see us write any fallback logic without evidence that it's needed.

FWIW, even my pet dinosaur HP-UX 10.20 box appears to support
mmap(MAP_SHARED|MAP_ANONYMOUS) --- at least the mmap man page documents
both flags.  I find it really pretty hard to believe that there are any
machines out there that haven't got this and yet might be expected to
run PG 9.3+.  We should not go into it with an expectation of failure,
anyway.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Jon Nelson
On Thu, Jun 28, 2012 at 8:57 AM, Robert Haas  wrote:
> On Thu, Jun 28, 2012 at 9:47 AM, Jon Nelson  wrote:
>> Why not just mmap /dev/zero (MAP_SHARED but not MAP_ANONYMOUS)?  I
>> seem to think that's what I did when I needed this functionality oh so
>> many moons ago.
>
> From the reading I've done on this topic, that seems to be a trick
> invented on Solaris that is considered grotty and awful by everyone
> else.  The thing is that you want the mapping to be shared with the
> processes that inherit the mapping from you.  You do *NOT* want the
> mapping to be shared with EVERYONE who has mapped that file for any
> reason, which is the usual meaning of MAP_SHARED on a file.  Maybe
> this happens to work correctly on some or all platforms, but I would
> want to have some convincing evidence that it's more widely supported
> (with the correct semantics) than MAP_ANON before relying on it.

When I did this (I admit, it was on Linux but it was a long time ago)
only the inherited file descriptor + mmap structure mattered -
modifications were private to the process and its children - other
apps always saw their "own" /dev/zero. A quick google suggests that -
according to qnx, sco, and some others - mmap'ing /dev/zero retains
the expected privacy. Given how /dev/zero works I'd be very surprised
if it was otherwise.

I would love to see links that suggest that /dev/zero is nasty (or, in
fact, that MAP_ANON is in any way fundamentally different from mmap'ing
/dev/zero) - feel free to send them to me privately to avoid polluting the list.

-- 
Jon



Re: [HACKERS] Event Triggers reduced, v1

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 9:46 AM, Robert Haas  wrote:
> [ review ]

Also:

../../../src/include/catalog/pg_event_trigger.h:34: error: expected
specifier-qualifier-list before ‘int2’

This needs to be changed to int16 as a result of commit
b8b2e3b2deeaab19715af063fc009b7c230b2336.

alter.c:73: warning: implicit declaration of function ‘RenameEventTrigger’

That file needs to include commands/event_trigger.h.

Please spell out DO_EVTTRIGGER as DO_EVENT_TRIGGER.

Another random warning:

plpy_main.c:195: warning: ‘retval’ may be used uninitialized in this function

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Tom Lane
... btw, I rather imagine that Robert has already noticed this, but OS X
(and presumably other BSDen) spells the flag "MAP_ANON" not
"MAP_ANONYMOUS".  I also find this rather interesting flag there:

 MAP_HASSEMAPHORE  Notify the kernel that the region may contain sema-
   phores and that special handling may be necessary.

By "semaphore" I suspect they mean "spinlock", so we'd better turn this
flag on where it exists.

regards, tom lane



[HACKERS] Patch-2 (2-move-continuation-record-to-page-header.patch) WAL Format Changes

2012-06-28 Thread Amit Kapila
While reading patch-2 (2-move-continuation-record-to-page-header.patch) of
the WAL Format Changes
(http://archives.postgresql.org/message-id/4FDA5136.6080206@enterprisedb.com),
I had a few observations, which are summarized below:

 


1. 
@@ -693,7 +693,6 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) 
 { 
 XLogCtlInsert *Insert = &XLogCtl->Insert; 
 XLogRecord *record; 
-XLogContRecord *contrecord; 
 XLogRecPtr RecPtr;
 XLogRecPtr WriteRqst;
 uint32 freespace;
@@ -1082,9 +1081,7 @@ begin:; 
 curridx = Insert->curridx; 
 /* Insert cont-record header */ 
 Insert->currpage->xlp_info |= XLP_FIRST_IS_CONTRECORD; 
-contrecord = (XLogContRecord *) Insert->currpos; 
-contrecord->xl_rem_len = write_len; 
-Insert->currpos += SizeOfXLogContRecord; 
+Insert->currpage->xlp_rem_len = write_len; 

After the above code change, the comment "/* Insert cont-record header */"
should be updated.
  
2. 
Is XLP_FIRST_IS_CONTRECORD still required after putting xl_rem_len in the
page header?  Can't we do the handling based on xl_rem_len alone?

 

Sorry for sending the observations in pieces rather than all together, as I
am not sure how much I will be able to complete.

So whatever I am able to read, I am sending you my doubts or observations.

 

With Regards,

Amit Kapila.



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-28 Thread Kohei KaiGai
2012/6/27 Florian Pflug :
> On Jun27, 2012, at 15:07 , Kohei KaiGai wrote:
>> Probably, PlannedStmt->invalItems allows to handle invalidation of
>> plan-cache without big code changes. I'll try to put a flag of user-id
>> to track the query plan with RLS assumed, or InvalidOid if no RLS
>> was applied in this plan.
>> I'll investigate the implementation for more details.
>>
>> Do we have any other scenario that run a query plan under different
>> user privilege rather than planner stage?
>
> Hm, what happens if a SECURITY DEFINER function returns a refcursor?
>
> Actually, I wonder how we handle that today. If the executor is
> responsible for permission checks, then wouldn't we apply the calling
> function's privilege level in that case, at least if the cursor isn't
> fetched from within the SECURITY DEFINER function? If I find some time,
> I'll check...
>
My impression is that it does not matter even if a SECURITY DEFINER
function returns a refcursor.

A SECURITY DEFINER function (or a Trusted Procedure in sepgsql, or a
Set-UID program on Linux) provides unprivileged users a particular
"limited way" to access protected data. It means the owner of the security
definer function accepts that it is reasonable to show the protected data,
as long as unprivileged users access it via the function.

It is the same reason we allow access through views to users who have
privileges on the views but not on the underlying tables.

Thanks,
-- 
KaiGai Kohei 



Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-28 Thread Jeff Janes
On Tue, Jun 26, 2012 at 3:58 PM, Nils Goroll  wrote:
>> It's
>> still unproven whether it'd be an improvement, but you could expect to
>> prove it one way or the other with a well-defined amount of testing.
>
> I've hacked the code to use adaptive pthread mutexes instead of spinlocks. see
> attached patch. The patch is for the git head, but it can easily be applied 
> for
> 9.1.3, which is what I did for my tests.
>
> This had disastrous effects on Solaris because it does not use anything 
> similar
> to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE mutexes do
> without syscalls for the simple case).
>
> But I was surprised to see that it works relatively well on linux. Here's a
> glimpse of my results:
>
> hacked code 9.1.3:
...
> tps = 485.964355 (excluding connections establishing)

> original code (vanilla build on amd64) 9.1.3:
...
> tps = 510.410883 (excluding connections establishing)


It looks like the hacked code is slower than the original.  That
doesn't seem so good to me.  Am I misreading this?

Also, 20 transactions per connection is not enough of a run to make
any evaluation on.

How many cores are you testing on?

> Regarding the actual production issue, I did not manage to synthetically 
> provoke
> the saturation we are seeing in production using pgbench - I could not even 
> get
> anywhere near the production load.

What metrics/tools are you using to compare the two loads?  What is
the production load like?

Each transaction has to update one of ten pgbench_branch rows, so you
can't have more than ten transactions productively active at any given
time, even though you have 768 connections.  So you need to jack up
the pgbench scale, or switch to using -N mode.

Also, you should use -M prepared, otherwise you spend more time
parsing and planning the statements than executing them.

Cheers,

Jeff



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 10:11 AM, Tom Lane  wrote:
> ... btw, I rather imagine that Robert has already noticed this, but OS X
> (and presumably other BSDen) spells the flag "MAP_ANON" not
> "MAP_ANONYMOUS".  I also find this rather interesting flag there:
>
>     MAP_HASSEMAPHORE  Notify the kernel that the region may contain sema-
>                       phores and that special handling may be necessary.
>
> By "semaphore" I suspect they mean "spinlock", so we'd better turn this
> flag on where it exists.

Sounds fine to me.  Since no one seems opposed to the basic approach,
and everyone (I assume) will be happier to reduce the impact of
dealing with shared memory limits, I went ahead and committed a
cleaned-up version of the previous patch.  Let's see what the
build-farm thinks.

Assuming things go well, there are a number of follow-on things that
we need to do to finish this up:

1. Update the documentation.  I skipped this for now, because I think
that what we write there is going to be heavily dependent on how
portable this turns out to be, which we don't know yet.  Also, it's
not exactly clear to me what the documentation should say if this does
turn out to work everywhere.  Much of section 17.4 will become
irrelevant to most users, but I doubt we'd just want to remove it; it
could still matter for people running EXEC_BACKEND or running many
postmasters on the same machine or, of course, people running on
platforms where this just doesn't work, if there are any.

2. Update the HINT messages when shared memory allocation fails.
Maybe the new most-common-failure mode there will be too many
postmasters running on the same machine?  We might need to wait for
some field reports before adjusting this.

3. Consider adjusting the logic inside initdb.  If this works
everywhere, the code for determining how to set shared_buffers should
become pretty much irrelevant.  Even if it only works some places, we
could add 64MB or 128MB or whatever to the list of values we probe, so
that people won't get quite such a sucky configuration out of the box.
 Of course there's no number here that will be good for everyone.

and of course

4. Fix any platforms that are now horribly broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 11:21 AM, Jeff Janes  wrote:
> Also, 20 transactions per connection is not enough of a run to make
> any evaluation on.

FWIW, I kicked off a looong benchmarking run on this a couple of days
ago on the IBM POWER7 box, testing pgbench -S, regular pgbench, and
pgbench --unlogged-tables at various client counts with and without
the patch; three half-hour test runs for each test configuration.  It
should be done tonight and I will post the results once they're in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-28 Thread Tom Lane
Kohei KaiGai  writes:
> 2012/6/27 Florian Pflug :
>> Hm, what happens if a SECURITY DEFINER function returns a refcursor?

> My impression is that it does not matter even if a SECURITY DEFINER
> function returns a refcursor.

I think Florian has a point: it *should* work, but *will* it?

I believe it works today, because the executor only applies permissions
checks during query startup.  So those checks are executed while still
within the SECURITY DEFINER context, and should behave as expected.
Subsequently, the cursor portal is returned to caller and caller can
execute it to completion, no problem.

However, with RLS, security-related checks will happen throughout the
execution of the portal.  They might do the wrong thing once the
SECURITY DEFINER function has been exited.

We might need to consider that a portal has a local value of
"current_user", which is kind of a pain, but probably doable.

regards, tom lane



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-28 Thread Florian Pflug
On Jun28, 2012, at 17:29 , Tom Lane wrote:
> Kohei KaiGai  writes:
>> 2012/6/27 Florian Pflug :
>>> Hm, what happens if a SECURITY DEFINER function returns a refcursor?
> 
>> My impression is that it does not matter even if a SECURITY DEFINER
>> function returns a refcursor.
> 
> I think Florian has a point: it *should* work, but *will* it?
> 
> I believe it works today, because the executor only applies permissions
> checks during query startup.  So those checks are executed while still
> within the SECURITY DEFINER context, and should behave as expected.
> Subsequently, the cursor portal is returned to caller and caller can
> execute it to completion, no problem.

Don't we (sometimes?) defer query startup to the first time FETCH is
called?

best regards,
Florian Pflug




Re: [HACKERS] Covering Indexes

2012-06-28 Thread Jeff Janes
On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler  wrote:
> Hackers,
>
> Very interesting design document for SQLite 4:
>
>  http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
>    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case.  Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Cheers,

Jeff



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-28 Thread Tom Lane
Florian Pflug  writes:
> On Jun28, 2012, at 17:29 , Tom Lane wrote:
>> I believe it works today, because the executor only applies permissions
>> checks during query startup.  So those checks are executed while still
>> within the SECURITY DEFINER context, and should behave as expected.
>> Subsequently, the cursor portal is returned to caller and caller can
>> execute it to completion, no problem.

> Don't we (sometimes?) defer query startup to the first time FETCH is
> called?

There are things inside individual plan node functions that may only
happen when the first row is demanded, but permissions checks are done
in ExecutorStart().

regards, tom lane



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Andrew Dunstan



On 06/28/2012 11:37 AM, Jeff Janes wrote:
> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler  wrote:
>> Hackers,
>>
>> Very interesting design document for SQLite 4:
>>
>>   http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>>
>> I'm particularly intrigued by "covering indexes". For example:
>>
>>     CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> I don't see the virtue of this in this case.  Since the index is not
> unique, why not just put the index on (a,b,c,d) and be done with it?
> Is there some advantage to be had in inventing a way to store c and d
> in the index without having them usable for indexing?



Presumably the comparison function will be faster the fewer attributes 
it needs to compare.


cheers

andrew



Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-28 Thread Robert Haas
On Tue, Jun 26, 2012 at 8:13 PM, Andres Freund  wrote:
> It even can be significantly higher than max_connections because
> subtransactions are only recognizable as part of their parent transaction
> upon commit.

I've been wondering whether sub-XID assignment was going to end up on
the list of things that need to be WAL-logged to enable logical
replication.  It would be nicer to avoid that if we can, but I have a
feeling that we may not be able to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 06:01:10 PM Robert Haas wrote:
> On Tue, Jun 26, 2012 at 8:13 PM, Andres Freund  
wrote:
> > It even can be significantly higher than max_connections because
> > subtransactions are only recognizable as part of their parent transaction
> > upon commit.
> 
> I've been wondering whether sub-XID assignment was going to end up on
> the list of things that need to be WAL-logged to enable logical
> replication.  It would be nicer to avoid that if we can, but I have a
> feeling that we may not be able to.
I don't think it needs to. We only need that information during commit, and we
have it there. If a subtxn aborts, a separate abort record is logged, so that's no
problem. The 'merging' of the transactions would be slightly easier if we had
the knowledge from the get-go, but that would add complications again in the
case of rollbacks.

Why do you think we need it?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/28/2012 11:37 AM, Jeff Janes wrote:
>> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler  
>> wrote:
>>> I'm particularly intrigued by "covering indexes". For example:
>>> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

>> I don't see the virtue of this in this case.  Since the index is not
>> unique, why not just put the index on (a,b,c,d) and be done with it?

> Presumably the comparison function will be faster the fewer attributes 
> it needs to compare.

Well, no, because queries will only be comparing the columns used in
the query.  Insertions would likely actually be faster with the extra
columns considered significant, since we've known for a long time that
btree doesn't especially like large numbers of identical index entries.

When this came up a couple weeks ago, the argument that was made for it
was that you could attach non-significant columns to an index that *is*
unique.  That might or might not be a wide enough use-case to justify
adding such a horrid kludge.

regards, tom lane



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Thu Jun 28 12:07:58 -0400 2012:

> When this came up a couple weeks ago, the argument that was made for it
> was that you could attach non-significant columns to an index that *is*
> unique.  That might or might not be a wide enough use-case to justify
> adding such a horrid kludge.

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched.  That could be a
significant difference.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Thom Brown
On 28 June 2012 16:26, Robert Haas  wrote:
> On Thu, Jun 28, 2012 at 10:11 AM, Tom Lane  wrote:
>> ... btw, I rather imagine that Robert has already noticed this, but OS X
>> (and presumably other BSDen) spells the flag "MAP_ANON" not
>> "MAP_ANONYMOUS".  I also find this rather interesting flag there:
>>
>>     MAP_HASSEMAPHORE  Notify the kernel that the region may contain sema-
>>                       phores and that special handling may be necessary.
>>
>> By "semaphore" I suspect they mean "spinlock", so we'd better turn this
>> flag on where it exists.
>
> Sounds fine to me.  Since no one seems opposed to the basic approach,
> and everyone (I assume) will be happier to reduce the impact of
> dealing with shared memory limits, I went ahead and committed a
> cleaned-up version of the previous patch.  Let's see what the
> build-farm thinks.
>
> Assuming things go well, there are a number of follow-on things that
> we need to do to finish this up:
>
> 1. Update the documentation.  I skipped this for now, because I think
> that what we write there is going to be heavily dependent on how
> portable this turns out to be, which we don't know yet.  Also, it's
> not exactly clear to me what the documentation should say if this does
> turn out to work everywhere.  Much of section 17.4 will become
> irrelevant to most users, but I doubt we'd just want to remove it; it
> could still matter for people running EXEC_BACKEND or running many
> postmasters on the same machine or, of course, people running on
> platforms where this just doesn't work, if there are any.
>
> 2. Update the HINT messages when shared memory allocation fails.
> Maybe the new most-common-failure mode there will be too many
> postmasters running on the same machine?  We might need to wait for
> some field reports before adjusting this.
>
> 3. Consider adjusting the logic inside initdb.  If this works
> everywhere, the code for determining how to set shared_buffers should
> become pretty much irrelevant.  Even if it only works some places, we
> could add 64MB or 128MB or whatever to the list of values we probe, so
> that people won't get quite such a sucky configuration out of the box.
>  Of course there's no number here that will be good for everyone.
>
> and of course
>
> 4. Fix any platforms that are now horribly broken.

On 64-bit Linux, if I allocate more shared buffers than the system is
capable of reserving, it doesn't start.  This is expected, but there's
no error logged anywhere (actually, nothing logged at all), and the
postmaster.pid file is left behind after this failure.

-- 
Thom



Re: [HACKERS] embedded list v2

2012-06-28 Thread Robert Haas
On Tue, Jun 26, 2012 at 7:26 PM, Andres Freund  wrote:
> Attached are three patches:
> 1. embedded list implementation
> 2. make the list implementation usable without USE_INLINE
> 3. convert all callers to ilist.h away from dllist.h

This code doesn't follow PostgreSQL coding style guidelines; in a
function definition, the name must start on its own line.  Also, stuff
like this is grossly unlike what's done elsewhere; use the same
formatting as e.g. foreach:

+#define ilist_d_reverse_foreach(name, ptr) for(name =
(ptr)->head.prev;\
+   name != &(ptr)->head;\
+   name = name->prev)

And this is definitely NOT going to survive pgindent:

+for(cur = head->head.prev;
+cur != &head->head;
+cur = cur->prev){
+   if(!(cur) ||
+  !(cur->next) ||
+  !(cur->prev) ||
+  !(cur->prev->next == cur) ||
+  !(cur->next->prev == cur))
+   {
+   elog(ERROR, "double linked list is corrupted");
+   }
+}

And this is another meme we don't use elsewhere; add an explicit NULL test:

+   while ((cur = last->next))

And then there's stuff like this:

+   if(!cur){
+   elog(ERROR, "single linked list is corrupted");
+   }

Aside from the formatting issues, I think it would be sensible to
preserve the terminology of talking about the "head" and "tail" of a
list that we use elsewhere, instead of calling them the "front" and
"back" as you've done here.  I would suggest that instead of add_after
and add_before you use the names insert_after and insert_before, which
I think is more clear; also instead of remove, I think you should say
delete, for consistency with pg_list.h.

A number of these inlined functions could be rewritten as macros -
e.g. ilist_d_has_next, ilist_d_has_prev, ilist_d_is_empty.  Since some
things are written as macros anyway maybe it's good to do all the
one-liners that way, although I guess it doesn't matter that much.  I
also agree with your XXX comment that ilist_s_remove should probably
be out of line.

Apart from the above, mostly fairly superficial concerns I think this
makes sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Aidan Van Dyk
On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera
 wrote:

> The other question is whether such an index would prevent an update from
> being HOT when the non-indexed values are touched.  That could be a
> significant difference.

I don't see Index-Only-Scans being something that will be used in
"high churn" tables.

So as long as the value of these "covering/included" fields is tied to
index-only scans, maybe it isn't a problem?

Of course, we have a hard time convincing people that the  "index
only" scans they want can't be "index only" because heap pages aren't
"all visible"...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Tom Lane
Alvaro Herrera  writes:
> The other question is whether such an index would prevent an update from
> being HOT when the non-indexed values are touched.

Surely it would *have* to, whether the columns are significant or not
for uniqueness purposes.  Else an index-only scan gives the wrong value
after the update.

regards, tom lane



Re: [HACKERS] pg_signal_backend() asymmetry

2012-06-28 Thread Josh Kupershmidt
On Thu, Jun 28, 2012 at 6:48 AM, Noah Misch  wrote:
> On Thu, Jun 28, 2012 at 01:36:49AM -0700, Daniel Farina wrote:
>> On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt  wrote:
>> > I have one nitpick related to the recent changes for
>> > pg_cancel_backend() and pg_terminate_backend(). If you use these
>> > functions as an unprivileged user, and try to signal a nonexistent
>> > PID, you get:
>>
>> I think the goal there is to avoid leakage of the knowledge or
>> non-knowledge of a given PID existing once it is deemed out of
>> Postgres' control.  Although I don't have a specific attack vector in
>> mind for when one knows a PID exists a-priori, it does seem like an
>> unnecessary admission on the behalf of other programs.
>
> I think it was just an oversight.  I agree that these functions have no
> business helping users probe for live non-PostgreSQL PIDs on the server, but
> they don't do so and Josh's patch won't change that.  I recommend committing
> the patch.  Users will be able to probe for live PostgreSQL PIDs, but
> pg_stat_activity already provides those.

Yeah, I figured that since pg_stat_activity already shows users PIDs,
there should be no need to have pg_signal_backend() lie about whether
a PID was a valid backend or not. And if for some reason we did want
to lie, we should give an accurate message like "not valid backend, or
must be superuser or have the same role...".

I noticed that I neglected to update the comment which was in the
non-superuser case: updated version attached.

On Thu, Jun 28, 2012 at 5:22 AM, Magnus Hagander  wrote:
> Well, there are two sides to it - one is the message, the other is if
> it should be ERROR or WARNING. My reading is that Josh is suggesting
> we make them WARNING in both cases, right?

Maybe I should have said I had "a few" nitpicks instead of just "one"
:-) A more detailed summary of the little issues I'm aiming to fix:

1a.) SIGNAL_BACKEND_NOPERMISSION being used for both invalid PID and
role mismatch, causing the "must be superuser or have ..." message to
be printed in both cases for non-superusers

1b.) Since SIGNAL_BACKEND_NOPERMISSION is used for both duties, you'll
get an ERROR instead of just a WARNING during plausibly-normal usage.
For example, if you're running
  SELECT pg_cancel_backend(pid)
  FROM pg_stat_activity
  WHERE usename = CURRENT_USER AND
  pid != pg_backend_pid();

you might be peeved if it ERROR'ed out with the bogus message claiming
the process was owned by someone else, when the backend has just
exited on its own. So even if we thought it was worth hiding to
non-superusers whether the PID is invalid or belongs to someone else,
I think the message should just be at WARNING level.

2a.) Existing code uses two calls to BackendPidGetProc() for
non-superusers in the error-free case.

2b.) I think the comment "the check for whether it's a backend process
is below" is a little misleading, since IsBackendPid() is just
checking whether the return of BackendPidGetProc() is NULL, which has
already been done.

3.) grammar fix for comment ("on it's own" -> "on its own")

Josh


pg_signal_backend_asymmetry.v2.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Jeff Janes
On Thu, Jun 28, 2012 at 8:26 AM, Robert Haas  wrote:

> 3. Consider adjusting the logic inside initdb.  If this works
> everywhere, the code for determining how to set shared_buffers should
> become pretty much irrelevant.  Even if it only works some places, we
> could add 64MB or 128MB or whatever to the list of values we probe, so
> that people won't get quite such a sucky configuration out of the box.
>  Of course there's no number here that will be good for everyone.

This seems independent of the type of shared memory used and the
limits on it.  If it tried 64MB or 128MB and discovered that it
couldn't obtain that much shared memory, it would automatically climb down
to smaller values until it found one that works.  I think the
impediment to adopting larger defaults is not what happens if it can't
get that much shared memory, but rather what happens if the machine
doesn't have that much physical memory.  The test server will still
start (and so there will be no climb-down), leaving a default which is
valid but just has horrid performance.

Cheers,

Jeff



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Jeff Janes
On Thu, Jun 28, 2012 at 9:12 AM, Alvaro Herrera
 wrote:
>
> Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:
>
>> When this came up a couple weeks ago, the argument that was made for it
>> was that you could attach non-significant columns to an index that *is*
>> unique.  That might or might not be a wide enough use-case to justify
>> adding such a horrid kludge.
>
> The other question is whether such an index would prevent an update from
> being HOT when the non-indexed values are touched.

That seems like an easy question to answer.  How could it not disable
HOT and still work correctly?

> That could be a
> significant difference.

True, adding the covering column would not always be a win.  But
surely it is more likely to be a win when it can be done without adding
yet another index that also needs to be maintained.

Cheers,

Jeff



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown  wrote:
> On 64-bit Linux, if I allocate more shared buffers than the system is
> capable of reserving, it doesn't start.  This is expected, but there's
> no error logged anywhere (actually, nothing logged at all), and the
> postmaster.pid file is left behind after this failure.

Fixed.

However, I discovered something unpleasant.  With the new code, on
MacOS X, if you set shared_buffers to say 3200GB, the server happily
starts up.  Or at least the shared memory allocation goes through just
fine.  The postmaster then sits there apparently forever without
emitting any log messages, which I eventually discovered was because
it's busy initializing a billion or so spinlocks.

I'm pretty sure that this machine does not have >3TB of virtual
memory, even counting swap.  So that means that MacOS X has absolutely
no common sense whatsoever as far as anonymous shared memory
allocations go.  Not sure exactly what to do about that.  Linux is
more sensible, at least on the system I tested, and fails cleanly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Magnus Hagander
On Thu, Jun 28, 2012 at 7:15 PM, Robert Haas  wrote:
> On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown  wrote:
>> On 64-bit Linux, if I allocate more shared buffers than the system is
>> capable of reserving, it doesn't start.  This is expected, but there's
>> no error logged anywhere (actually, nothing logged at all), and the
>> postmaster.pid file is left behind after this failure.
>
> Fixed.
>
> However, I discovered something unpleasant.  With the new code, on
> MacOS X, if you set shared_buffers to say 3200GB, the server happily
> starts up.  Or at least the shared memory allocation goes through just
> fine.  The postmaster then sits there apparently forever without
> emitting any log messages, which I eventually discovered was because
> it's busy initializing a billion or so spinlocks.
>
> I'm pretty sure that this machine does not have >3TB of virtual
> memory, even counting swap.  So that means that MacOS X has absolutely
> no common sense whatsoever as far as anonymous shared memory
> allocations go.  Not sure exactly what to do about that.  Linux is
> more sensible, at least on the system I tested, and fails cleanly.

What happens if you mlock() it into memory - does that fail quickly?
Is that not something we might want to do *anyway*?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote:
> On Thu, Jun 28, 2012 at 7:15 PM, Robert Haas  wrote:
> > On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown  wrote:
> >> On 64-bit Linux, if I allocate more shared buffers than the system is
> >> capable of reserving, it doesn't start.  This is expected, but there's
> >> no error logged anywhere (actually, nothing logged at all), and the
> >> postmaster.pid file is left behind after this failure.
> > 
> > Fixed.
> > 
> > However, I discovered something unpleasant.  With the new code, on
> > MacOS X, if you set shared_buffers to say 3200GB, the server happily
> > starts up.  Or at least the shared memory allocation goes through just
> > fine.  The postmaster then sits there apparently forever without
> > emitting any log messages, which I eventually discovered was because
> > it's busy initializing a billion or so spinlocks.
> > 
> > I'm pretty sure that this machine does not have >3TB of virtual
> > memory, even counting swap.  So that means that MacOS X has absolutely
> > no common sense whatsoever as far as anonymous shared memory
> > allocations go.  Not sure exactly what to do about that.  Linux is
> > more sensible, at least on the system I tested, and fails cleanly.
> 
> What happens if you mlock() it into memory - does that fail quickly?
> Is that not something we might want to do *anyway*?
You normally can only mlock() minor amounts of memory without changing 
settings. Requiring a change to that setting (aside from the fact that mlocking 
would be a bad idea imo) would run contrary to the point of the patch, wouldn't it? ;)

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Magnus Hagander
On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund  wrote:
> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote:
>> On Thu, Jun 28, 2012 at 7:15 PM, Robert Haas  wrote:
>> > On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown  wrote:
>> >> On 64-bit Linux, if I allocate more shared buffers than the system is
>> >> capable of reserving, it doesn't start.  This is expected, but there's
>> >> no error logged anywhere (actually, nothing logged at all), and the
>> >> postmaster.pid file is left behind after this failure.
>> >
>> > Fixed.
>> >
>> > However, I discovered something unpleasant.  With the new code, on
>> > MacOS X, if you set shared_buffers to say 3200GB, the server happily
>> > starts up.  Or at least the shared memory allocation goes through just
>> > fine.  The postmaster then sits there apparently forever without
>> > emitting any log messages, which I eventually discovered was because
>> > it's busy initializing a billion or so spinlocks.
>> >
>> > I'm pretty sure that this machine does not have >3TB of virtual
>> > memory, even counting swap.  So that means that MacOS X has absolutely
>> > no common sense whatsoever as far as anonymous shared memory
>> > allocations go.  Not sure exactly what to do about that.  Linux is
>> > more sensible, at least on the system I tested, and fails cleanly.
>>
>> What happens if you mlock() it into memory - does that fail quickly?
>> Is that not something we might want to do *anyway*?
> You normally can only mlock() mminor amounts of memory without changing
> settings. Requiring to change that setting (aside that mlocking would be a bad
> idea imo) would run contrary to the point of the patch, wouldn't it? ;)

It would. I wasn't aware of that limitation :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Re: Patch-2 (2-move-continuation-record-to-page-header.patch) WAL Format Changes

2012-06-28 Thread Heikki Linnakangas

On 28.06.2012 17:40, Amit Kapila wrote:

1.
@@ -693,7 +693,6 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
  {
  XLogCtlInsert *Insert =&XLogCtl->Insert;
  XLogRecord *record;
-XLogContRecord *contrecord;
  XLogRecPtrRecPtr;
  XLogRecPtrWriteRqst;
  uint32freespace;
@@ -1082,9 +1081,7 @@ begin:;
  curridx = Insert->curridx;
  /* Insert cont-record header */
  Insert->currpage->xlp_info |= XLP_FIRST_IS_CONTRECORD;
-contrecord = (XLogContRecord *) Insert->currpos;
-contrecord->xl_rem_len = write_len;
-Insert->currpos += SizeOfXLogContRecord;
+Insert->currpage->xlp_rem_len = write_len;

After above code changes the comment "/* Insert cont-record header */"
should be changed.


Thanks, fixed.


2.
Is XLP_FIRST_IS_CONTRECORD required after putting xl_rem_len in page header;

Can't we do handling based on xl_rem_len?


Hmm, yeah, it's redundant now, we could use xl_rem_len > 0 to indicate a 
continued record. I thought I'd keep the flag to avoid unnecessary 
changes, to make life a bit easier for 3rd party tools that read the 
WAL, but I don't know if it really makes any difference. There is no 
shortage of xlog page header flag bits, so there's no hurry to get rid 
of it.



Sorry for sending the observations in pieces rather than all together, as I
am not sure how much I will be able to complete.

So whatever I am able to read, I am sending you my doubts or observations.


Thanks for the review, much appreciated!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund  wrote:
>> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote:
>>> What happens if you mlock() it into memory - does that fail quickly?
>>> Is that not something we might want to do *anyway*?

>> You normally can only mlock() mminor amounts of memory without changing
>> settings. Requiring to change that setting (aside that mlocking would be a 
>> bad
>> idea imo) would run contrary to the point of the patch, wouldn't it? ;)

> It would. I wasn't aware of that limitation :)

The OSX man page says that mlock should give EAGAIN for a permissions
failure (ie, exceeding the rlimit) but

 [ENOMEM]   Some portion of the indicated address range is not
allocated.  There was an error faulting/mapping a
page.

It might be helpful to try mlock (if available, which it isn't
everywhere) and complain about ENOMEM but not other errors.  Of course,
if the kernel checks rlimit first, we won't learn anything ...

I think it *would* be a good idea to mlock if we could.  Setting shmem
large enough that it swaps has always been horrible for performance,
and in sysv-land there's no way to prevent that.  But we can't error
out on permissions failure.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 07:43:16 PM Tom Lane wrote:
> Magnus Hagander  writes:
> > On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund  
wrote:
> >> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote:
> >>> What happens if you mlock() it into memory - does that fail quickly?
> >>> Is that not something we might want to do *anyway*?
> >> 
> >> You normally can only mlock() mminor amounts of memory without changing
> >> settings. Requiring to change that setting (aside that mlocking would be
> >> a bad idea imo) would run contrary to the point of the patch, wouldn't
> >> it? ;)
> > 
> > It would. I wasn't aware of that limitation :)
> 
> The OSX man page says that mlock should give EAGAIN for a permissions
> failure (ie, exceeding the rlimit) but
> 
>  [ENOMEM]   Some portion of the indicated address range is not
> allocated.  There was an error faulting/mapping a
> page.
> 
> It might be helpful to try mlock (if available, which it isn't
> everywhere) and complain about ENOMEM but not other errors.  If course,
> if the kernel checks rlimit first, we won't learn anything ...
> 
> I think it *would* be a good idea to mlock if we could.  Setting shmem
> large enough that it swaps has always been horrible for performance,
> and in sysv-land there's no way to prevent that.  But we can't error
> out on permissions failure.
It's also a very good method to get into hard-to-diagnose OOM situations 
though. Unless the machine is set up very carefully and only runs postgres I 
don't think it's acceptable to do that.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Heikki Linnakangas

On 28.06.2012 15:18, Robert Haas wrote:

On Thu, Jun 28, 2012 at 4:21 AM, Simon Riggs  wrote:

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change.  I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?


Again, leave the naming of that for later. The idea of a rename came
from yourself, IIRC.


Deciding on a name is not such a hard thing that leaving it till later
solves any problem.  Let's just make a decision and be done with it.


FWIW, I think commit_delay is just fine. In practice, it's mostly 
commits that are affected, anyway. If we try to be more exact, I think 
it's just going to be more confusing to users.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Tom Lane
Andres Freund  writes:
> On Thursday, June 28, 2012 07:43:16 PM Tom Lane wrote:
>> I think it *would* be a good idea to mlock if we could.  Setting shmem
>> large enough that it swaps has always been horrible for performance,
>> and in sysv-land there's no way to prevent that.  But we can't error
>> out on permissions failure.

> Its also a very good method to get into hard to diagnose OOM situations 
> though. Unless the machine is setup very careful and only runs postgres I 
> don't think its acceptable to do that.

Well, the permissions angle is actually a good thing here.  There is
pretty much no risk of the mlock succeeding on a box that hasn't been
specially configured --- and, in most cases, I think you'd need root
cooperation to raise postgres' RLIMIT_MEMLOCK.  So I think we could try
to mlock without having any effect for 99% of users.  The 1% who are
smart enough to raise the rlimit to something suitable would get better,
or at least more predictable, performance.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 08:00:06 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Thursday, June 28, 2012 07:43:16 PM Tom Lane wrote:
> >> I think it *would* be a good idea to mlock if we could.  Setting shmem
> >> large enough that it swaps has always been horrible for performance,
> >> and in sysv-land there's no way to prevent that.  But we can't error
> >> out on permissions failure.
> > 
> > Its also a very good method to get into hard to diagnose OOM situations
> > though. Unless the machine is setup very careful and only runs postgres I
> > don't think its acceptable to do that.
> 
> Well, the permissions angle is actually a good thing here.  There is
> pretty much no risk of the mlock succeeding on a box that hasn't been
> specially configured --- and, in most cases, I think you'd need root
> cooperation to raise postgres' RLIMIT_MEMLOCK.  So I think we could try
> to mlock without having any effect for 99% of users.  The 1% who are
> smart enough to raise the rlimit to something suitable would get better,
> or at least more predictable, performance.
The heightened limit might just as well be targeted at another application and be 
set up a bit too widely. I agree that it is useful, but I think it requires its 
own setting, defaulting to off. Especially as there is no experience with 
running a larger pg instance that way.

Greetings,

Andres, for once the conservative one, Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Tom Lane
Andres Freund  writes:
> On Thursday, June 28, 2012 08:00:06 PM Tom Lane wrote:
>> Well, the permissions angle is actually a good thing here.  There is
>> pretty much no risk of the mlock succeeding on a box that hasn't been
>> specially configured --- and, in most cases, I think you'd need root
>> cooperation to raise postgres' RLIMIT_MEMLOCK.  So I think we could try
>> to mlock without having any effect for 99% of users.  The 1% who are
>> smart enough to raise the rlimit to something suitable would get better,
>> or at least more predictable, performance.

> The heightened limit might just as well target at another application and be 
> setup a bit to widely. I agree that it is useful, but I think it requires its 
> own setting, defaulting to off. Especially as there are no experiences with 
> running a larger pg instance that way.

[ shrug... ]  I think you're inventing things to be afraid of, and
ignoring a very real problem that mlock could fix.

regards, tom lane



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Peter Geoghegan
On 28 June 2012 18:57, Heikki Linnakangas
 wrote:
> FWIW, I think commit_delay is just fine. In practice, it's mostly commits
> that are affected, anyway. If we try to be more exact, I think it's just
> going to be more confusing to users.

You think it will confuse users less if we start telling them to use
something that we have a very long history of telling them not to use?
The docs say "it is rare that adding delay by increasing this
parameter will actually improve performance". The book PostgreSQL 9.0
High Performance says of commit_delay (and commit_siblings) "These are
not effective parameters to tune in most cases. It is extremely
difficult to show any speed-up by adjusting them, and quite easy to
slow every transaction down by tweaking them". Who would be brave
enough to use that in production? Unless you still think these
statements are true, and I don't see why you would, it seems like a
really bad judgement to not change the name.

Is anyone aware of a non-zero commit_delay in the wild today? I
personally am not.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] embedded list v2

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 06:23:05 PM Robert Haas wrote:
> On Tue, Jun 26, 2012 at 7:26 PM, Andres Freund  
wrote:
> > Attached are three patches:
> > 1. embedded list implementation
> > 2. make the list implementation usable without USE_INLINE
> > 3. convert all callers to ilist.h away from dllist.h
> 
> This code doesn't follow PostgreSQL coding style guidelines; in a
> function definition, the name must start on its own line.
Hrmpf. Yes.

> Also, stuff like this is grossly unlike what's done elsewhere; use the same
> formatting as e.g. foreach:
> +#define ilist_d_reverse_foreach(name, ptr) for(name =
> (ptr)->head.prev;\
> +   name != &(ptr)->head;\
> +   name = name->prev)
It's not only unlike the rest, it's also ugly... Fixed.

> And this is definitely NOT going to survive pgindent:
> 
> +for(cur = head->head.prev;
> +cur != &head->head;
> +cur = cur->prev){
> +   if(!(cur) ||
> +  !(cur->next) ||
> +  !(cur->prev) ||
> +  !(cur->prev->next == cur) ||
> +  !(cur->next->prev == cur))
> +   {
> +   elog(ERROR, "double linked list is corrupted");
> +   }
> +}
I changed the for() contents to one line. I don't think I can write anything 
that won't be changed by pgindent for the if()s contents.

> And this is another meme we don't use elsewhere; add an explicit NULL test:
> 
> +   while ((cur = last->next))
Fixed.

> And then there's stuff like this:
> 
> +   if(!cur){
> +   elog(ERROR, "single linked list is corrupted");
> +   }
Fixed the places that I found.

> Aside from the formatting issues, I think it would be sensible to
> preserve the terminology of talking about the "head" and "tail" of a
> list that we use elsewhere, instead of calling them the "front" and
> "back" as you've done here.  I would suggest that instead of add_after
> and add_before you use the names insert_after and insert_before, which
> I think is more clear; also instead of remove, I think you should say
> delete, for consistency with pg_list.h.
Heh. I've been poisoned by using C++ too much (those are partially STL names). 

Changed.

> A number of these inlined functions could be rewritten as macros -
> e.g. ilist_d_has_next, ilist_d_has_prev, ilist_d_is_empty.  Since some
> things are written as macros anyway maybe it's good to do all the
> one-liners that way, although I guess it doesn't matter that much.
I find inline functions preferable because of multiple-evaluation issues. The 
macros are macros because of the dynamic return type and other similar issues 
which cannot be handled in plain C.

> I also agree with your XXX comment that ilist_s_remove should probably
> be out of line.
Done.

Looks good now?

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From c3c80925e780489351c4de210364e55d223f02a8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 6 May 2012 00:26:35 +0200
Subject: [PATCH 1/2] Add embedded list interface

Adds a single and a double linked list which can easily embedded into other
datastructures and can be used without additional memory allocations.
---
 src/backend/lib/Makefile |2 +-
 src/backend/lib/ilist.c  |  123 
 src/include/lib/ilist.h  |  468 ++
 3 files changed, 592 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/lib/ilist.c
 create mode 100644 src/include/lib/ilist.h

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 2e1061e..c94297a 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = dllist.o stringinfo.o
+OBJS = dllist.o stringinfo.o ilist.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/ilist.c b/src/backend/lib/ilist.c
new file mode 100644
index 000..f78ac51
--- /dev/null
+++ b/src/backend/lib/ilist.c
@@ -0,0 +1,123 @@
+/*-
+ *
+ * ilist.c
+ *	  support for integrated/inline double and single linked lists
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/ilist.c
+ *
+ * NOTES
+ *
+ *	  This file only contains testing code for inline single/double linked
+ *	  lists.
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+/*
+ * If inlines aren't available include the function as defined in the header,
+ * but without 'static inline' defined. That way we do not have to duplicate
+ * their functionality.
+ */
+#ifndef USE_INLINE
+#

Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Kevin Grittner
Peter Geoghegan  wrote:
 
> Is anyone aware of a non-zero commit_delay in the wild today? I
> personally am not.
 
http://archives.postgresql.org/pgsql-performance/2011-11/msg00083.php
 
-Kevin



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Peter Geoghegan
On 28 June 2012 19:25, Kevin Grittner  wrote:
> Peter Geoghegan  wrote:
>
>> Is anyone aware of a non-zero commit_delay in the wild today? I
>> personally am not.
>
> http://archives.postgresql.org/pgsql-performance/2011-11/msg00083.php

In that thread, Robert goes on to say to the OP who had set commit_delay:

>On Fri, Nov 4, 2011 at 2:45 PM, Claudio Freire  wrote:
>> I don't think 1 second can be such a big difference for the bgwriter,
>> but I might be wrong.
>
> Well, the default value is 200 ms.   And I've never before heard of
> anyone tuning it up, except maybe to save on power consumption on a
> system with very low utilization.  Nearly always you want to reduce
> it.
>
>> The wal_writer makes me doubt, though. If logged activity was higher
>> than 8MB/s, then that setting would block it all.
>> I guess I really should lower it.
>
> Here again, you've set it to ten times the default value.  That
> doesn't seem like a good idea.  I would start with the default and
> tune down.

So, let me rephrase my question: Is anyone aware of a non-zero
commit_delay in the wild today with sensible reasoning behind it?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 1:43 PM, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund  
>> wrote:
>>> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote:
 What happens if you mlock() it into memory - does that fail quickly?
 Is that not something we might want to do *anyway*?
>
>>> You normally can only mlock() mminor amounts of memory without changing
>>> settings. Requiring to change that setting (aside that mlocking would be a 
>>> bad
>>> idea imo) would run contrary to the point of the patch, wouldn't it? ;)
>
>> It would. I wasn't aware of that limitation :)
>
> The OSX man page says that mlock should give EAGAIN for a permissions
> failure (ie, exceeding the rlimit) but
>
>     [ENOMEM]           Some portion of the indicated address range is not
>                        allocated.  There was an error faulting/mapping a
>                        page.
>
> It might be helpful to try mlock (if available, which it isn't
> everywhere) and complain about ENOMEM but not other errors.  If course,
> if the kernel checks rlimit first, we won't learn anything ...

I tried this.  At least on my fairly vanilla MacOS X desktop, an mlock
for a larger amount of memory than was conveniently on hand (4GB, on a
4GB box) neither succeeded nor failed in a timely fashion but instead
progressively hung the machine, apparently trying to progressively
push every available page out to swap.  After 5 minutes or so I could
no longer move the mouse.  After about 20 minutes I gave up and hit
the reset button.  So there's apparently no value to this as a
diagnostic tool, at least on this platform.

> I think it *would* be a good idea to mlock if we could.  Setting shmem
> large enough that it swaps has always been horrible for performance,
> and in sysv-land there's no way to prevent that.  But we can't error
> out on permissions failure.

I wouldn't mind having an option, but I think there'd have to be a way
to turn it off for people trying to cram as many lightly-used VMs as
possible onto a single server.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Tom Lane
Robert Haas  writes:
> I tried this.  At least on my fairly vanilla MacOS X desktop, an mlock
> for a larger amount of memory than was conveniently on hand (4GB, on a
> 4GB box) neither succeeded nor failed in a timely fashion but instead
> progressively hung the machine, apparently trying to progressively
> push every available page out to swap.  After 5 minutes or so I could
> no longer move the mouse.  After about 20 minutes I gave up and hit
> the reset button.  So there's apparently no value to this as a
> diagnostic tool, at least on this platform.

Fun.  I wonder if other BSDen are as brain-dead as OSX on this point.

It'd probably at least be worth filing a bug report with Apple about it.

regards, tom lane



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan  wrote:
> You think it will confuse users less if we start telling them to use
> something that we have a very long history of telling them not to use?

I don't buy this line of reasoning at all.  If we're going to rename
the GUC, it should be for accuracy, not PR value.  If we start
renaming something every time we improve it, we're going to go nuts.
We improved lots of things in 9.2; they didn't all get renamed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Peter Geoghegan
On 28 June 2012 19:55, Robert Haas  wrote:
> On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan  
> wrote:
>> You think it will confuse users less if we start telling them to use
>> something that we have a very long history of telling them not to use?
>
> I don't buy this line of reasoning at all.  If we're going to rename
> the GUC, it should be for accuracy, not PR value.  If we start
> renaming something every time we improve it, we're going to go nuts.
> We improved lots of things in 9.2; they didn't all get renamed.

That is a false equivalence, and you know it.

Who said anything about PR value? I'm concerned with not confusing users.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan  
> wrote:
>> You think it will confuse users less if we start telling them to use
>> something that we have a very long history of telling them not to use?

> I don't buy this line of reasoning at all.  If we're going to rename
> the GUC, it should be for accuracy, not PR value.  If we start
> renaming something every time we improve it, we're going to go nuts.
> We improved lots of things in 9.2; they didn't all get renamed.

See VACUUM FULL for a recent counterexample --- we basically jacked it
up and drove a new implementation underneath, but we didn't change the
name, despite the fact that we were obsoleting a whole lot more folk
knowledge than exists around commit_delay.

Of course, there were application-compatibility reasons not to rename
that command, which wouldn't apply so much to commit_delay.  But still,
we have precedent for expecting that we can fix external documentation
rather than trying to code around whatever it says.

regards, tom lane



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Josh Berkus
Robert, Tom, Stephen,

So, first, a description of the specific problem I've encountered at two
sites.  I'm working on another email suggesting workarounds and
solutions, but that's going to take a bit longer.

Observation
---

This problem occurred on two database systems which shared the following
characteristics:

1) They were running with default autovacuum & vacuum settings, except
that one database had 5 workers instead of 3.

2) They have large partitioned tables, in which the partitions are
time-based and do not receive UPDATES after a certain date.  Each
partition was larger than RAM.

3) The databases are old enough, and busy enough, to have been through
XID wraparound at least a couple of times.

Users reported that the database system became unresponsive, which was
surprising since both of these DBs had been placed on hardware which was
engineered for at least 100% growth over the current database size.  On
investigation, we discovered the following things:

a) Each database had autovacuum_max_workers (one DB 5, one DB 3) doing
anti-wraparound vacuum on several partitions simultaneously.

b) The I/O created by the anti-wraparound vacuum was tying up the system.

c) Terminating any individual autovacuum process didn't help, as it
simply caused autovac to start on a different partition.

So, the first question was: why did autovacuum want to anti-wraparound
vacuum dozens of tables at the same time?  A quick check showed that all
of these partitions had nearly identical XID ages (as in less than
100,000 transactions apart), all of which had exceeded
autovacuum_freeze_max_age.  How did this happen?  I'm still not sure.

One thought is: this is an artifact of the *previous* wraparound vacuums
on each database.   On cold partitions with old dead rows which have
been through wraparound vacuum several times, this tends to result in
the cold partitions converging towards having the same relfrozenxid over
time; I'm still working on the math to prove this.  Alternately, it's
possible that a schema change to the partitioned tables gave them all
the same effective relfrozenxid at some point in the past; both
databases are still in development.

So there are two parts to this problem, each of which needs a different
solution:

1. Databases can inadvertently get to the state where many tables need
wraparound vacuuming at exactly the same time, especially if they have
many "cold" data partition tables.

2. When we do hit wraparound thresholds for multiple tables, autovacuum
has no hesitation about doing autovacuum_max_workers worth of wraparound
vacuum simultaneously, even when that exceeds the I/O capacity of the
system.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Peter Geoghegan
On 28 June 2012 20:00, Tom Lane  wrote:
> See VACUUM FULL for a recent counterexample --- we basically jacked it
> up and drove a new implementation underneath, but we didn't change the
> name, despite the fact that we were obsoleting a whole lot more folk
> knowledge than exists around commit_delay.
>
> Of course, there were application-compatibility reasons not to rename
> that command, which wouldn't apply so much to commit_delay.  But still,
> we have precedent for expecting that we can fix external documentation
> rather than trying to code around whatever it says.

I'm sure you're right to some degree. We can rely on some, maybe even
most users to go and learn about these things, or hear about them
somehow. But why should we do it that way, when what I've proposed is
so much simpler, and has no plausible downside?

http://archives.postgresql.org/pgsql-hackers/2005-06/msg01463.php

Here, you say:

""
The issue here is not "is group commit a good idea in the abstract?".
It is "is the commit_delay implementation of the idea worth a dime?"
... and the evidence we have all points to the answer "NO".  We should
not let theoretical arguments blind us to this.
""

What happens when someone reads this, or many other such statements
were made before or since, when they are considering setting
commit_delay now?

The VACUUM FULL implementation is now very close to being nothing more
than a synonym of CLUSTER. The only reason it happened that way was
because it was easier than just deprecating VACUUM FULL. The old
VACUUM FULL actually had something to recommend it. It seems unlikely
that the same thing can be said for commit_delay.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Posix Shared Mem patch

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 2:51 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I tried this.  At least on my fairly vanilla MacOS X desktop, an mlock
>> for a larger amount of memory than was conveniently on hand (4GB, on a
>> 4GB box) neither succeeded nor failed in a timely fashion but instead
>> progressively hung the machine, apparently trying to progressively
>> push every available page out to swap.  After 5 minutes or so I could
>> no longer move the mouse.  After about 20 minutes I gave up and hit
>> the reset button.  So there's apparently no value to this as a
>> diagnostic tool, at least on this platform.
>
> Fun.  I wonder if other BSDen are as brain-dead as OSX on this point.
>
> It'd probably at least be worth filing a bug report with Apple about it.

Just for fun, I tried writing a program that does power-of-two-sized
malloc requests.

The first one that failed - on my 4GB Mac, remember - was for
140737488355328 bytes.  Yeah, that's right: 128 TB.

According to the Google, there is absolutely no way of getting MacOS X
not to overcommit like crazy.  You can read the amount of system
memory by using sysctl() to fetch hw.memsize, but it's not really
clear how much that helps.  We could refuse to start up if the shared
memory allocation is >= hw.memsize, but even an amount slightly less
than that seems like enough to send the machine into a tailspin, so
I'm not sure that really gets us anywhere.

One idea I had was to LOG the size of the shared memory allocation
just before allocating it.  That way, if your system goes into the
tank, there will at least be something in the log.  But that would be
useless chatter for most users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] embedded list v2

2012-06-28 Thread Alvaro Herrera

Excerpts from Andres Freund's message of Thu Jun 28 14:20:59 -0400 2012:

> Looks good now?

The one thing I dislike about this code is the names you've chosen.  I
mean, ilist_s_stuff and ilist_d_stuff.  I mean, why not just Slist_foo
and Dlist_bar, say?  As far as I can tell, you've chosen the "i" prefix
because it's "integrated" or "inline", but this seems to me a rather
irrelevant implementation detail that's of little use to the callers.

Also, I don't find it such a great idea to have everything in a single file.
Is there anything wrong with separating singly and doubly linked lists
each to its own file?  Other than you not liking it, I mean.  As a
person who spends some time trying to untangle header dependencies, I
would appreciate keeping stuff as lean as possible.  However, since
nobody else seems to have commented on this, maybe it's just me.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] initdb check_need_password fix

2012-06-28 Thread Peter Eisentraut
If you run initdb -A md5, you get an error

initdb: must specify a password for the superuser to enable md5 authentication

In 9.2, when you run something like initdb --auth-local=peer
--auth-host=md5, you still get that error, which I think is a mistake.
The error should only be shown if both authentication choices are
password-like.  Hence I propose the attached patch.


diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index bac8385..edd5d71 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2574,13 +2574,19 @@ struct tsearch_config_match
 }
 
 static void
-check_need_password(const char *authmethod)
+check_need_password(const char *authmethodlocal, const char *authmethodhost)
 {
-	if ((strcmp(authmethod, "md5") == 0 ||
-		 strcmp(authmethod, "password") == 0) &&
+	if ((strcmp(authmethodlocal, "md5") == 0 ||
+		 strcmp(authmethodlocal, "password") == 0) &&
+		(strcmp(authmethodhost, "md5") == 0 ||
+		 strcmp(authmethodhost, "password") == 0) &&
 		!(pwprompt || pwfilename))
 	{
-		fprintf(stderr, _("%s: must specify a password for the superuser to enable %s authentication\n"), progname, authmethod);
+		fprintf(stderr, _("%s: must specify a password for the superuser to enable %s authentication\n"), progname,
+		(strcmp(authmethodlocal, "md5") == 0 ||
+		 strcmp(authmethodlocal, "password") == 0)
+		? authmethodlocal
+		: authmethodhost);
 		exit(1);
 	}
 }
@@ -2792,8 +2798,7 @@ struct tsearch_config_match
 	check_authmethod_valid(authmethodlocal, auth_methods_local, "local");
 	check_authmethod_valid(authmethodhost, auth_methods_host, "host");
 
-	check_need_password(authmethodlocal);
-	check_need_password(authmethodhost);
+	check_need_password(authmethodlocal, authmethodhost);
 
 	if (strlen(pg_data) == 0)
 	{



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Alvaro Herrera

Excerpts from Josh Berkus's message of Thu Jun 28 15:03:15 -0400 2012:

> 2) They have large partitioned tables, in which the partitions are
> time-based and do not receive UPDATES after a certain date.  Each
> partition was larger than RAM.

I think the solution to this problem has nothing to do with vacuum or
autovacuum settings, and lots to do with cataloguing enough info about
each of these tables to note that, past a certain point, they don't need
any vacuuming at all.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 2:58 PM, Peter Geoghegan  wrote:
> On 28 June 2012 19:55, Robert Haas  wrote:
>> On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan  
>> wrote:
>>> You think it will confuse users less if we start telling them to use
>>> something that we have a very long history of telling them not to use?
>>
>> I don't buy this line of reasoning at all.  If we're going to rename
>> the GUC, it should be for accuracy, not PR value.  If we start
>> renaming something every time we improve it, we're going to go nuts.
>> We improved lots of things in 9.2; they didn't all get renamed.
>
> That is a false equivalence, and you know it.

A false equivalence between what and what?

> Who said anything about PR value? I'm concerned with not confusing users.

My point here is that we don't normally rename things because they
work better than they used to.  Whether that is called PR value or not
confusing users, we don't normally do it.

We sometimes rename things because they do something *different* than
what they used to do.  But not just because they work better.

Anyway, it seems that no one other than you and I is very excited
about renaming this for whatever reason, so maybe we should leave it
at that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] embedded list v2

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 3:47 PM, Alvaro Herrera
 wrote:
>
> Excerpts from Andres Freund's message of Thu Jun 28 14:20:59 -0400 2012:
>
>> Looks good now?
>
> The one thing I dislike about this code is the names you've chosen.  I
> mean, ilist_s_stuff and ilist_d_stuff.  I mean, why not just Slist_foo
> and Dlist_bar, say?  As far as I can tell, you've chosen the "i" prefix
> because it's "integrated" or "inline", but this seems to me a rather
> irrelevant implementation detail that's of little use to the callers.
>
> Also, I don't find so great an idea to have everything in a single file.
> Is there anything wrong with separating singly and doubly linked lists
> each to its own file?  Other than you not liking it, I mean.  As a
> person who spends some time trying to untangle header dependencies, I
> would appreciate keeping stuff as lean as possible.  However, since
> nobody else seems to have commented on this, maybe it's just me.

Well, it's not JUST you.  I agree entirely with all of your points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Christopher Browne
On Thu, Jun 28, 2012 at 3:03 PM, Josh Berkus  wrote:
> 1. Databases can inadvertently get to the state where many tables need
> wraparound vacuuming at exactly the same time, especially if they have
> many "cold" data partition tables.

This suggests that this should be handled rather earlier, and with
some attempt to not do them all simultaneously.

In effect, if there are 25 tables that will need wraparound vacuums in
the next million transactions, it is presumably beneficial to start
hitting on them right away, ideally one at a time, so as to draw their
future needs further apart.

The loose thought is that any time autovac isn't very busy, it should
consider (perhaps based on probability?) picking a table that is in a
cluster of tables that currently have wraparound needs at about the
same time, and, in effect, spread that cluster out.

I suppose there are two considerations that conflict somewhat:
a) If there are tables that Absolutely Require wraparound vacuuming,
Right Soon Now, there's nothing to help this.  They MUST be vacuumed,
otherwise the system will get very unhappy.
b) It's undesirable to *worsen* things by 'clustering' future
wraparound vacuums together, which gets induced any time autovac is
continually vacuuming a series of tables.  If 25 tables get vacuumed
right around now, then that may cluster their next wraparound vacuum
to 2^31 transactions from 'right around now.'

But there's no helping a).

I suppose this suggests having an autovac thread that is 'devoted' to
spreading out future wraparound vacuums.

- If a *lot* of tables were just vacuumed recently, then it shouldn't
do anything, as Right Now is a cluster of 'badness.'
- It should group tables by slicing their next wraparounds (grouping
by rounding wraparound txid to the nearest, say, 10M or 20M), and
consider vacuuming a table Right Now that would take that table out of
the worst such "slice"

Thus, supposing the grouping is like:

| TxId - nearest 10 million | Tables Wrapping In Range |
|---------------------------+--------------------------|
|                         0 |                      250 |
|                         1 |                       80 |
|                         2 |                       72 |
|                         3 |                       30 |
|                         4 |                       21 |
|                         5 |                       35 |
|                         6 |                        9 |
|                         7 |                       15 |
|                         8 |                        8 |
|                         9 |                        7 |
|                        10 |                       22 |
|                        11 |                       35 |
|                        12 |                       14 |
|                        13 |                      135 |
|                        14 |                      120 |
|                        15 |                       89 |
|                        16 |                       35 |
|                        17 |                       45 |
|                        18 |                       60 |
|                        19 |                       25 |
|                        20 |                       15 |
|                        21 |                      150 |

Suppose current txid is 750, and the reason for there to be 250
tables in the current range is that there are a bunch of tables that
get *continually* vacuumed.  No need to worry about that range, and
I'll presume that these are all in the past.

In this example, it's crucial to, pretty soon, vacuum the 150 tables
in partition #21, as they're getting near wraparound.  Nothing to be
improved on there.  Though it would be kind of nice to start on the
150 as early as possible, so that we *might* avoid having them
dominate autovac, as in Josh Berkus' example.

But once those are done, the next "crucial" set, in partition #20, are
a much smaller set of tables.  It would be nice, at that point, to add
in a few tables from partitions #13 and #14, to smooth out the burden.
The ideal "steady state" would look like the following:

| TxId - nearest 10 million | Tables Wrapping In Range |
|---------------------------+--------------------------|
|                         0 |                      250 |
|                         1 |                       51 |
|                         2 |                       51 |
|                         3 |                       51 |
|                         4 |                       51 |
|                         5 |                       51 |
|                         6 |                       51 |
|                         7 |                       51 |
|                         8 |                       51 |
|                         9 |                       51 |
|                        10 |                       51 |
|                        11 |                       51 |
|12 |   

Re: [HACKERS] embedded list v2

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 09:47:05 PM Alvaro Herrera wrote:
> Excerpts from Andres Freund's message of Thu Jun 28 14:20:59 -0400 2012:
> > Looks good now?
> 
> The one thing I dislike about this code is the names you've chosen.  I
> mean, ilist_s_stuff and ilist_d_stuff.  I mean, why not just Slist_foo
> and Dlist_bar, say?  As far as I can tell, you've chosen the "i" prefix
> because it's "integrated" or "inline", but this seems to me a rather
> irrelevant implementation detail that's of little use to the callers.
Well, it's not irrelevant, because you actually need to change the contained
structs to use it. I find that a pretty relevant distinction.

> Also, I don't find so great an idea to have everything in a single file.
> Is there anything wrong with separating singly and doubly linked lists
> each to its own file?  Other than you not liking it, I mean.  As a
> person who spends some time trying to untangle header dependencies, I
> would appreciate keeping stuff as lean as possible.  However, since
> nobody else seems to have commented on this, maybe it's just me.
Robert had the same comment; it's not just you...

It would mean duplicating the ugliness around the conditional inlining and
the comment explaining how to use the stuff (because it's basically used the
same way for singly and doubly linked lists), and you would need to #define
ilist_container twice or have a third file.  That just seems too much
overhead for ~100 lines (the singly linked list implementation).

What I wonder is how hard it would be to move catcache.h's structs into the
implementation.  That's the reason the old and new list implementations are
currently included all over the backend...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Peter Geoghegan
On 28 June 2012 20:55, Robert Haas  wrote:
>>> I don't buy this line of reasoning at all.  If we're going to rename
>>> the GUC, it should be for accuracy, not PR value.  If we start
>>> renaming something every time we improve it, we're going to go nuts.
>>> We improved lots of things in 9.2; they didn't all get renamed.
>>
>> That is a false equivalence, and you know it.
>
> A false equivalence between what and what?

Changing the name in order to create some buzz about it, or to
highlight an improvement, is quite different to changing the name
because the practical advice surrounding the setting has suddenly
altered, almost diametrically, and after a very long period.

>> Who said anything about PR value? I'm concerned with not confusing users.
>
> My point here is that we don't normally rename things because they
> work better than they used to.  Whether that is called PR value or not
> confusing users, we don't normally do it.

That makes sense. I don't dispute that. We aren't talking about an
incremental improvement here, are we? We're talking about night and
day. That is probably an unprecedented state of affairs, so I don't
see any reason to invoke precedent.

The VACUUM FULL situation isn't at all comparable here, because:

For VACUUM FULL:

User googles "VACUUM FULL", gets something from the 8.* docs, or
something from that era. (This happens more often than not for any
given Postgres term, as you rightly complained about at PgCon).

They see something that says "you should probably use cluster
instead". They do so, and are no worse off for it.

For commit_delay:

User googles commit_delay, and sees something from that same period
that says "you don't want to use this". They naturally assume that
they don't. They lose whatever benefit they might have had from the
new implementation.

> We sometimes rename things because they do something *different* than
> what they used to do.  But not just because they work better.

It does do something different. That's why you proposed changing the name.

> Anyway, it seems that no one other than you and I is very excited
> about renaming this for whatever reason, so maybe we should leave it
> at that.

I think not changing the name is a really bad decision, and I am
personally unhappy about it. I immediately agreed to your proposed
adjustment to the patch without really thinking that it mattered,
because it had no plausible downside, so why not?

That's all I have to say about the matter. If it isn't obvious that
the name should be changed, based on what I've already said, it never
will be.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] embedded list v2

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote:
> What I wonder is how hard it would be to remove catcache.h's structs into
> the  implementation. Thats the reason why the old and new list
> implementation currently is included all over the backend...
Moving them into the implementation isn't possible, but catcache.h being 
included just about everywhere simply isn't needed.

It being included everywhere was introduced by a series of commits from Bruce:
b85a965f5fc7243d0386085e12f7a6c836503b42
b43ebe5f83b28e06a3fd933b989aeccf0df7844a
e0522505bd13bc5aae993fc50b8f420665d78e96
and others

That looks broken. An implementation file not including its own header... A 
minimal patch to fix this particular problem is attached (looks like there are 
others in the series).

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 45e2c358e6a21e837f13731981da2644bcb57a88 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 28 Jun 2012 23:03:44 +0200
Subject: [PATCH] Stop including catcache.h from syscache.h

syscache.h used to not rely on catcache.h and even today ships with the comment
"Users of this must import catcache.h too" for the one function requiring
catcache.h knowledge.

This was changed in a series of commits including:
b85a965f5fc7243d0386085e12f7a6c836503b42
b43ebe5f83b28e06a3fd933b989aeccf0df7844a
e0522505bd13bc5aae993fc50b8f420665d78e96

Change it back.
---
 src/backend/access/transam/xact.c |1 +
 src/backend/catalog/namespace.c   |1 +
 src/backend/catalog/pg_conversion.c   |1 +
 src/backend/catalog/pg_enum.c |1 +
 src/backend/utils/adt/acl.c   |1 +
 src/backend/utils/cache/attoptcache.c |1 +
 src/backend/utils/cache/catcache.c|1 +
 src/backend/utils/cache/inval.c   |1 +
 src/backend/utils/cache/lsyscache.c   |1 +
 src/backend/utils/cache/relcache.c|1 +
 src/backend/utils/cache/spccache.c|1 +
 src/backend/utils/cache/syscache.c|1 +
 src/backend/utils/cache/ts_cache.c|1 +
 src/backend/utils/cache/typcache.c|1 +
 src/backend/utils/resowner/resowner.c |5 +++--
 src/include/utils/resowner.h  |   10 ++
 src/include/utils/snapmgr.h   |1 +
 src/include/utils/syscache.h  |2 +-
 18 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 4755ee6..1f743f7 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -44,6 +44,7 @@
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
+#include "utils/catcache.h"
 #include "utils/combocid.h"
 #include "utils/guc.h"
 #include "utils/inval.h"
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 20850ab..10ad82b 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -46,6 +46,7 @@
 #include "storage/sinval.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/guc.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index f86c84f..358bd39 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_proc.h"
 #include "mb/pg_wchar.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 41665c1..20e26c4 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -23,6 +23,7 @@
 #include "storage/lmgr.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/fmgroids.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 77322a1..2cc87d8 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -29,6 +29,7 @@
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/utils/cache/attoptcache.c b/src/backend/utils/cache/attoptcache.c
index e01ae21..5d872ba 100644
--- a/src/backend/utils/cache/attoptcache.c
+++ b/src/backend/utils/cache/attoptcache.c
@@ -18,6 +18,7 @@
 
 #include "access/reloptions.h"
 #include "utils/attoptcache.h"
+#include "utils/catcache.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 0307b96..f27d90d 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c

Re: [HACKERS] embedded list v2

2012-06-28 Thread Alvaro Herrera

Excerpts from Andres Freund's message of Thu Jun 28 16:03:26 -0400 2012:
> 
> On Thursday, June 28, 2012 09:47:05 PM Alvaro Herrera wrote:
> > Excerpts from Andres Freund's message of Thu Jun 28 14:20:59 -0400 2012:
> > > Looks good now?
> > 
> > The one thing I dislike about this code is the names you've chosen.  I
> > mean, ilist_s_stuff and ilist_d_stuff.  I mean, why not just Slist_foo
> > and Dlist_bar, say?  As far as I can tell, you've chosen the "i" prefix
> > because it's "integrated" or "inline", but this seems to me a rather
> > irrelevant implementation detail that's of little use to the callers.
> Well, its not irrelevant because you actually need to change the contained 
> structs to use it. I find that a pretty relevant distinction.

Sure, at that point it is relevant.  Once you're past that, there's no
point.  I mean, you would look up how it's used in the header comment of
the implementation, and then just refer to the API.
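
For readers unfamiliar with the pattern under discussion: an "embedded"
(intrusive) list stores its links inside the caller's own struct, which is
exactly why the contained structs must be changed to participate. A minimal
sketch, with hypothetical names loosely modeled on the ilist_d_* and
ilist_container vocabulary used in this thread (the actual patch's API may
differ):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names loosely modeled on the API under discussion; the
 * actual patch's identifiers may differ. */
typedef struct ilist_d_node
{
    struct ilist_d_node *prev;
    struct ilist_d_node *next;
} ilist_d_node;

typedef struct ilist_d_head
{
    ilist_d_node node;          /* circular list with a sentinel node */
} ilist_d_head;

/* Recover the containing struct from a pointer to its embedded node. */
#define ilist_container(type, membername, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, membername)))

void
ilist_d_init(ilist_d_head *head)
{
    head->node.prev = head->node.next = &head->node;
}

void
ilist_d_push_tail(ilist_d_head *head, ilist_d_node *n)
{
    n->prev = head->node.prev;
    n->next = &head->node;
    head->node.prev->next = n;
    head->node.prev = n;
}

/* The caller's struct embeds the node -- the "you need to change the
 * contained structs" point Andres makes above. */
typedef struct CacheEntry
{
    int          id;
    ilist_d_node cache_node;    /* no separate list cell allocation needed */
} CacheEntry;

int
sum_ids(ilist_d_head *head)
{
    int sum = 0;

    for (ilist_d_node *cur = head->node.next; cur != &head->node;
         cur = cur->next)
        sum += ilist_container(CacheEntry, cache_node, cur)->id;
    return sum;
}
```

The container-of-style macro is what turns a node pointer back into the
enclosing struct, so no per-element allocation is needed.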

> > Also, I don't find so great an idea to have everything in a single file.
> > Is there anything wrong with separating singly and doubly linked lists
> > each to its own file?  Other than you not liking it, I mean.  As a
> > person who spends some time trying to untangle header dependencies, I
> > would appreciate keeping stuff as lean as possible.  However, since
> > nobody else seems to have commented on this, maybe it's just me.
> Robert had the same comment, it's not just you...
> 
> It would mean duplicating the ugliness around the conditional inlining, the 
> comment explaining how to use the stuff (because it's basically used the same 
> way for singly and doubly linked lists), you would need to #define 
> ilist_container twice or have a third file.
> Just seems too much overhead for ~100 lines (the singly linked list 
> implementation).

Well, then don't duplicate a comment -- create a README.lists and
refer to it in the comments.  Not sure about the ilist_container stuff,
but in principle I don't have a problem with having a file with a single
#define that's included by two other headers.

> What I wonder is how hard it would be to remove catcache.h's structs into the 
> implementation. That's the reason why the old and new list implementation 
> currently is included all over the backend...

Yeah, catcache.h is a pretty ugly beast.  I'm sure there are ways to behead it.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Daniel Farina
On Thu, Jun 28, 2012 at 1:32 PM, Peter Geoghegan  wrote:
>> Anyway, it seems that no one other than you and I is very excited
>> about renaming this for whatever reason, so maybe we should leave it
>> at that.
>
> I think not changing the name is a really bad decision, and I am
> personally unhappy about it. I immediately agreed to your proposed
> adjustment to the patch without really thinking that it mattered,
> because it had no plausible downside, so why not?
>
> That's all I have to say about the matter. If it isn't obvious that
> the name should be changed, based on what I've already said, it never
> will be.

All in all, I don't think this can be a very productive discussion
unless someone just pitches an equal or better name overall in terms of
conciseness and descriptiveness.  I'd rather optimize for those
attributes.  Old advice is old; that's the nature of the beast.

The one benefit I can think of for name shuffling is avoiding
accidental negation of the optimization when porting Postgres
configuration from older clusters, and even then I don't think the
rename-on-change strategy is a scalable one; a tuning-linting
tool is probably the only scalable option if one is concerned about
making sure that those forward-porting their configurations or
managing multiple postgres versions don't shoot themselves in the
foot.

-- 
fdr



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Peter Geoghegan
On 28 June 2012 22:22, Daniel Farina  wrote:
> All in all, I don't think this can be a very productive discussion
> unless someone just pitches an equal or better name overall in terms of
> conciseness and descriptiveness.  I'd rather optimize for those
> attributes.  Old advice is old; that's the nature of the beast.

Robert suggested wal_flush_delay, which does more accurately describe
what happens now.

Old advice is old, but we frequently go to reasonable lengths to
protect users from making predictable mistakes. The position that we
shouldn't change the name might make sense if there was any downside
to doing so, but there isn't.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] embedded list v2

2012-06-28 Thread Alvaro Herrera

Excerpts from Andres Freund's message of Thu Jun 28 17:06:49 -0400 2012:
> On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote:
> > What I wonder is how hard it would be to remove catcache.h's structs into
> > the implementation. That's the reason why the old and new list
> > implementation currently is included all over the backend...
> Moving them into the implementation isn't possible, but catcache.h being 
> included just about everywhere simply isn't needed.
> 
> It being included everywhere was introduced by a series of commits from Bruce:
> b85a965f5fc7243d0386085e12f7a6c836503b42
> b43ebe5f83b28e06a3fd933b989aeccf0df7844a
> e0522505bd13bc5aae993fc50b8f420665d78e96
> and others
> 
> That looks broken. An implementation file not including its own header... A 
> minimal patch to fix this particular problem is attached (looks like there 
> are 
> others in the series).

Hmm, I think this is against project policy -- we do want each header to
be compilable separately.  I would go instead the way of splitting
resowner.h in two or more pieces.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] embedded list v2

2012-06-28 Thread Andres Freund
On Thursday, June 28, 2012 11:45:08 PM Alvaro Herrera wrote:
> Excerpts from Andres Freund's message of Thu Jun 28 17:06:49 -0400 2012:
> > On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote:
> > > What I wonder is how hard it would be to remove catcache.h's structs
> > > into the implementation. That's the reason why the old and new list
> > > implementation currently is included all over the backend...
> > 
> > Moving them into the implementation isn't possible, but catcache.h being
> > included just about everywhere simply isn't needed.
> > 
> > It being included everywhere was introduced by a series of commits from
> > Bruce: b85a965f5fc7243d0386085e12f7a6c836503b42
> > b43ebe5f83b28e06a3fd933b989aeccf0df7844a
> > e0522505bd13bc5aae993fc50b8f420665d78e96
> > and others
> > 
> > That looks broken. An implementation file not including its own header...
> > A minimal patch to fix this particular problem is attached (looks like
> > there are others in the series).
> 
> Hmm, I think this is against project policy -- we do want each header to
> be compilable separately.  I would go instead the way of splitting
> resowner.h in two or more pieces.
It was done nearly the same way in catcache.h before Bruce changed things. You 
can still see the remnants of that in syscache.h:
/* list-search interface.  Users of this must import catcache.h too */
extern struct catclist *SearchSysCacheList(int cacheId, int nkeys,
                                           Datum key1, Datum key2,
                                           Datum key3, Datum key4);

The only difference is that gcc warns if you declare a struct in a parameter 
list - that's why I forward-declared it explicitly...

resowner.h still compiles standalone and is still usable. You can even call 
ResourceOwnerRememberCatCacheListRef if you get the list parameter from 
somewhere else.
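
The incomplete-type arrangement described above can be sketched outside
PostgreSQL like this: the "header" side exposes only an opaque forward
declaration, so including the interface does not drag in the full struct
definitions. The names below are illustrative, not the real
syscache/catcache API:

```c
#include <assert.h>
#include <stdlib.h>

/* "Header" side: callers see only an incomplete type, so including this
 * interface does not pull in the full definitions. */
struct catclist;
struct catclist *make_list(int n_members);
int list_count(const struct catclist *cl);

/* "Implementation" side: the full struct stays private to this file. */
struct catclist
{
    int n_members;
};

struct catclist *
make_list(int n_members)
{
    struct catclist *cl = malloc(sizeof(struct catclist));

    cl->n_members = n_members;
    return cl;
}

int
list_count(const struct catclist *cl)
{
    return cl->n_members;
}
```

Callers that merely pass the pointer around never need the full definition;
only code that touches the members must include the defining header.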

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-28 Thread Daniel Farina
On Thu, Jun 28, 2012 at 2:32 PM, Peter Geoghegan  wrote:
> On 28 June 2012 22:22, Daniel Farina  wrote:
>> All in all, I don't think this can be a very productive discussion
>> unless someone just pitches an equal or better name overall in terms of
>> conciseness and descriptiveness.  I'd rather optimize for those
>> attributes.  Old advice is old; that's the nature of the beast.
>
> Robert suggested wal_flush_delay, which does more accurately describe
> what happens now.

Well, I learned something from reading this name, having not followed
the mechanism too closely.  I like it.

-- 
fdr



Re: [HACKERS] plpython issue with Win64 (PG 9.2)

2012-06-28 Thread Jan Urbański

On 27/06/12 13:57, Jan Urbański wrote:

On 27/06/12 11:51, Asif Naeem wrote:

Hi,

On Windows 7 64bit, plpython is causing server crash with the following
test case i.e.

CREATE PROCEDURAL LANGUAGE 'plpython3u';

CREATE OR REPLACE FUNCTION pymax (a integer, b integer)
RETURNS integer
AS $$
if a> b:
return a
return b
$$ LANGUAGE plpython3u;
SELECT pymax(1, 2);




I think the primary reason that triggers this issue is that when the function
PLyUnicode_Bytes() calls "PyUnicode_AsEncodedString( ,WIN1252 /*Server
encoding*/, )", it fails with null. I built the latest pg 9.2 source code with
python 3.2.2.3 using Visual Studio 2010. Thanks.


I'll try to reproduce this on Linux, which should be possible given the
results of your investigation.


Your analysis is correct, I managed to reproduce this by injecting

serverenc = "win1252";

into PLyUnicode_Bytes. The comment in that function says that Python 
understands all PostgreSQL encoding names except for SQL_ASCII, but 
that's not really true. In your case GetDatabaseEncodingName() returns 
"WIN1252" and Python accepts "CP1252".


I'm wondering how this should be fixed. Just by adding more special 
cases in PLyUnicode_Bytes?


Even if we add a switch statement that would convert PG_WIN1250 into 
"CP1250", Python can still raise an exception when encoding (for various 
reasons). How about replacing the PLy_elog there with just an elog? This 
loses traceback support and the Python exception message, which could be 
helpful for debugging (something like "invalid character  for 
encoding cp1250"). OTOH, I'm uneasy about invoking the entire PLy_elog 
machinery from a function that's as low-level as PLyUnicode_Bytes.


Lastly, we map SQL_ASCII to "ascii" which is arguably wrong. The 
function is supposed to return bytes in the server encoding, and under 
SQL_ASCII that probably means we can return anything (ie. use any 
encoding we deem useful). Using "ascii" as the Python codec name will 
raise an error on anything that has the high bit set.


So: I'd add code to translate WINxxx into CPxxx when choosing the Python 
codec name to use, change PLy_elog to elog in PLyUnicode_Bytes and leave the 
SQL_ASCII case alone, as there were no complaints and people using 
SQL_ASCII are asking for it anyway.
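
The proposed WINxxx-to-CPxxx translation amounts to something like the
following sketch. The function name and buffer handling here are
illustrative, not the actual plpython code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the proposed mapping: PostgreSQL spells Windows codepages
 * "WINxxxx", while Python's codec lookup wants "CPxxxx".  Everything else
 * passes through unchanged.  Hypothetical helper, not the real code.
 */
const char *
encoding_name_for_python(const char *pg_name, char *buf, size_t buflen)
{
    if (strncmp(pg_name, "WIN", 3) == 0 && pg_name[3] != '\0')
    {
        snprintf(buf, buflen, "CP%s", pg_name + 3);
        return buf;
    }
    return pg_name;             /* most names pass through as-is */
}
```

This covers WIN1250 through WIN1258 as well as WIN866/WIN874 in one place,
rather than adding a special case per encoding.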


Cheers,
Jan



Re: [HACKERS] initdb check_need_password fix

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 3:48 PM, Peter Eisentraut  wrote:
> If you run initdb -A md5, you get an error
>
> initdb: must specify a password for the superuser to enable md5 authentication
>
> In 9.2, when you run something like initdb --auth-local=peer
> --auth-host=md5, you still get that error, which I think is a mistake.
> The error should only be shown if both authentication choices are
> password-like.  Hence I propose the attached patch.

Looks right to me.
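
For reference, the fixed check boils down to demanding a superuser password
only when every configured method is password-based. A hypothetical sketch
(initdb's real code differs in variable names and details):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical reconstruction of the check, not initdb's actual code. */
bool
auth_needs_password(const char *method)
{
    return strcmp(method, "md5") == 0 || strcmp(method, "password") == 0;
}

/*
 * A superuser password must be required only when *both* the local and the
 * host authentication methods are password-based and none was supplied;
 * with e.g. --auth-local=peer --auth-host=md5 no error should be raised.
 */
bool
must_have_password(const char *auth_local, const char *auth_host,
                   bool have_password)
{
    return auth_needs_password(auth_local) &&
           auth_needs_password(auth_host) &&
           !have_password;
}
```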

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Tom Lane
Josh Berkus  writes:
> So there are two parts to this problem, each of which needs a different
> solution:

> 1. Databases can inadvertently get to the state where many tables need
> wraparound vacuuming at exactly the same time, especially if they have
> many "cold" data partition tables.

I'm not especially sold on your theory that there's some behavior that
forces such convergence, but it's certainly plausible that there was,
say, a schema alteration applied to all of those partitions at about the
same time.  In any case, as Robert has been saying, it seems like it
would be smart to try to get autovacuum to spread out the
anti-wraparound work a bit better when it's faced with a lot of tables
with similar relfrozenxid values.

> 2. When we do hit wraparound thresholds for multiple tables, autovacuum
> has no hesitation about doing autovacuum_max_workers worth of wraparound
> vacuum simultaneously, even when that exceeds the I/O capactity of the
> system.

I continue to maintain that this problem is unrelated to wraparound as
such, and that thinking it is is a great way to design a bad solution.
There are any number of reasons why autovacuum might need to run
max_workers at once.  What we need to look at is making sure that they
don't run the system into the ground when that happens.

Since your users weren't complaining about performance with one or two
autovac workers running (were they?), we can assume that the cost-delay
settings were such as to not create a problem in that scenario.  So it
seems to me that it's down to autovac_balance_cost().  Either there's
a plain-vanilla bug in there, or seek costs are breaking the assumption
that it's okay to give N workers each 1/Nth of the single-worker I/O
capacity.

As far as bugs are concerned, I wonder if the premise of the calculation

 * The idea here is that we ration out I/O equally.  The amount of I/O
 * that a worker can consume is determined by cost_limit/cost_delay, so we
 * try to equalize those ratios rather than the raw limit settings.

might be wrong in itself?  The ratio idea seems plausible but ...
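
The rationing premise quoted above can be sketched as follows. This is a
simplified stand-in for autovac_balance_cost(), ignoring per-table cost
overrides and minimum clamps:

```c
#include <assert.h>
#include <math.h>

typedef struct Worker
{
    double cost_limit;
    double cost_delay;
} Worker;

/*
 * A worker's I/O rate is proportional to cost_limit/cost_delay, so rescale
 * each worker's cost_limit so that the ratios sum to the configured
 * single-worker ratio base_limit/base_delay.  Simplified sketch only.
 */
void
balance_cost(Worker *w, int n, double base_limit, double base_delay)
{
    double want = base_limit / base_delay;  /* target total I/O rate */
    double have = 0.0;

    for (int i = 0; i < n; i++)
        have += w[i].cost_limit / w[i].cost_delay;

    for (int i = 0; i < n; i++)
        w[i].cost_limit *= want / have;     /* equalize the ratios */
}
```

Note what the model assumes: that N workers each consuming 1/Nth of the
budget cost the system the same as one worker at full budget, which is
precisely what extra seeks and cache eviction can break.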

regards, tom lane



Re: [HACKERS] Notify system doesn't recover from "No space" error

2012-06-28 Thread Tom Lane
Christoph Berg  writes:
> Warming up an old thread here - we ran into the same problem.

Thanks for the example proving it is repeatable.

I dug into this a bit and found the problem.  When a page of pg_notify
is filled, asyncQueueAddEntries() advances the global QUEUE_HEAD to
point to the next page before it calls SimpleLruZeroPage() to make that
page exist in slru.c.  But SimpleLruZeroPage could fail --- even though
it doesn't attempt to write out the new page, it might need to dump
some existing dirty page from shared memory before it can allocate a
slot to the new page.  If it does fail, then QUEUE_HEAD is pointing
at a page number that doesn't exist either on disk or in slru.c buffers.
Subsequent attempts to execute asyncQueueAddEntries() will fail because
SimpleLruReadPage() says "page? what page?".  And there is noplace else
in async.c that would try to create the missing page.  This error will
persist until the server is restarted (thus resetting the state of
pg_notify), even if the underlying disk-full condition is cleared.

The solution is to make sure we don't advance QUEUE_HEAD into the new
page until SimpleLruZeroPage has created it.  There are a couple of ways
the code could be fixed, but what I chose to do in the attached proposed
patch is to make asyncQueueAddEntries work with a local queue_head
pointer, which is only copied back to shared memory on successful exit.
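
Stripped of the slru details, the shape of the fix is the classic
publish-on-success pattern: mutate a local copy and write it back to shared
state only after every fallible step has succeeded. The sketch below is
illustrative, not the actual async.c code:

```c
#include <assert.h>

/* Stand-in for the shared-memory queue-head position (QUEUE_HEAD). */
typedef struct
{
    int page;
    int offset;
} QueuePos;

QueuePos shared_head;

/*
 * Append nentries fixed-size entries.  All work happens on a local copy of
 * the head; it is published to shared_head only after every fallible step
 * (here, creating a new page) has succeeded, so a mid-loop failure leaves
 * the shared position pointing at a page that actually exists.
 */
int
append_entries(int nentries, int entry_len, int page_size, int zero_page_ok)
{
    QueuePos head = shared_head;    /* local working copy */

    for (int i = 0; i < nentries; i++)
    {
        if (head.offset + entry_len > page_size)
        {
            if (!zero_page_ok)      /* e.g. SimpleLruZeroPage hits disk-full */
                return -1;          /* shared_head left untouched */
            head.page++;
            head.offset = 0;
        }
        head.offset += entry_len;   /* "write" the entry */
    }

    shared_head = head;             /* publish only on success */
    return 0;
}
```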

Comments, objections?

regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index fcb087ed15dfbb4d567ee0c742e1db8ec1353a2d..6fcd9093268e7bc5aaf6102fa9d6adcdcced1d6d 100644
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
*** static ListCell *
*** 1285,1290 
--- 1285,1291 
  asyncQueueAddEntries(ListCell *nextNotify)
  {
  	AsyncQueueEntry qe;
+ 	QueuePosition queue_head;
  	int			pageno;
  	int			offset;
  	int			slotno;
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1292,1299 
  	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
  	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
  
  	/* Fetch the current page */
! 	pageno = QUEUE_POS_PAGE(QUEUE_HEAD);
  	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
  	/* Note we mark the page dirty before writing in it */
  	AsyncCtl->shared->page_dirty[slotno] = true;
--- 1293,1313 
  	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
  	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
  
+ 	/*
+ 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
+ 	 * memory upon exiting.  The reason for this is that if we have to advance
+ 	 * to a new page, SimpleLruZeroPage might fail (out of disk space, for
+ 	 * instance), and we must not advance QUEUE_HEAD if it does.  (Otherwise,
+ 	 * subsequent insertions would try to put entries into a page that slru.c
+ 	 * thinks doesn't exist yet.)  So, use a local position variable.  Note
+ 	 * that if we do fail, any already-inserted queue entries are forgotten;
+ 	 * this is okay, since they'd be useless anyway after our transaction
+ 	 * rolls back.
+ 	 */
+ 	queue_head = QUEUE_HEAD;
+ 
  	/* Fetch the current page */
! 	pageno = QUEUE_POS_PAGE(queue_head);
  	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
  	/* Note we mark the page dirty before writing in it */
  	AsyncCtl->shared->page_dirty[slotno] = true;
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1305,1311 
  		/* Construct a valid queue entry in local variable qe */
  		asyncQueueNotificationToEntry(n, &qe);
  
! 		offset = QUEUE_POS_OFFSET(QUEUE_HEAD);
  
  		/* Check whether the entry really fits on the current page */
  		if (offset + qe.length <= QUEUE_PAGESIZE)
--- 1319,1325 
  		/* Construct a valid queue entry in local variable qe */
  		asyncQueueNotificationToEntry(n, &qe);
  
! 		offset = QUEUE_POS_OFFSET(queue_head);
  
  		/* Check whether the entry really fits on the current page */
  		if (offset + qe.length <= QUEUE_PAGESIZE)
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1331,1338 
  			   &qe,
  			   qe.length);
  
! 		/* Advance QUEUE_HEAD appropriately, and note if page is full */
! 		if (asyncQueueAdvance(&(QUEUE_HEAD), qe.length))
  		{
  			/*
  			 * Page is full, so we're done here, but first fill the next page
--- 1345,1352 
  			   &qe,
  			   qe.length);
  
! 		/* Advance queue_head appropriately, and detect if page is full */
! 		if (asyncQueueAdvance(&(queue_head), qe.length))
  		{
  			/*
  			 * Page is full, so we're done here, but first fill the next page
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1342,1353 
  			 * asyncQueueIsFull() ensured that there is room to create this
  			 * page without overrunning the queue.
  			 */
! 			slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(QUEUE_HEAD));
  			/* And exit the loop */
  			break;
  		}
  	}
  
  	LWLoc

Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Josh Berkus

> I'm not especially sold on your theory that there's some behavior that
> forces such convergence, but it's certainly plausible that there was,
> say, a schema alteration applied to all of those partitions at about the
> same time.  In any case, as Robert has been saying, it seems like it
> would be smart to try to get autovacuum to spread out the
> anti-wraparound work a bit better when it's faced with a lot of tables
> with similar relfrozenxid values.

Well, I think we can go even further than that.  I think one of the
fundamental problems is that our "opportunistic" vacuum XID approach is
essentially broken for any table which doesn't receive continuous
update/deletes (I think Chris Browne makes largely the same point).

The way opportunism currently works is via vacuum_freeze_table_age,
which says "if you were going to vacuum this table *anyway*, and its
relfrozenxid is # old, then full-scan it".  That works fine for tables
getting constant UPDATEs to avoid hitting the wraparound deadline, but
tables which have stopped getting activity, or are insert-only, never
get it.

What we should have instead is some opportunism in autovacuum which says:

"If I have otherwise idle workers, and the system isn't too busy*, find
the table with the oldest relfrozenxid which is over
autovacuum_freeze_max_age/2 and vacuum-full-scan it."

The key difficulty there is "if the system isn't too busy".  That's a
hard thing to determine, and subject to frequent change.  An
opportunistic solution would still be useful without that requirement,
but not as helpful.
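
The heuristic sketched in words above ("when idle, pick the oldest
qualifying table") could look roughly like this; this is entirely
hypothetical, not existing autovacuum code:

```c
#include <assert.h>

typedef struct
{
    const char *name;
    unsigned    age;        /* age(relfrozenxid), in transactions */
} Table;

/*
 * Hypothetical selection rule: among the candidate tables, return the
 * index of the one with the largest relfrozenxid age, but only if that
 * age exceeds half of freeze_max_age; -1 when no table qualifies.
 */
int
pick_wraparound_candidate(const Table *t, int n, unsigned freeze_max_age)
{
    int best = -1;

    for (int i = 0; i < n; i++)
        if (t[i].age > freeze_max_age / 2 &&
            (best < 0 || t[i].age > t[best].age))
            best = i;

    return best;
}
```

The hard part this leaves out is the gating condition ("the system isn't
too busy"), which is where the real design difficulty lies.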

I don't find Stephen's proposal of goal-based solutions to be practical.
 A goal-based approach makes the assumption that database activity is
predictable, and IME most databases are anything but.

A second obstacle to "opportunistic wraparound vacuum" is that
wraparound vacuum is not interruptable. If you have to kill it off and
do something else for a couple hours, it can't pick up where it left
off; it needs to scan the whole table from the beginning again.

> I continue to maintain that this problem is unrelated to wraparound as
> such, and that thinking it is is a great way to design a bad solution.
> There are any number of reasons why autovacuum might need to run
> max_workers at once.  What we need to look at is making sure that they
> don't run the system into the ground when that happens.

100% agree.

> Since your users weren't complaining about performance with one or two
> autovac workers running (were they?),

No, it's when we hit 3 that it fell over.  Thresholds vary with memory
and table size, of course.

BTW, the primary reason I think (based on a glance at system stats) this
drove the system to its knees was that the simultaneous wraparound
vacuum of 3 old-cold tables evicted all of the "current" data out of the
FS cache, forcing user queries which would normally hit the FS cache
onto disk.  I/O throughput was NOT at 100% capacity.

During busy periods, a single wraparound vacuum wasn't enough to clear
the FS cache because it's competing on equal terms with user access to
data.  But three avworkers "ganged up" on the user queries and kicked
the tar out of them.

Unfortunately, for the 5-worker system, I didn't find out about the
issue until after it was over, and I know it was related to wraparound
only because we were logging autovacuum.  So I don't know if it had the
same case.

There are also problems with our defaults and measurements for the
various vacuum_freeze settings, but changing those won't really fix the
underlying problem, so it's not worth fiddling with them.

The other solution, as mentioned last year, is to come up with a way in
which old-cold data doesn't need to be vacuumed *at all*.  This would
be the ideal solution, but it's not clear how to implement it, since any
wraparound-counting solution would bloat the CLOG intolerably.

> we can assume that the cost-delay
> settings were such as to not create a problem in that scenario.  So it
> seems to me that it's down to autovac_balance_cost().  Either there's
> a plain-vanilla bug in there, or seek costs are breaking the assumption
> that it's okay to give N workers each 1/Nth of the single-worker I/O
> capacity.

Yeah, I think our I/O balancing approach was too simplistic to deal with
situations like this one.  Factors I think break it are:

* modifying cost-limit/cost-delay doesn't translate exactly into 1:1
modifying I/O (in fact, it seems highly unlikely that it does)
* seek costs, as you mention
* FS cache issues and competition with user queries (per above)

> As far as bugs are concerned, I wonder if the premise of the calculation
> 
>  * The idea here is that we ration out I/O equally.  The amount of I/O
>  * that a worker can consume is determined by cost_limit/cost_delay, so we
>  * try to equalize those ratios rather than the raw limit settings.
> 
> might be wrong in itself?  The ratio idea seems plausible but ...

Well, I think it's "plausible but wrong under at least some common
circumstances".  In addition to seeking, it ignores FS cache effects
(not that I have any idea how to account for these mathematically).  It
also makes the assumption that 3 autovacuum workers running at 1/3 speed
each is better than having one worker running at full speed, which is
debatable.

Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
> I don't find Stephen's proposal of goal-based solutions to be practical.
>  A goal-based approach makes the assumption that database activity is
> predictable, and IME most databases are anything but.

We're talking about over the entire transaction space, and we can be
pretty liberal, in my view, with our estimates.  If we get it right, we
might risk doing more autovac's for wraparound than strictly necessary,
but they should happen over a sufficient time that it doesn't cause
performance issues.

One definite problem with this, of course, is that the wraparound
autovac can't be stopped and restarted, and anything that increases the
amount of wall-clock time required to complete the autovac will
necessarily increase the risk that we'll lose a bunch of work due to a
database restart.

Thanks,

Stephen




Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-28 Thread Etsuro Fujita
Hi Ants,

> -Original Message-
> From: Ants Aasma [mailto:a...@cybertec.at]
> Sent: Wednesday, June 27, 2012 9:23 PM
> To: Robert Haas
> Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois
> Deliege
> Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
> 
> Sorry about the delay in answering. I have been swamped with non-PG
> related things lately.
> 
> On Tue, Jun 26, 2012 at 11:08 PM, Robert Haas  wrote:
> > On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas  wrote:
> >> On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
> >>  wrote:
>  I'm confused by this remark, because surely the query planner does it this
>  way only if there's no LIMIT.  When there is a LIMIT, we choose based on
>  the startup cost plus the estimated fraction of the total cost we expect
>  to pay based on dividing the LIMIT by the overall row count estimate.  Or
>  is this not what you're talking about?
> 
> My reasoning was that query_planner returns the cheapest-total path
> and cheapest fractional presorted (by the aggregation pathkeys). When
> evaluating hash-aggregates with this patch these two are indeed
> compared considering the estimated fraction of the total cost, but this
> might miss cheapest fractional unordered path for lazy hashaggregates.
> 
> Reviewing the code now I discovered this path could be picked out from
> the pathlist, just like it is done by
> get_cheapest_fractional_path_for_pathkeys when pathkeys is nil. This
> would need to be returned in addition to the other two paths. To
> minimize overhead, this should only be done when we possibly want to
> consider lazy hash-aggregation (there is a group clause with no
> aggregates and grouping is hashable) But this is starting to get
> pretty crufty considering that there doesn't seem to be any really
> compelling usecases for this.
> 
> > Ants, do you intend to update this patch for this CommitFest?  Or at
> > all?  It seems nobody's too excited about this, so I'm not sure
> > whether it makes sense for you to put more work on it.  But please
> > advise as to your plans.
> 
> If anyone thinks that this patch might be worth considering, then I'm
> prepared to do minor cleanup this CF (I saw some possibly unnecessary
> cruft in agg_fill_hash_and_retrieve). On the other hand, if you think
> the use case is too marginal to consider for inclusion then I won't
> shed a tear if this gets rejected. For me this was mostly a learning
> experience for poking around in the planner.

Honestly, I'm not sure that it's worth including this, considering the use
case...

Thanks,

Best regards,
Etsuro Fujita

> Ants Aasma
> --
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de
> 





Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-28 Thread Tom Lane
Josh Berkus  writes:
> Well, I think it's "plausible but wrong under at least some common
> circumstances".  In addition to seeking, it ignores FS cache effects
> (not that I have any idea how to account for these mathematically).  It
> also makes the assumption that 3 autovacuum workers running at 1/3 speed
> each is better than having one worker running at full speed, which is
> debatable.

Well, no, not really, because the original implementation with only one
worker was pretty untenable.  But maybe we need some concept like only
one worker working on *big* tables?  Or at least, less than max_workers
of them.

regards, tom lane



[HACKERS] pg_upgrade log files

2012-06-28 Thread Alvaro Herrera
Right now, the pg_upgrade output files contain the output of the
commands it runs.  However, it does not contain the actual commands
being run ... so if you want to figure out what is happening, you have
to trace the pg_upgrade source while reading the log file.

This is, to say the least, suboptimal.

I propose this patch which echoes the commands to the respective log
files.  I would backpatch this to 9.2.

As a note, I find the exec_prog() function rather bizarre.  It would be
better if the caller did not have to manually specify a redirection, for
instance.

-- 
Álvaro Herrera 


0001-Make-the-pg_upgrade-log-files-contain-actual-command.patch
Description: Binary data



Re: [HACKERS] pg_upgrade log files

2012-06-28 Thread Tom Lane
Alvaro Herrera  writes:
> I propose this patch which echoes the commands to the respective log
> files.  I would backpatch this to 9.2.

OK, but the fflush just before fclose seems a bit pointless; fclose
will do that anyway no?

regards, tom lane
