Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Magnus Hagander
On Thu, May 10, 2012 at 6:28 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Thu, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
 If people want the main docs building more often that's not really a
 problem other than time - we just need to decouple it from the
 buildfarm and run a separate job for it. It's not rocket science..

 Many years ago, Bruce and myself in particular put in a lot of work to
 make the turnaround time on the docs build less than 5 minutes, based on
 various requests.  I'm disappointed to learn that that was abandoned
 without discussion.  We might as well just put the old job back.

It was not abandoned without discussion in any way.

First of all, the docs still build in 5 minutes.

Second, the "5 minute docs build" link on the website was removed in
*2007*, at the request of Bruce, who maintained it. This request was
(at least according to the commit message and from what I can
remember) made in public on pgsql-www, and thus clearly open for
discussion, at http://archives.postgresql.org/pgsql-www/2007-12/msg00212.php,
where Bruce claims the other one runs often enough, and nobody
objects.

Third, the regular docs build on the developer box (which I think ran
once / hour?) *did not work* (prior to that it kind of worked but often
hung and failed, but at least it tried to run - at this point it
stopped even trying). The current docs build replaced a situation in
which we had *no developer docs updates at all*, by taking the
reasonably quick and easy fix of running it as part of an existing
buildfarm animal and uploading the results.


So where in all this was anything abandoned?


Bruce already suggested putting the old job back on his box, which
he's of course free to do. But since it took almost 5 years before
anybody actually complained about that, maybe it's not really that big
a problem?

And it's already been agreed that increasing the dev docs build
frequency back up to maybe once an hour would make sense.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Peter Eisentraut
On Fri, 2012-05-11 at 09:26 +0200, Magnus Hagander wrote:
 On Thu, May 10, 2012 at 6:28 PM, Peter Eisentraut pete...@gmx.net wrote:
  On Thu, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
  If people want the main docs building more often that's not really a
  problem other than time - we just need to decouple it from the
  buildfarm and run a separate job for it. It's not rocket science..
 
  Many years ago, Bruce and myself in particular put in a lot of work to
  make the turnaround time on the docs build less than 5 minutes, based on
  various requests.  I'm disappointed to learn that that was abandoned
  without discussion.  We might as well just put the old job back.
 
 It was not abandoned without discussion in any way.
 
 First of all, the docs still build in 5 minutes.

That is different from the turnaround time from the commit.

 Second, the "5 minute docs build" link on the website was removed in
 *2007*, at the request of Bruce, who maintained it. This request was
 (at least according to the commit message and from what I can
 remember) made in public on pgsql-www, and thus clearly open for
 discussion, at http://archives.postgresql.org/pgsql-www/2007-12/msg00212.php,
 where Bruce claims the other one runs often enough, and nobody
 objects.

You are misinterpreting this.  The reason Bruce's link was removed was
that the other (official) build was set to run at the same frequency, so
Bruce's build was exactly redundant.  The requirement/aspiration to have
a few minutes turnaround time continued.

 Third, the regular docs build on the developer box (which I think ran
 once / hour?) *did not work* (prior to that it kind of worked but often
 hung and failed, but at least it tried to run - at this point it
 stopped even trying).

If you had any problems with how well they worked, we could have
discussed this.  It's fine if you want to change how they run, and I
have no problem with how they are run now, but I just want to make clear
what requirements led to the setup at the time.

 So where in all this was anything abandoned?

The ability to get a docs build in less than 5 minutes after commit.





Re: [HACKERS] PL/Python result set slicing broken in Python 3

2012-05-11 Thread Jan Urbański

On 10/05/12 19:45, Peter Eisentraut wrote:

On Sat, 2012-05-05 at 22:45 +0200, Jan Urbański wrote:

Apparently once you implement PyMappingMethods.mp_subscript you can
drop PySequenceMethods.sq_slice, but I guess there's no harm in
keeping it (and I'm not sure it'd work on Python 2.3 with only
mp_subscript implemented).


Committed this now.

 From test coverage reports, I now see that PLy_result_ass_item() is no
longer called.  That's probably OK, if assignments are now handled
through the mapping methods.  But should we remove the function then?


Have you tried on Python 2.3 as well? People on #python said that if you 
implement the mapping functions, the sequence slicing functions are no 
longer used, but maybe we should revisit this for the next release, rather 
than risk introducing a regression for the benefit of removing a few 
dead lines.
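
For readers following along, a minimal illustrative sketch (not the
committed PL/Python code; example_item() and example_slice() are
hypothetical helpers) of why a single mp_subscript handler makes the
sq_slice slot redundant - both rs[i] and rs[i:j] arrive through the same
slot, as an index or as a slice object:

/* Illustrative only -- not PL/Python source. */
#include <Python.h>

extern PyObject *example_item(PyObject *self, Py_ssize_t idx);
extern PyObject *example_slice(PyObject *self, PyObject *slice);

static PyObject *
example_subscript(PyObject *self, PyObject *item)
{
    if (PyIndex_Check(item))
    {
        /* rs[i]: an integer-like index lands here */
        Py_ssize_t idx = PyNumber_AsSsize_t(item, PyExc_IndexError);

        if (idx == -1 && PyErr_Occurred())
            return NULL;
        return example_item(self, idx);
    }
    else if (PySlice_Check(item))
    {
        /* rs[i:j]: the slice object itself lands here, so the separate
         * PySequenceMethods.sq_slice slot is never consulted */
        return example_slice(self, item);
    }
    PyErr_SetString(PyExc_TypeError, "subscript must be an index or a slice");
    return NULL;
}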


Cheers,
Jan



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Magnus Hagander
On Fri, May 11, 2012 at 9:55 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Fri, 2012-05-11 at 09:26 +0200, Magnus Hagander wrote:
 On Thu, May 10, 2012 at 6:28 PM, Peter Eisentraut pete...@gmx.net wrote:
  On Thu, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
  If people want the main docs building more often that's not really a
  problem other than time - we just need to decouple it from the
  buildfarm and run a separate job for it. It's not rocket science..
 
  Many years ago, Bruce and myself in particular put in a lot of work to
  make the turnaround time on the docs build less than 5 minutes, based on
  various requests.  I'm disappointed to learn that that was abandoned
  without discussion.  We might as well just put the old job back.

 It was not abandoned without discussion in any way.

 First of all, the docs still build in 5 minutes.

 That is different from the turnaround time from the commit.

 Second, the "5 minute docs build" link on the website was removed in
 *2007*, at the request of Bruce, who maintained it. This request was
 (at least according to the commit message and from what I can
 remember) made in public on pgsql-www, and thus clearly open for
 discussion, at http://archives.postgresql.org/pgsql-www/2007-12/msg00212.php,
 where Bruce claims the other one runs often enough, and nobody
 objects.

 You are misinterpreting this.  The reason Bruce's link was removed was
 that the other (official) build was set to run at the same frequency, so
 Bruce's build was exactly redundant.  The requirement/aspiration to have
 a few minutes turnaround time continued.

But the other (official) build was *not* set to run at the same
frequency. It was set, according to that mail, to run "frequently
enough", but it did not run every 5 minutes, at least not per the only
cronjob I found back then.


 Third, the regular docs build on the developer box (which I think ran
 once / hour?) *did not work* (prior to that it kind of worked but often
 hung and failed, but at least it tried to run - at this point it
 stopped even trying).

 If you had any problems with how well they worked, we could have
 discussed this.  It's fine if you want to change how they run, and I
 have no problem with how they are run now, but I just want to make clear
 what requirements led to the setup at the time.

The entire machine they ran on *died*, because it had been
unmaintained for many years, and parts were silently upgraded whereas
other, incompatible, parts were not. We did actually leave the script
around. It ran for months, failing at step one, and pretty much nobody
complained.

The docs build was *entirely* undocumented (other than the official
cronjob which did *not* run every 5 minutes, but I guess you are
saying there was a second, undocumented, cronjob that ran more often).


But in the interest of actually being productive - what *is* the
use case for needing a 5 minute turnaround time? I don't buy the "check
what a patch looks like" argument, because that should be done *before* the
commit, not after - so it's best verified by a local docs build anyway
(which will also be faster).

I'm sure we can put something in with a pretty quick turnaround again
without too much strain on the system, but it does, as I mentioned
before, require decoupling it from the buildfarm which means it's not
just tweaking a config file.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Simon Riggs
On 10 May 2012 20:51, Andres Freund and...@2ndquadrant.com wrote:

 I noticed that when synchronous_commit=off we're not waking up the wal sender
 latch in xact.c:RecordTransactionCommit, which leads to ugly delays of approx 7
 seconds (1 + replication_timeout/10 = 1 + 60/10 with default settings).
 Given that we're flushing the wal to disk much sooner this appears to be a bad
 idea - especially as this may happen even under load if we ever reach the
 'caughtup' state.

Sounds like a problem. I'll have a look.

 I wonder why the WalSndWakeup isn't done like:

 diff --git a/src/backend/access/transam/xlog.c
 b/src/backend/access/transam/xlog.c
 index ecb71b6..7a3224b 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -1906,6 +1906,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool
 xlog_switch)
            xlogctl->LogwrtRqst.Flush = LogwrtResult.Flush;
        SpinLockRelease(&xlogctl->info_lck);
    }
 +
 +   /* the walsender wasn't woken up in xact.c */
 +   if (max_wal_senders > 1 && synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
 +       WalSndWakeup();
  }

 Doing that for the synchronous_commit=off case can imo be considered a bugfix,
 but I wonder why we ever wake the senders somewhere else?
 The only argument I can see for doing it at places like StartTransactionCommit
 is that that's the place after which the data will be visible on the client. I
 think that's a non-argument though, because if wal is flushed to disk outside of
 a commit there normally is enough data to make it worthwhile.

 Doing the above results in a very noticeable reduction in lagginess and even a
 noticeable reduction in cpu-usage spikes on a busy replication test setup.

 Greetings,

 Andres

 --
  Andres Freund                     http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-11 Thread Magnus Hagander
On Thu, May 10, 2012 at 6:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Hence I think we oughta swap the order of those two array
 elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
 pgwin32_select.)

 Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
 nasty bug: it does not reset ImmediateInterruptOK before returning.
 How is it that Windows machines aren't falling over constantly?

Hmm, the commit you made to fix it says it changes how
ImmediateInterruptOK is handled, but there was not a single line of
code that actually changed that? Or am I misreading this completely?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Simon Riggs
On 10 May 2012 16:14, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of Thu May 10 02:27:32 -0400 2012:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 I noticed while doing some tests that the checkpointer process does not
 recover very nicely after a backend crashes under postmaster -T

 It seems to me that the bug is in the postmaster state machine rather
 than checkpointer itself.  After a few false starts, this seems to fix
 it:

 --- a/src/backend/postmaster/postmaster.c
 +++ b/src/backend/postmaster/postmaster.c
 @@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
                     signal_child(WalWriterPID, SIGTERM);
                 if (BgWriterPID != 0)
                     signal_child(BgWriterPID, SIGTERM);
 +               if (FatalError  CheckpointerPID != 0)
 +                   signal_child(CheckpointerPID, SIGUSR2);

 Surely we do not want the checkpointer doing a shutdown checkpoint here.
 If we need it to die immediately, SIGQUIT is the way.  If we want a
 shutdown checkpoint, that has to wait till after everything else is
 known dead.  So while I agree this may be a state machine bug, that
 doesn't look like a good fix.

Is this now fixed? You've made a few changes so I'm confused. Thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Cédric Villemain
On Thursday 10 May 2012 22:18:30, Alvaro Herrera wrote:
 It's been said elsewhere that adding all this to the release notes as
 found on the official docs would be too bulky.  How about having a
 second copy of the release notes that contains authorship info as
 proposed by Andrew?  Then the docs could have no names at all, and
 credit would be given by some other page in the website (to which the
 release notes would link).

Maybe we can update/upgrade the page [1] on the website and add per-release
authors/reviewers/companies/...?

[1] http://www.postgresql.org/community/contributors/
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] unite recovery.conf and postgresql.conf

2012-05-11 Thread Simon Riggs
On 10 May 2012 05:44, Josh Berkus j...@agliodbs.com wrote:

 I expect to revisit config directories before the first 9.3 CF; it will
 help multiple things I'd like to see happen.  Then we can circle back to
 the main unification job with a fairly clear path forward from there.

 Yeah, let's discuss this next week.  Easier configuration is one
 demand I'm hearing from developers in general, and I don't think that's
 nearly as hard a feature as, say, parallel query.  We can do it.

A key requirement is to be able to drop in new config files without
needing to $EDIT anything.

OK, it's possible to put in lots of includeifexists for non-existent
files just in case you need one, but that sucks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
 Florian Pflug  wrote:
 On May 10, 2012, at 18:36, Kevin Grittner wrote:
 Robert Haas  wrote:
 
 I wonder if you could do this with something akin to the Bitmap
 Heap Scan machinery. Populate a TID bitmap with a bunch of
 randomly chosen TIDs, fetch them all in physical order
 and if you don't get as many rows as you need, rinse and repeat
 until you do.
 
 If you get too many, it is important that you read all the way to
 the end and then randomly omit some of them.  While a bit of a
 bother, that's pretty straightforward and should be pretty fast,
 assuming you're not, like, an order of magnitude high.
 
 Why is that? From a statistical point of view it shouldn't matter
  whether you pick N random samples, or pick M >= N random samples and
  then randomly pick N from M. ("random" implying uniformly distributed
 here).
 
  That sounds to me like exactly what Robert and I both said.
 While passing the heap with the bitmap, if you get to the number you
 want you don't stop there -- you read all of them (M in your
 parlance) and randomly drop M minus N of them.  Or, if you prefer, I
 guess you could *pick* N of them.  I don't see a logical difference.

What I meant to say was "and drop the last M minus N", not "and randomly
drop M minus N". Which, of course, makes my statement utterly wrong since
the tuples are sorted by TID, so you'd penalize tuples with a larger TID.
Duh! Sorry for the noise, guys...

 But falling short is tougher; making up the difference could be an
 iterative process, which could always wind up with having you read
 all tuples in the table without filling your sample.
 
 But the likelihood of that happening is extremely low, no?
 
 That depends.  What if someone just did a mass delete and your
 statistics aren't yet up to date when they ask to pick a relatively
  large percentage of the rows?
 
 Unless the sampling percentage is very high
 
 Or the statistics are not current.  I agree, this shouldn't happen
 often, but we can never know, going in, whether it *is* the case.
 You *could* always wind up needing to read the entire table, and
 still not hit the initially-targeted number of rows.  Now, arguably
 you could use data gleaned from each pass to adjust the target or
  inform the size of the next pass.  My point is that the "we selected too
  few" case is a lot more complicated than the "we selected too many" case.
 
 but that case isn't of much practical importance anyway.
 
 It's important that it's handled in some sane way when it happens.
 And it will happen.

Hm. Maybe one can get rid of these sorts of problems by factoring in
the expected density of the table beforehand and simply accepting that
the results will be inaccurate if the statistics are outdated?

One could, for example, simply pick

  N := SamplingPercentage * MaxTuplesPerPage / AvgLiveTuplesPerPage

where

 AvgLiveTuplesPerPage := #Tuples / #Pages

random TIDs, fetch the live ones, and return them. I'm not totally sure
whether this approach is sensitive to non-uniformity in the tuple to
line-pointer assignment, though.
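
A hypothetical worked example (numbers mine, not from the thread): with a
1% sampling percentage, MaxTuplesPerPage = 291 (the ceiling for 8KB pages),
and an average of 50 live tuples per page, N = 1% * 291 / 50 = 5.82%.
Drawing 5.82 random TIDs per 100 tuples and keeping only the live ones then
yields the desired 1 tuple per 100 on average, since 5.82 * 50 / 291 = 1.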

 But something else comes to mind. Does the standard permit samples
 taken with the BERNOULLI method to contain the same tuple multiple
 times?
 
 I'm pretty sure not.  That would be nonsensical.
 
 If not, any kind of TID-based approach will have to remember all previously
 fetched TIDs, which seems doable but unfortunate...
 
 Right.  You would always need to ignore a duplicate random choice in
 any one cycle of generating ctid values; and if you are iterating
 because you fell short, you would also need to ignore values from
 previous iterations.  And OR your new bitmap against the previous
 ones to save for the next iteration, if needed.  I never said it
 couldn't be done; it's just very fussy, and you want to avoid a very
 large number of iterations in the case that someone deleted 99.99% of
 your rows right before you ran your sample and before autovacuum
 caught up.

Actually, thinking more about this, if the approach sketched above
turns out to work, then one might be able to get away without remembering
previously computed TIDs, thus removing a lot of complexity.

Say, for example, you simply start out with a single random TID tid[0].
Then you produce the sequence of random TIDs by setting

  tid[n+1] = tid[n] + random(1 <= x <= 2*D-1)

where D is the expected distance between one TID from the sample set
and the next higher one, i.e. D = 2 * #TIDs / N. (You'd simply stop once
tid[n] > #TIDs). This will give you approximately uniformly distributed
TIDs, I think, and will even return them in physical order. The $100 question
is, by how much does this violate the independence requirement, i.e. how
far are P(tuple X picked) and P(tuple X picked | tuple Y picked) apart?
Some search through the literature should be able to answer that.

Should the dependency turn out to be too high, one might be able to
lower it by scaling up N by a factor q, and then discarding each generated
TID with probability 1/q. This amounts to a gradual ...

Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Bruce Momjian
On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:
 On May 10, 2012, at 4:19 PM, Andrew Dunstan and...@dunslane.net wrote:
  On 05/10/2012 06:15 PM, Tom Lane wrote:
  How about a hybrid: we continue to identify patch authors as now, that is 
  with names attached to the feature/bugfix descriptions, and then have a 
  separate section "Other Contributors" to recognize patch reviewers and 
  other helpers?
  
  works for me.
 
 Me, too.

That does not work for me.  There is no practical reason for a list of
names to appear in the release notes.  I suggest that if we want to do
that, we remove all names from the release notes (as Tom suggested), and
create a wiki for credit, and link to that from the release
announcement.  That would allow us to put company names in there too.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-11 Thread MauMau

From: Kevin Grittner kevin.gritt...@wicourts.gov

MauMau maumau...@gmail.com wrote:

For information, what kind of breakage would occur?



I imagined removing KEEPONLYALNUM would just accept
non-alphanumeric characters and cause no harm to those who use
only alphanumeric characters.


This would break our current usages because of the handling of
trigrams at the edges of groups of qualifying characters.  It
would make similarity (and distance) values less useful for our
current name searches using it.  To simulate the effect, I used an
'8' in place of a comma instead of recompiling with the suggested
change.

test=# select show_trgm('smith,john');
                              show_trgm
---------------------------------------------------------------------
 {"  j","  s"," jo"," sm","hn ","ith","joh","mit","ohn","smi","th "}
(1 row)

test=# select show_trgm('smith8john');
                              show_trgm
---------------------------------------------------------------------
 {"  s"," sm","8jo","h8j","hn ","ith","joh","mit","ohn","smi","th8"}
(1 row)

test=# select similarity('smith,john', 'jon smith');
 similarity
------------
   0.615385
(1 row)

test=# select similarity('smith8john', 'jon smith');
 similarity
------------
     0.3125
(1 row)

So making the proposed change unconditionally could indeed hurt
current users of the technique.  On the other hand, if there was
fine-grained control of this, it might make trigrams useful for
searching statute cites (using all characters) as well as names
(using the current character set); so I wouldn't want it to just be
controlled by a global GUC.


Thank you for your concise examples. I think I've got it.

From your examples, I gather that KEEPONLYALNUM controls whether
non-alphanumeric characters are included in trigrams, though I haven't read
the code of pg_trgm. So, removing the KEEPONLYALNUM definition produces
trigrams that are unnecessary for users who handle only alphanumeric text.
That would lead to undesirable query results.


Then, I wonder what the ideal specification would be... to add an
alphanumeric/non-alphanumeric boolean switch to the similarity() function,
add non-alphanumeric versions of the operators (e.g. %* and <->*) and
non-alphanumeric versions of the operator classes (e.g.
gin_allchars_trgm_ops)? At least, I understood the fix is not appropriate
for minor releases.


Regards
MauMau




Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-11 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, May 10, 2012 at 6:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
 nasty bug: it does not reset ImmediateInterruptOK before returning.
 How is it that Windows machines aren't falling over constantly?

 Hmm, the commit you made to fix it says it changes how
 ImmediateInterruptOK is handled, but there was not a single line of
 code that actually changed that? Or am I misreading this completely?

Exit is now out the bottom of the loop, not via a raw "return;".
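
For illustration, a schematic of the difference (not the actual semaphore
code; wait_for_semaphore() is a hypothetical stand-in for the blocking
call):

#include <stdbool.h>

static volatile bool ImmediateInterruptOK = false;

extern int wait_for_semaphore(void);

static void
lock_semaphore_sketch(void)
{
    int errStatus;

    do
    {
        ImmediateInterruptOK = true;   /* allow cancel/die while blocked */
        errStatus = wait_for_semaphore();
        ImmediateInterruptOK = false;  /* reset runs on every iteration... */
    } while (errStatus != 0);          /* ...and the only exit falls out the
                                        * bottom, after the reset, instead of
                                        * a raw return inside the loop */
}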

regards, tom lane



Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Albe Laurenz
Michael Nolan wrote:
 I see one potential difference between your results and mine.
 
 When I rebuild the tablespace, I wind up with the same filename/OID as
 before; I'm not sure you do.

Right. That's strange.
Usually OIDs get incremented, so you shouldn't end up with the same
OID for the new tablespace.

Can you provide a complete testcase?

Yours,
Laurenz Albe



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Andrew Dunstan



On 05/11/2012 08:56 AM, Bruce Momjian wrote:

On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:

On May 10, 2012, at 4:19 PM, Andrew Dunstan and...@dunslane.net wrote:

On 05/10/2012 06:15 PM, Tom Lane wrote:

How about a hybrid: we continue to identify patch authors as now, that is with names 
attached to the feature/bugfix descriptions, and then have a separate section "Other 
Contributors" to recognize patch reviewers and other helpers?

works for me.

Me, too.

That does not work for me.  There is no practical reason for a list of
names to appear in the release notes.  I suggest that if we want to do
that, we remove all names from the release notes (as Tom suggested), and
create a wiki for credit, and link to that from the release
announcement.  That would allow us to put company names in there too.



I gave you a reason. You might not agree with it but saying that it's no 
reason doesn't make it so. A wiki page will just be duplication, IMNSHO.


cheers

andrew



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
[ Sorry for the self-reply ]

On May 11, 2012, at 13:17, Florian Pflug wrote:
 Actually, thinking more about this, if the approach sketched above
 turns out to work, then one might be able to get away without remembering
 previously computed TIDs, thus removing a lot of complexity.
 
 Say, for example, you simply start out with a single random TID tid[0].
 Then you produce the sequence of random TIDs by setting
 
  tid[n+1] = tid[n] + random(1 <= x <= 2*D-1)
 
 where D is the expected distance between one TID from the sample set
 and the next higher one, i.e. D = 2 * #TIDs / N. (You'd simply stop once
 tid[n] > #TIDs). This will give you approximately uniformly distributed
 TIDs, I think, and will even return them in physical order. The $100 question
 is, by how much does this violate the independence requirement, i.e. how
 far are P(tuple X picked) and P(tuple X picked | tuple Y picked) apart?
 Some search through the literature should be able to answer that.

Actually, one can easily do better than that. Say you've determined that
to pick each tuple with probability p_tup, you need to pick each TID with
probability p_tid (where p_tup/p_tid is the TID density, i.e. the probability
of a single TID being live). Now, looping over all TIDs and picking each with
probability p_tid is, of course, just another (more error-prone) way of
iterating over all tuples and picking each with probability p_tup. But instead
of looping, you can jump from one picked TID to the next. Observe that, after
having picked tid X, the probability of picking X+i as the next tid is
(1 - p_tid)^(i-1) * p_tid, because to pick X+i next you have to skip (i-1) TIDs
and then pick the i-th one. Thus, the following algorithm returns a sorted list
of TIDs which should be uniformly distributed and independent (or so I think,
at least):

  1) Start with a random tid tid[0], chosen such that
     P(tid[0] = i) = (1 - p_tid)^i * p_tid

  2) Set tid[n+1] = tid[n] + d, with d > 0 chosen such that
     P(d = i) = (1 - p_tid)^(i-1) * p_tid

  3) Abort once max(tid) >= #TIDs.

This all hinges on the ability to produce a sufficiently accurate estimate
of the TID density p_tup/p_tid, of course.
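
A self-contained sketch of steps 1-3 (mine, assuming an inverse-transform
draw from the geometric distribution matches the probabilities given above;
P_TID and NTIDS stand in for the estimated density and the size of the TID
space):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define P_TID  0.01     /* assumed probability of picking each TID */
#define NTIDS  100000L  /* assumed TID-space size: #pages * max index */

/* Draw d >= 1 with P(d = i) = (1 - p)^(i-1) * p (inverse transform). */
static long
geometric_gap(double p)
{
    double u = (rand() + 1.0) / ((double) RAND_MAX + 2.0); /* u in (0,1) */

    return (long) ceil(log(u) / log(1.0 - p));
}

int
main(void)
{
    /* step 1: P(tid = i) = (1 - p)^i * p, i.e. a gap shifted to start at 0 */
    long tid = geometric_gap(P_TID) - 1;

    while (tid < NTIDS)               /* step 3: stop past end of table */
    {
        printf("%ld\n", tid);         /* fetch this TID; keep it if live */
        tid += geometric_gap(P_TID);  /* step 2: jump d > 0, geometric */
    }
    return 0;
}

The output is already sorted, so the candidate TIDs can be fetched in
physical order, as proposed.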

best regards,
Florian Pflug




Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Sandro Santilli
On Thu, May 10, 2012 at 02:30:35PM -0400, Robert Haas wrote:
 On Thu, May 10, 2012 at 2:07 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
  Ants Aasma a...@cybertec.at wrote:
  It seems to me that the simplest thing to do would be to lift the
  sampling done in analyze.c (acquire_sample_rows) and use that to
  implement the SYSTEM sampling method.
 
  Definitely.  I thought we had all agreed on that ages ago.
 
 Right, and I don't think we should be considering any of this other
 stuff until that basic thing is implemented and working.

Agreed. That's what I'd love to see as well, for the GIS part.

--strk; 

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'




Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Bruce Momjian
On Fri, May 11, 2012 at 09:51:49AM -0400, Andrew Dunstan wrote:
 
 
 On 05/11/2012 08:56 AM, Bruce Momjian wrote:
 On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:
 On May 10, 2012, at 4:19 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 05/10/2012 06:15 PM, Tom Lane wrote:
 How about a hybrid: we continue to identify patch authors as now, that is 
 with names attached to the feature/bugfix descriptions, and then have a 
  separate section "Other Contributors" to recognize patch reviewers and 
 other helpers?
 works for me.
 Me, too.
 That does not work for me.  There is no practical reason for a list of
  names to appear in the release notes.  I suggest that if we want to do
  that, we remove all names from the release notes (as Tom suggested), and
 create a wiki for credit, and link to that from the release
 announcement.  That would allow us to put company names in there too.
 
 
 I gave you a reason. You might not agree with it but saying that
 it's no reason doesn't make it so. A wiki page will just be
 duplication, IMNSHO.

I mean a reason from the reader/development-process perspective, not
from the perspective of giving some benefit to contributors.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Maybe one can get rid of these sorts of problems by factoring in
 the expected density of the table beforehand and simply accepting
 that the results will be inaccurate if the statistics are
 outdated?
 
 One could, for example, simply pick
 
   N := SamplingPercentage * MaxTuplesPerPage / AvgLiveTuplesPerPage
 
 where
 
  AvgLiveTuplesPerPage := #Tuples / #Pages
 
 random TIDs, fetch the live ones, and return them.
 
To clarify, I read this as using reltuples and relpages for the
table, and returning only tuples which are visible according to the
query's snapshot.  (i.e., I think you used "live" to mean two
different things there.)
 
Unless I'm missing something, I think that works for percentage
selection, which is what the standard talks about, without any need
to iterate through additional samples.  Good idea!  We don't need to
do any second pass to pare down initial results, either.  This
greatly simplifies coding while providing exactly what the standard
requires.
 
 I'm not totally sure whether this approach is sensitive to
 non-uniformity in the tuple to line-pointer assignment, though.
 
I think we can solve that by going high enough with tuple numbers to
reach the highest tuple ID that might be in use in the table, and
*not* following HOT chains.  (If we follow HOT chains, we could have
several distinct ctid values which returned the same tuple.)  Or am
I thinking about this incorrectly?
 
 [more complex alternatives]
 
I really think your first suggestion covers it perfectly; these more
complex techniques don't seem necessary to me.
 
-Kevin



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 This all hinges on the ability to produce a sufficiently accurate estimate
 of the TID density p_tup/p_tid, of course.

I think that's the least of its problems.  AFAICS this analysis ignores
(1) the problem that the TID space is nonuniform, ie we don't know how
many tuples in each page until we look;
(2) the problem that we don't know the overall number of tuples
beforehand.

I'm not sure that there is any way to deal with (1) fully without
examining every single page, but algorithms that assume that the TIDs
are numbered linearly are broken before they start.

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Bruce Momjian
On Fri, May 11, 2012 at 10:01:32AM -0400, Bruce Momjian wrote:
 On Fri, May 11, 2012 at 09:51:49AM -0400, Andrew Dunstan wrote:
  
  
  On 05/11/2012 08:56 AM, Bruce Momjian wrote:
  On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:
   On May 10, 2012, at 4:19 PM, Andrew Dunstan and...@dunslane.net wrote:
  On 05/10/2012 06:15 PM, Tom Lane wrote:
  How about a hybrid: we continue to identify patch authors as now, that 
  is with names attached to the feature/bugfix descriptions, and then 
   have a separate section "Other Contributors" to recognize patch 
  reviewers and other helpers?
  works for me.
  Me, too.
  That does not work for me.  There is no practical reason for a list of
   names to appear in the release notes.  I suggest that if we want to do
   that, we remove all names from the release notes (as Tom suggested), and
  create a wiki for credit, and link to that from the release
  announcement.  That would allow us to put company names in there too.
  
  
  I gave you a reason. You might not agree with it but saying that
  it's no reason doesn't make it so. A wiki page will just be
  duplication, IMNSHO.
 
 I mean a reason from the reader/development-process perspective, not
  from the perspective of giving some benefit to contributors.

Let me add that I am concerned about the lack of objectivity in many of
the suggestions in this thread.  This has prompted me to think that the
temptation of having names on these release note items is just too
great, and that the names should be removed.

Let me put it this way --- the release notes are read by thousands of
people.  The benefit individuals gather from their names in the release
notes is a small part of the overall value provided by the release notes
to users.  There was a practical need to have names on items in the past
--- that need is no longer present.

I predict that if we twist the release notes to have PR value for
contributors, it will become a perpetual problem and will diminish the
cohesiveness of our group.  I am already personally upset by a few of
the things I have seen on this thread.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Michael Nolan
On 5/11/12, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 Michael Nolan wrote:
 I see one potential difference between your results and mine.

 When I rebuild the tablespace, I wind up with the same filename/OID as
  before; I'm not sure you do.

 Right. That's strange.
 Usually OIDs get incremented, so you shouldn't end up with the same
 OID for the new tablespace.

 Can you provide a complete testcase?

I thought I had, until you were unable to reproduce it. :-)
--
Mike Nolan



Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Michael Nolan
On 5/11/12, Michael Nolan htf...@gmail.com wrote:
 On 5/11/12, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 Michael Nolan wrote:
 I see one potential difference between your results and mine.

 When I rebuild the tablespace, I wind up with the same filename/OID as
  before; I'm not sure you do.

 Right. That's strange.
 Usually OIDs get incremented, so you shouldn't end up with the same
 OID for the new tablespace.

 Can you provide a complete testcase?

 I thought I had, until you were unable to reproduce it. :-)
 --
 Mike Nolan


My plan at this point is to wait until beta 1 of 9.2 is out, then see
if I can reproduce the problem there.
--
Mike Nolan



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Florian Pflug f...@phlo.org wrote:
 Maybe one can get rid of these sorts of problems by factoring in
 the expected density of the table beforehand and simply accepting
 that the results will be inaccurate if the statistics are
 outdated?
 
 Unless I'm missing something, I think that works for percentage
 selection, which is what the standard talks about, without any need
 to iterate through addition samples.  Good idea!  We don't need to
 do any second pass to pare down initial results, either.  This
 greatly simplifies coding while providing exactly what the standard
 requires.
 
  I'm not totally sure whether this approach is sensitive to
 non-uniformity in the tuple to line-pointer assignment, though.

If you're willing to accept that the quality of the results depends on
having up-to-date stats, then I'd suggest (1) use the planner's existing
technology to estimate the number of rows in the table; (2) multiply
by the sampling factor you want to get a desired number of sample rows;
(3) use ANALYZE's existing technology to acquire that many sample rows.
While the ANALYZE code isn't perfect with respect to the problem of
nonuniform TID density, it certainly will be a lot better than
pretending that that problem doesn't exist.
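
In rough pseudo-C, the shape of that recipe (helper names are hypothetical
stand-ins; the real entry points would be the planner's row-estimation code
in plancat.c and ANALYZE's sampler in commands/analyze.c, whose exact
signatures are not reproduced here):

/* Sketch only -- not PostgreSQL source. */
extern double estimated_row_count(void *rel);              /* step 1 */
extern int    analyze_style_sample(void *rel, void **rows,
                                   int targrows);          /* step 3 */

static int
sample_sketch(void *rel, void **rows, double sample_fraction)
{
    double est_rows = estimated_row_count(rel);
    int    targrows = (int) (est_rows * sample_fraction + 0.5); /* step 2 */

    /* two-stage block + reservoir sampling, as ANALYZE already does */
    return analyze_style_sample(rel, rows, targrows);
}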

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Andrew Dunstan



On 05/11/2012 10:15 AM, Bruce Momjian wrote:

On Fri, May 11, 2012 at 10:01:32AM -0400, Bruce Momjian wrote:

On Fri, May 11, 2012 at 09:51:49AM -0400, Andrew Dunstan wrote:


On 05/11/2012 08:56 AM, Bruce Momjian wrote:

On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:

On May 10, 2012, at 4:19 PM, Andrew Dunstan and...@dunslane.net wrote:

On 05/10/2012 06:15 PM, Tom Lane wrote:

How about a hybrid: we continue to identify patch authors as now, that is with names 
attached to the feature/bugfix descriptions, and then have a separate section "Other 
Contributors" to recognize patch reviewers and other helpers?

works for me.

Me, too.

That does not work for me.  There is no practical reason for a list of
names to appear in the release notes.  I suggest that if we want to do
that, we remove all names from the release notes (as Tom suggested), and
create a wiki for credit, and link to that from the release
announcement.  That would allow us to put company names in there too.


I gave you a reason. You might not agree with it but saying that
it's no reason doesn't make it so. A wiki page will just be
duplication, IMNSHO.

I mean a reason from the reader/development-process perspective, not
from the perspective of giving some benefit to contributors.

Let me add that I am concerned about the lack of objectivity in many of
the suggestions in this thread.  This has prompted me to think that the
temptation of having names on these release note items is just too
great, and that the names should be removed.

Let me put it this way --- the release notes are read by thousands of
people.  The benefit individuals gather from their names in the release
notes is a small part of the overall value provided by the release notes
to users.  There was a practical need to have names on items in the past
--- that need is no longer present.

I predict that if we twist the release notes to have PR value for
contributors, it will become a perpetual problem and will diminish the
cohesiveness of our group.  I am already personally upset by a few of
the things I have seen on this thread.



Well, I don't know what has changed that made it imperative in the past 
to have the names and makes it now redundant, nor what could possibly 
have upset you so much. Maybe I'm dense, but that's the truth of it.


Now if someone is going to volunteer to build *AND* *MAINTAIN* a Credits 
page, that will be good. It would be even better if they would go back 
and do it historically. But just hoping that will happen and meantime 
removing the names from the notes seems to me a retrograde step.


cheers

andrew




Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Let me add that I am concerned about the lack of objectivity in many of
 the suggestions in this thread.  This has prompted me to think that the
 temptation of having names on these release note items is just too
 great, and that the names should be removed.

Er, what?  I have not seen anything in this thread that merits such
an accusation.

 Let me put it this way --- the release notes are read by thousands of
 people.  The benefit individuals gather from their names in the release
 notes is a small part of the overall value provided by the release notes
 to users.  There was a practical need to have names on items in the past
 --- that need is no longer present.

My recollection is that we have been putting the names on the items to
(a) give credit where credit is due, and (b) show that Postgres has a
large and growing development community.  I had never heard the argument
"remember whom to blame for a broken feature" until you raised it
yesterday --- personally, I've always looked in the commit logs if I want
to know something like that.  So I don't see that there's really any
change in the terms of discussion.  It may be that the release notes
aren't the best place for doing either (a) or (b), but I don't agree
that we simply don't need to worry about either anymore.

regards, tom lane



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-11 Thread Magnus Hagander
On Thursday, May 10, 2012, Fujii Masao wrote:

 On Thu, May 10, 2012 at 11:51 PM, Magnus Hagander mag...@hagander.net
 wrote:
  And taking this a step further - we *already* send these GUCs.
  Previous references to us not doing that were incorrect :-)
 
  So this should be a much easier fix than we thought. And can be done
  entirely in pg_basebackup, meaning we don't need to worry about
  beta...

 Sounds good!


Should we go down the easy way and just reject connections when the flag
mismatches between the client and the server (trivial to do - see the
attached patch)? Or should we try to implement both floating point and
integer in pg_basebackup, and make it work in either case? (We'd still have
to reject it if pg_basebackup was compiled without support for int64 at
all, of course, but the large majority of builds have integer
timestamps these days, and it could be made to support both integer and float
for the trivial operations that pg_basebackup actually does.)

How common *is* it to have a build that doesn't have integer timestamps
these days? Do any of the binary builds do that at all, for example? If
it's uncommon enough, I think we should just go with the easy way out...

//Magnus



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index b12932f..3a3217d 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -75,6 +75,7 @@ GetConnection(void)
 	const char **keywords;
 	const char **values;
 	char	   *password = NULL;
+	char	   *tmpparam;
 
 	if (dbhost)
 		argcount++;
@@ -157,6 +158,29 @@ GetConnection(void)
 		free(values);
 		free(keywords);
 
+		/*
+		 * Ensure we have the same value of integer timestamps as the
+		 * server we are connecting to.
+		 */
+		tmpparam = PQparameterStatus(tmpconn, "integer_datetimes");
+		if (!tmpparam)
+		{
+			fprintf(stderr, _("%s: could not determine server setting for integer_datetimes\n"),
+	progname);
+			exit(1);
+		}
+
+#ifdef HAVE_INT64_TIMESTAMP
+		if (strcmp(tmpparam, "on") != 0)
+#else
+		if (strcmp(tmpparam, "off") != 0)
+#endif
+		{
+			fprintf(stderr, _("%s: integer_datetimes compile flag mismatch with server\n"),
+	progname);
+			exit(1);
+		}
+
 		/* Store the password for next run */
 		if (password)
 			dbpassword = password;



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 If you're willing to accept that the quality of the results
 depends on having up-to-date stats, then I'd suggest (1) use the
 planner's existing technology to estimate the number of rows in
 the table; (2) multiply by sampling factor you want to get a
 desired number of sample rows; (3) use ANALYZE's existing
 technology to acquire that many sample rows.  While the ANALYZE
 code isn't perfect with respect to the problem of nonuniform TID
 density, it certainly will be a lot better than pretending that
 that problem doesn't exist.
 
That is basically what we've been talking about for SYSTEM samples,
which I think we've agreed should be finished and committed before
programming on the BERNOULLI option.  Nobody is pretending that the
problem doesn't exist.  Let me restate the option I'm thinking
solves the problem well, since it would be easy to have missed what
I was responding to with all the various suggestions on the thread.
 
The user has asked for a BERNOULLI sample S of some percent of the
table.  We determine the maximum tuple index M which could be used
for any ctid in the table.  We get an estimate of the number of
pages P in the table using whatever is the best available technique.
If you assume that the number of tuples (visible or not) is T, the
approximate number of tuples we want to randomly select is S*T.  We
will be probing random page numbers 1..P for random tuple indexes
1..M.  So how many random probes by ctid does that require? The
chance of a hit on each probe is ((T/P)/M) -- the average number of
tuples per page divided by the number of tuple index values allowed.
So we need (S*T)/((T/P)/M) probes.  Simplifying, that winds up being
S*M*P: the product of the sample size as a percentage, the maximum
tuple index on a page, and the number of pages.  (A calculation some
may have jumped to as intuitively obvious.)
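
(For a hypothetical sense of scale, numbers mine: with S = 1%, M = 300, and
P = 100,000 pages, that is 0.01 * 300 * 100,000 = 300,000 probes, i.e. three
probes per page on average.)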
 
So let's call the number of probes N.  We randomly generate N
distinct ctid values, where each is a random page number 1..P and a
random index 1..M.  We attempt to read each of these in block number
order, not following HOT chains.  For each, if the tuple exists and
is visible, it is part of our result set.
 
Since T cancels out of that equation, we don't need to worry about
estimating it.  Results will be correct for any value of M which is
*at least* as large as the maximum tuple index in the table,
although values of M larger than that will degrade performance.  The
same holds true for the number of pages.
 
Shouldn't that get us the randomly chosen sample we're looking for? 
Is there a problem you think this ignores?
 
Credit where credit is due: I *think* this is merely my restatement
of something Florian was suggesting, building on a suggestion from
Robert.
 
-Kevin



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 We
 will be probing random page numbers 1..P for random tuple indexes
 1..M.  So how many random probes by ctid does that require? The
 chance of a hit on each probe is ((T/P)/M) -- the average number of
 tuples per page divided by the number of tuple index values allowed.
 So we need (S*T)/((T/P)/M) probes.  Simplifying that winds up being
 S*M*P the product of the sample size as a percentage, the maximum
 tuple index on a page, and the number of pages.  (A calculation some
 may have jumped to as intuitively obvious.)

 So let's call the number of probes N.  We randomly generate N
 distinct ctid values, where each is a random page number 1..P and a
 random index 1..M.  We attempt to read each of these in block number
 order, not following HOT chains.  For each, if the tuple exists and
 is visible, it is part of our result set.

 Since T cancels out of that equation, we don't need to worry about
 estimating it.  Results will be correct for any value of M which is
 *at least* as large as the maximum tuple index in the table,
 although values of M larger than that will degrade performance.  The
 same holds true for the number of pages.

The trouble is, AFAICS, that you can't bound M very well without
scanning the whole table.  I mean, it's bounded by the theoretical limit,
but that's it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 [ uniformly sample the TID space defined as (1..P, 1..M) ]

 Shouldn't that get us the randomly chosen sample we're looking for? 
 Is there a problem you think this ignores?

Not sure.  The issue that I'm wondering about is that the line number
part of the space is not uniformly populated, ie, small line numbers
are much more likely to exist than large ones.  (In the limit that
density goes to zero, when you pick M much too large.)  It's not clear
to me whether this gives an unbiased probability of picking real tuples,
as opposed to hypothetical TIDs.

Another issue is efficiency.  In practical cases you'll have to greatly
overestimate M compared to the typical actual-number-of-tuples-per-page,
which will lead to a number of target TIDs N that's much larger than
necessary, which will make the scan slow --- I think in practice you'll
end up doing a seqscan or something that might as well be one, because
unless S is *really* tiny it'll hit just about every page.  We can have
that today without months' worth of development effort, using the WHERE
random() < S technique.

regards, tom lane



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 The trouble is, AFAICS, that you can't bound M very well without
  scanning the whole table.  I mean, it's bounded by the theoretical
 limit, but that's it.
 
What would the theoretical limit be?  (block size - page header size
- minimum size of one tuple) / item pointer size?  So, on an 8KB
page, somewhere in the neighborhood of 1350?  Hmm.  If that's right,
that would mean a 1% random sample would need 13.5 probes per page,
meaning there wouldn't tend to be a lot of pages missed.  Still, the
technique for getting a random sample seems sound, unless someone
suggests something better.  Maybe we just want to go straight to a
seqscan to get to the pages we want to probe rather than reading
just the ones on the probe list in physical order?
 
-Kevin



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas robertmh...@gmail.com wrote:
 The trouble is, AFAICS, that you can't bound M very well without
  scanning the whole table.  I mean, it's bounded by the theoretical
 limit, but that's it.
 
  What would the theoretical limit be?  (block size - page header size
 - minimum size of one tuple) / item pointer size?  So, on an 8KB
 page, somewhere in the neighborhood of 1350?

Your math is off --- I get something less than 300, even if the tuples
are assumed to be empty of data.  (Header size 24 bytes, plus 4-byte
line pointer, so at least 28 bytes per tuple, so at most 292 on an 8K
page.)  But you still end up probing just about every page for a 1%
sample.

regards, tom lane



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-11 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 How common *is* it to have a build that doesn't have integer timestamps
 these days? Does any of the binary builds do that at all, for example? If
 it's uncommon enough, I think we should just go with the easy way out...

+1 for just rejecting a mismatch.

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Andrew Dunstan



On 05/11/2012 05:32 AM, Magnus Hagander wrote:


But in the interest of actually being productive - what *is* the
use case for needing a 5 minute turnaround time? I don't buy the "check
what a patch looks like" case, because that should be done *before* the
commit, not after - so it's best verified by a local docs build anyway
(which will also be faster).

I'm sure we can put something in with a pretty quick turnaround again
without too much strain on the system, but it does, as I mentioned
before, require decoupling it from the buildfarm which means it's not
just tweaking a config file.


If it's of any use to you I have made some adjustments to the buildfarm 
code which would let you do *just* the docs build (and dist make if you 
want). It would still pull from git, and only do anything if there's a 
(relevant) change. So using that to set up a machine that would run 
every few minutes might work. Of course, building the docs can itself be 
fairly compute intensive, so you still might not want to run every few 
minutes if that's a limiting factor.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
On May11, 2012, at 17:20 , Robert Haas wrote:
 On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
 Since T cancels out of that equation, we don't need to worry about
 estimating it.  Results will be correct for any value of M which is
 *at least* as large as the maximum tuple index in the table,
 although values of M larger than that will degrade performance.  The
 same holds true for the number of pages.
 
 The trouble is, AFAICS, that you can't bound M very well without
 scanning the whole table.  I mean, it's bounded by theoretical limit,
 but that's it.

Hm, but maybe Kevin's observation that the actual value of M shouldn't
matter as long as it's large enough helps here. What if you start out
with M=1 and generate your first TID? After reading the page, but before
returning a tuple, you check if M is indeed an upper bound on the tuple
indices. If it isn't, you increase M, recompute N (the number of probes),
determine a new random tuple index, and return the tuple (if it is live).
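
(For what it's worth, the probing primitive this relies on is already
visible at the SQL level; a sketch with a hypothetical table t:

    SELECT * FROM t WHERE ctid = '(0,5)'::tid;

A TID that doesn't point at a live tuple simply returns zero rows rather
than erroring, which matches the "return the tuple (if it is live)" step.)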

Would that introduce bias? I'd think not, because scaling up N shouldn't,
per Kevin's argument, change the probability of a TID being picked. So
increasing N in the middle of a scan should neither penalize nor favour
tuples which were already returned compared to those which will follow,
no?

(And yes, even if there don't turn out to be any obvious holes in this
argument, it requires a more formal proof than I was able to give here
before being turned into code. Or at the very least excessive testing
with all kinds of data.)

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 [ uniformly sample the TID space defined as (1..P, 1..M) ]
 
 Shouldn't that get us the randomly chosen sample we're looking
 for?  Is there a problem you think this ignores?
 
 Not sure.  The issue that I'm wondering about is that the line
 number part of the space is not uniformly populated, ie, small
 line numbers are much more likely to exist than large ones.  (In
 the limit that density goes to zero, when you pick M much too
 large.)  It's not clear to me whether this gives an unbiased
 probability of picking real tuples, as opposed to hypothetical
 TIDs.
 
I'm convinced it generates a random sample, since every tuple in the
*possible* space has an equal chance of selection; since we ignore the
ones which don't exist or aren't visible, each of the remaining
(existing, visible) tuples is as likely to be chosen as any other.
I know that's not a formal proof, but I'm sure I could work one up.
 
 Another issue is efficiency.  In practical cases you'll have to
 greatly overestimate M compared to the typical actual-number-of-
 tuples-per-page, which will lead to a number of target TIDs N
 that's much larger than necessary, which will make the scan slow
 --- I think in practice you'll end up doing a seqscan or something
 that might as well be one, because unless S is *really* tiny it'll
 hit just about every page.  We can have that today without months
 worth of development effort, using the WHERE random() < S
 technique.
 
I think you've probably got me there.  The technique you describe
should be equally random, and, thinking through the work involved in
each, the random() < S approach seems almost certain to be cheaper.
Is it worth supporting the TABLESAMPLE BERNOULLI option as syntactic
sugar for that?
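
(For reference, that's the SQL standard's syntax, which PostgreSQL doesn't
parse today; the mapping would be roughly

    SELECT * FROM t TABLESAMPLE BERNOULLI (1);
    -- roughly: SELECT * FROM t WHERE random() < 0.01;

with the standard taking a percentage where the WHERE clause takes a
fraction.)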
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-11 Thread Fujii Masao
On Fri, May 11, 2012 at 4:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 On Fri, May 11, 2012 at 12:07 AM, MauMau maumau...@gmail.com wrote:
 Thanks for your explanation. Although I haven't understood it well yet, I'll
 consider what you taught. And I'll consider if the tentative measure of
 removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
 against Japanese text.

 In Japanese, it's common to do a text search with a two-character keyword.
 But since pg_trgm is 3-gram, you basically would not be able to use an index
 for such a text search. So you might need something like pg_bigm or pg_unigm
 for Japanese text search.

Even if an index can be used for a two-character text search, a bitmap index scan
picks up all rows, so it's too slow.

 I believe the trigrams are three *bytes* not three characters.  So a
 couple of kanji should work just fine for this.

Really? As far as I read the code of pg_trgm, the trigram is three characters
and its CRC32 is used as an index key if its size is more than three bytes.
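
This is easy to check from SQL, assuming the pg_trgm extension is
installed; for example:

    CREATE EXTENSION pg_trgm;
    SELECT show_trgm('postgres');
    -- something like {"  p"," po","es ","gre","ost","pos","res","stg","tgr"}

Running show_trgm() over a multibyte string would settle whether the unit
is characters or bytes.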

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
On May11, 2012, at 16:03 , Kevin Grittner wrote:
 [more complex alternatives]
 
 I really think your first suggestion covers it perfectly; these more
 complex techniques don't seem necessary to me.

The point of the more complex techniques (especially the algorithm in
my second mail, the reply to self) was simply to optimize the generation
of a random, uniformly distributed, unique and sorted list of TIDs.

The basic idea is to make sure we generate the TIDs in physical order,
and thus automatically ensure that they are unique. That reduces the memory
(or disk) requirement to O(1) instead of O(n), and (more importantly,
actually) makes the implementation much simpler.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-11 Thread Fujii Masao
On Sat, May 12, 2012 at 12:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 How common *is* it to have a build that doesn't have integer timestamps
 these days? Does any of the binary builds do that at all, for example? If
 it's uncommon enough, I think we should just go with the easy way out...

 +1 for just rejecting a mismatch.

Agreed.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Ensure age() returns a stable value rather than the latest value

Hm.  This fixes the stability-within-transaction problem, but it's also
introduced a change in the definition of age(), no?  Previously, in an
xact that had an XID, the age was measured relative to that XID.
I'm not sure that we should lightly abandon that correspondence.
At the very least we would need to update the user-facing documentation,
not only the function's header comment.  So far as I can find, the only
such documentation is the pg_description entry:
DESCR("age of a transaction ID, in transactions before current transaction");
but that's still wrong now.

The definition I was thinking of was "if xact has an XID use that, else
do ReadNewTransactionId", and in either case save the value for later
calls during the current virtual xact.  This is more complicated than
what you implemented, and maybe we shouldn't be quite that tense about
backwards-compatibility.  But I don't think we should be changing the
function's definition like you've done in back branches.
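
(For reference, the user-facing call whose reference point is at issue,
in its typical monitoring use:

    SELECT age(relfrozenxid) FROM pg_class WHERE relname = 'pg_class';

i.e. the number of transactions between the given XID and whatever we
define the current reference point to be.)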

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread David E. Wheeler
On May 10, 2012, at 6:18 PM, Tom Lane wrote:

 I also tried this on a Fedora 16 box, which has
 
 $ perl -v
 This is perl 5, version 14, subversion 2 (v5.14.2) built for 
 x86_64-linux-thread-multi
 
 Works fine there too...

Hrm…I've also just replicated it on CentOS 6.2 with Perl 5.10.1:

reputation=# begin;
BEGIN
Time: 22.330 ms
reputation=# do language plperlu $$ elog(ERROR, 'foo')$$;
ERROR:  foo at line 1.
CONTEXT:  PL/Perl anonymous code block
reputation=# select true;
 bool 
------
 t
(1 row)

Time: 18.392 ms
reputation=# rollback;
ROLLBACK
Time: 19.854 ms
reputation=# do language plperlu $$elog(NOTICE, $^V)$$;
NOTICE:  v5.10.1
CONTEXT:  PL/Perl anonymous code block
DO
Time: 20.459 ms
reputation=# select version();
                                            version
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit
(1 row)

Perhaps there is something funky in my configuration, though I tried a few 
different things and couldn't get it to change. I can show you on my laptop 
next week if you want to see it in person. Not sure if that will help…

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread Tom Lane
David E. Wheeler david.whee...@iovation.com writes:
 Hrm…I've also just replicated it on CentOS 6.2 with Perl 5.10.1:

Interesting.

 Perhaps there is something funky in my configuration, though I tried a
 few different things and couldn't get it to change.

Yeah.  If Bruce and I don't see it on a couple of configurations apiece,
and you do see it on two different machines, personal configuration
choices start to sound like a likely contributing factor.  Could be
either PG configuration or Perl configuration (if there is any such
thing beyond installation-time decisions).

As for me, I was testing assert-enabled builds that were otherwise
entirely out-of-the-box, and I don't even know how to configure Perl.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Simon Riggs
On 11 May 2012 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Ensure age() returns a stable value rather than the latest value

 Hm.  This fixes the stability-within-transaction problem, but it's also
 introduced a change in the definition of age(), no?  Previously, in an
 xact that had an XID, the age was measured relative to that XID.
 I'm not sure that we should lightly abandon that correspondence.
 At the very least we would need to update the user-facing documentation,
 not only the function's header comment.  So far as I can find, the only
 such documentation is the pg_description entry:
 DESCR("age of a transaction ID, in transactions before current transaction");
 but that's still wrong now.

 The definition I was thinking of was "if xact has an XID use that, else
 do ReadNewTransactionId", and in either case save the value for later
 calls during the current virtual xact.  This is more complicated than
 what you implemented, and maybe we shouldn't be quite that tense about
 backwards-compatibility.  But I don't think we should be changing the
 function's definition like you've done in back branches.

Yeh, I thought about that.

The difference between the old and the new result is likely to
be small, especially in the main intended use case. The previous
definition was fairly weird, since if you executed it in a long
running transaction it would give a false reading of the actual age,
which ISTM was a bug in itself.

What would be more confusing would be to have age() return a different
result on standby and master.

At present the back branches just throw ERROR, so some change is
needed there at least, given our earlier policy of keeping that ERROR
as a backstop rather than as an acceptable return (re: SQLcode
discussions).

I've no objection to further change, but I think I've done the best
thing out of the various options.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread David E. Wheeler
On May 11, 2012, at 9:39 AM, Tom Lane wrote:

 Hrm…I've also just replicated it on CentOS 6.2 with Perl 5.10.1:
 
 Interesting.

Ah, it’s a psql configuration issue. I had replicated it on that box by 
connecting with psql on my Mac. When I SSHed to the box and used the psql 
there, I was *not* able to replicate it. 

So then I tried psql from my Mac again, this time with -X, and the transaction 
was properly cancelled. Some more fiddling, and I find that this is the culprit:

\set ON_ERROR_ROLLBACK interactive

So I guess it transparently rolls back to a savepoint set before the previous statement.
I had forgotten I turned that on. Apologies for the noise.
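
(If memory serves, what psql does under the covers in that mode is
roughly:

    SAVEPOINT pg_psql_temporary_savepoint;            -- before each statement
    do language plperlu $$ elog(ERROR, 'foo')$$;      -- statement fails
    ROLLBACK TO pg_psql_temporary_savepoint;          -- issued silently

which is why the transaction appeared to survive the error.)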

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 11:36 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 The trouble is, AFAICS, that you can't bound M very well without
 scanning the whole table.  I mean, it's bounded by theoretical
 limit, but that's it.

 What would the theoretical limit be?

MaxHeapTuplesPerPage?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unite recovery.conf and postgresql.conf

2012-05-11 Thread Josh Berkus

 A key requirement is to be able to drop in new config files without
 needing to $EDIT anything.

Yes, absolutely.  I want to move towards the idea that the majority of
our users never edit postgresql.conf by hand.

 OK, it's possible to put in lots of include_if_exists entries for
 non-existent files just in case you need one, but that sucks.

Yeah, seems like we need something more elegant.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Fujii Masao
On Fri, May 11, 2012 at 4:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 diff --git a/src/backend/access/transam/xlog.c
 b/src/backend/access/transam/xlog.c
 index ecb71b6..7a3224b 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -1906,6 +1906,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool
 xlog_switch)
            xlogctl->LogwrtRqst.Flush = LogwrtResult.Flush;
        SpinLockRelease(xlogctl->info_lck);
    }
 +
 +   /* the walsender wasn't woken up in xact.c */
 +   if (max_wal_senders > 1 && synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
 +       WalSndWakeup();
  }

Calling WalSndWakeup() while WALWriteLock is being held might cause another
performance degradation. No?

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread David E. Wheeler
On May 11, 2012, at 9:51 AM, David E. Wheeler wrote:

\set ON_ERROR_ROLLBACK interactive
 
 So I guess it transparently rolls back to a savepoint set before the previous
 statement. I had forgotten I turned that on. Apologies for the noise.

OTOH, might it be useful to have psql show some sort of status message when it 
does this?

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 11, 2012 at 11:36 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 The trouble is, AFAICS, that you can't bound M very well without
 scanning the whole table.  I mean, it's bounded by theoretical
 limit, but that's it.

 What would the theoretical limit be?
 
 MaxHeapTuplesPerPage?
 
What about dead line pointers without corresponding tuples?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 On May 11, 2012, at 9:51 AM, David E. Wheeler wrote:
 \set ON_ERROR_ROLLBACK interactive
 
 So I guess it transparently rolls back to a savepoint set before the previous
 statement. I had forgotten I turned that on. Apologies for the noise.

Ah-hah.

 OTOH, might it be useful to have psql show some sort of status message when 
 it does this?

The same occurred to me, but after more thought I think it'd just be
horribly annoying after a short time.  People who have this set will
(or should, anyway) be expecting errors to be nonfatal to their
transactions.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Calling WalSndWakeup() while WALWriteLock is being held might cause another
 performance degradation. No?

That definitely doesn't seem ideal - a lot of things can pile up
behind WALWriteLock.  I'm not sure how big a problem it would be in
practice, but we generally make a practice of avoiding sending signals
while holding LWLocks whenever possible...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 11 May 2012 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm.  This fixes the stability-within-transaction problem, but it's also
 introduced a change in the definition of age(), no?

 Yeh, I thought about that.

 The likely difference between the old and the new result is likely to
 be small, especially in the main intended use case. The previous
 definition was fairly weird, since if you executed it in a long
 running transaction it would give a false reading of the actual age,
 which ISTM was a bug in itself.

Well, this definition could also give unexpected readings, depending on
when the first call to age() occurred within the long-running xact.
I'm not sure there's any fix for that unless you want to redefine age()
as a volatile function that uses ReadNewTransactionId on *each* call;
and that doesn't sound like a more useful definition to me.

 What would be more confusing would be to have age() return a different
 result on standby and master.

But surely you can get that anyway, since ReadNewTransactionId is
unlikely to give the same results on standby and master.

 At present the back branches just throw ERROR, so some change is
 needed there at least, given our earlier policy of keeping that ERROR
 as a backstop rather than as an acceptable return (re: SQLcode
 discussions).

 I've no objection to further change, but I think I've done the best
 thing out of the various options.

I'm not convinced this is the best thing, and I'm definitely not happy
with changing the behavior of working cases (ie, behavior on the master)
in the back branches.

Anybody else have an opinion on this?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 1:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not convinced this is the best thing, and I'm definitely not happy
 with changing the behavior of working cases (ie, behavior on the master)
 in the back branches.

 Anybody else have an opinion on this?

I agree with you.  Essentially, if we want age() to be stable, and
nobody's argued against that, we have to fix a point in the XID space
and do all of our computations relative to that point.  The original
coding did that by using our XID, and I think what we ought to do is
use either (1) our XID or (2) the next XID as of the first point in
time at which age() is called, if we don't have an XID yet then.
That's a slight behavior change even when not in HS mode, because a
read-only transaction won't acquire an XID just by virtue of using
age(), so I would probably have chosen to fix this only in master and
to not back-patch anything, but in practice I think the downside of
that behavior change is very minimal (and it might even be an equally
minimal improvement for some people) so I think a back-patch is fine.
However, I don't really see any particular merit in removing our own
XID from the picture entirely: that changes the behavior more
significantly for no particular benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Greg Stark
On Fri, May 11, 2012 at 6:16 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 MaxHeapTuplesPerPage?

 What about dead line pointers without corresponding tuples?

Actually we don't allow there to be more than MaxHeapTuplesPerPage
line pointers even if some of them are dead line pointers.

I think the argument then was that there could be bugs in code that
isn't expecting more, so perhaps that doesn't justify baking that into
more places when we could instead be removing that assumption.

Also, if I absorbed enough of this conversation skimming backwards, it
seems the algorithm would be very inefficient with such a conservative
estimate. On a table with normal-width rows it would mean usually
picking tuples that don't exist and having to try again. As Tom
pointed out, that would mean even a small sample would have to read
nearly the whole table to find the sample.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Andres Freund
On Friday, May 11, 2012 07:20:26 PM Robert Haas wrote:
 On Fri, May 11, 2012 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
  Calling WalSndWakeup() while WALWriteLock is being held might cause
  another performance degradation. No?
 
 That definitely doesn't seem ideal - a lot of things can pile up
 behind WALWriteLock.  I'm not sure how big a problem it would be in
 practice, but we generally make a practice of avoiding sending signals
 while holding LWLocks whenever possible...
In my measurements on moderately powerful hardware I couldn't see any
degradation on the primary - in fact the contrary, but the improvements were
around 0.4% and I only tested 10 minutes, so it's not exactly hard evidence.
But I agree it's not ideal.
It's the only place, though, which knows whether it's actually sensible to
wake up the walsender. We could make it return whether it wrote anything and
do the wakeup at the callers. I count 4 different call sites, which would be
an annoying duplication, but I don't really see anything better right now.

Better Ideas?

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Jeff Janes
On Thu, May 10, 2012 at 10:44 AM, Bruce Momjian br...@momjian.us wrote:
 On Thu, May 10, 2012 at 01:11:54PM +0100, Peter Geoghegan wrote:

 Why can't we call group commit "group commit" (and for that matter,
 index-only scans "index-only scans"), so that people will understand
 that we are now competitive with other RDBMSs in this area? Improve
 performance of WAL writes when multiple transactions commit at the
 same time seems like a pretty bad description, since it doesn't make
 any reference to batching of commits.  Also, I don't think that the

 I didn't call it "group commit" because we have settings we used to
 regard as group commit:

My understanding of that patch is that it does not cause group
commit to happen, but rather that when a group commit does happen
naturally it causes all members of the group to awaken more
quickly/efficiently than they previously would have.


        #commit_delay = 0           # range 0-100000, in microseconds
        #commit_siblings = 5            # range 1-1000

 These are still there.  Should they be removed?

The new locking around releasing group commit waiters has, if
anything, made these two more effective than before.  But that isn't
really saying much.  It seems like these knobs are (and were)
primarily useful for doing stupid benchmark tricks of little
practical value.  If there is an argument for removing them, I think
it would revolve around either "They never really should have been
there anyway", or "These days, when people need far more commits per
second than they have revolutions per second, they buy BBU or NVRAM."

 I updated the release docs to call the item "group commit" because I now
 don't see any reference to that term in our docs.

I don't think I'd mention WAL writing at all, and just say that it
improves the concurrency of locking around group commits.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Peter Eisentraut
On fre, 2012-05-11 at 11:32 +0200, Magnus Hagander wrote:
  You are misinterpreting this.  The reason Bruce's link was removed was
  that the other (official) build was set to run at the same frequency, so
  Bruce's build was exactly redundant.  The requirement/aspiration to have
  a few minutes turnaround time continued.
 
 But the other (official) build was *not* set to run at the same
 frequency. It was set, according to that mail, to run frequently
 enough, but it did not run every 5 minutes; at least not the only
 cronjob I found back then.

I would love to see which cron job you are referring to.
Unfortunately, I don't have a backup, but I'm pretty sure at one point
it ran every three minutes or so.

 But in the interest of actually being productive - what *is* the
 use case for needing a 5 minute turnaround time?

I don't exactly know, it was done at the request of users.  A lot of
people wanted to see the documentation of new checkins, just to learn
about how the new features worked.

As a general point, any delay time is going to raise questions, because
it usually won't be easy to find out when things will happen.  So the
human maintenance effort will be lower if it runs as soon as possible.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... I don't really see any particular merit in removing our own
 XID from the picture entirely: that changes the behavior more
 significantly for no particular benefit.

The merit would be in keeping the function's definition simple.

Anyway, let's see if breaking this down by cases clarifies anything.
As I see it, there are three possible cases:

1. On master, xact already has an XID.  The longstanding behavior is
to use that XID as reference.  The committed patch changes this to
reference whatever is next-to-assign XID at first call of age(), but
it's far from clear to me that that's better for this case in isolation.

2. On master, xact does not (yet) have an XID.  The previous behavior
is to force XID assignment at first call of age().  However, if we
capture ReadNewTransactionId as per patch then we give the same answer
as we would have done before, only without assigning the xact an XID.
It could be argued that this can yield inconsistent results if the xact
later does something that forces XID assignment anyway, but surely
that's a pretty narrow corner case.

3. On slave, so xact cannot have an XID.  Previous behavior is to fail
which we all agree is unhelpful.  Capturing ReadNewTransactionId
provides behavior somewhat similar to patched case #2, though it's
unclear to me exactly how compatible it is given the likely skew between
master and slave notions of the next XID.

It's arguable that what we should do is use XID if on master, capture
ReadNewTransactionId if on slave, which would avoid any backwards
incompatibility for the first two cases while still fixing the case that
everybody agrees is a problem.  Simon argues that this gives a weird
variance in the master vs slave behavior, but I'm not sure I believe
that's an issue.  In case 2, the only way that the user can tell the
difference between force-XID-assignment and capture-ReadNewTransactionId
is if the transaction later does something requiring an XID, which
cannot happen anyway on the slave.  So from here the difference in these
behaviors seems minimal and not worth creating incompatibility in the
first two cases for.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Agenda For 3rd Cluster Hackers Summit, May 15th in Ottawa

2012-05-11 Thread Josh Berkus
Hackers, Replicators, Clusterers:

The agenda and the list of attendees for the cluster-hackers summit has
been set:

http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

Please send me any corrections, changes or additions ASAP.

Thank you!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 That definitely doesn't seem ideal - a lot of things can pile up
 behind WALWriteLock.  I'm not sure how big a problem it would be in
 practice, but we generally make a practice of avoiding sending signals
 while holding LWLocks whenever possible...

There's a good reason for that, which is that the scheduler might well
decide to go run the wakened process instead of you.  Admittedly this
tends to not be a problem on machines with $bignum CPUs, but on
single-CPU machines I've seen it happen a lot.

Refactoring so that the signal is sent only after lock release seems
like a good idea to me.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Andres Freund
On Friday, May 11, 2012 08:36:24 PM Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  That definitely doesn't seem ideal - a lot of things can pile up
  behind WALWriteLock.  I'm not sure how big a problem it would be in
  practice, but we generally make a practice of avoiding sending signals
  while holding LWLocks whenever possible...
 
 There's a good reason for that, which is that the scheduler might well
 decide to go run the wakened process instead of you.  Admittedly this
 tends to not be a problem on machines with $bignum CPUs, but on
 single-CPU machines I've seen it happen a lot.
 
 Refactoring so that the signal is sent only after lock release seems
 like a good idea to me.
Will send a patch later on; the duplication seems to be manageable.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 It's the only place, though, which knows whether it's actually sensible to
 wake up the walsender. We could make it return whether it wrote anything and
 do the wakeup at the callers. I count 4 different call sites, which would be
 an annoying duplication, but I don't really see anything better right now.

Another point here is that XLogWrite is not only normally called with
the lock held, but inside a critical section.  I see no reason to take
the risk of doing signal sending inside critical sections.

BTW, a depressingly large fraction of the existing calls to WalSndWakeup
are also inside critical sections, generally for no good reason that I
can see.  For example, in EndPrepare(), why was the call placed where
it is and not down beside SyncRepWaitForLSN?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Simon Riggs
On 11 May 2012 19:45, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 It's the only place, though, which knows whether it's actually sensible to
 wake up the walsender. We could make it return whether it wrote anything and
 do the wakeup at the callers. I count 4 different call sites, which would be
 an annoying duplication, but I don't really see anything better right now.

 Another point here is that XLogWrite is not only normally called with
 the lock held, but inside a critical section.  I see no reason to take
 the risk of doing signal sending inside critical sections.

 BTW, a depressingly large fraction of the existing calls to WalSndWakeup
 are also inside critical sections, generally for no good reason that I
 can see.  For example, in EndPrepare(), why was the call placed where
 it is and not down beside SyncRepWaitForLSN?

I think because nobody thought of that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of Thu May 10 02:27:32 -0400 2012:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 I noticed while doing some tests that the checkpointer process does not
 recover very nicely after a backend crashes under postmaster -T (after
 all processes have been kill -CONTd, of course, and postmaster told to
 shutdown via Ctrl-C on its console).  For some reason it seems to get
 stuck in a loop doing sleep(0.5s).  In another case I caught it trying to
 do a checkpoint, but it was progressing by a single page each time and then
 sleeping.  In that condition, the checkpoint took a very long time to
 finish.

 Is this still a problem as of HEAD?  I think I've fixed some issues in
 the checkpointer's outer loop logic, but not sure if what you saw is
 still there.

 Yep, it's still there as far as I can tell.  A backtrace from the
 checkpointer shows it's waiting on the latch.

I'm confused about what you did here and whether this isn't just pilot
error.  If you run with -T then the postmaster will just SIGSTOP the
remaining child processes, but then it will sit and wait for them to
die, since the state machine expects them to react as though they'd been
sent SIGQUIT.  If you SIGCONT any of them then that process will resume,
totally ignorant that it's supposed to die.  So "kill -CONTd, of course"
makes no sense to me.  I tried killing one child with -KILL, then
sending SIGINT to the postmaster, then killing the remaining
already-stopped children, and the postmaster did exit as expected after
the last child died.

So I don't see any bug here.  And, after closer inspection, your
previous proposed patch is quite bogus because checkpointer is not
supposed to stop yet when the other processes are being terminated
normally.

Possibly it'd be useful to teach the postmaster more thoroughly about
SIGSTOP and have a way for it to really kill the remaining children
after you've finished investigating their state.  But frankly this
is the first time I've heard of anybody using that feature at all;
I always thought it was a vestigial hangover from days when the kernel
was too stupid to write separate core dump files for each backend.
I'd rather remove SendStop than add more complexity there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Fri May 11 16:50:01 -0400 2012:

  Yep, it's still there as far as I can tell.  A backtrace from the
  checkpointer shows it's waiting on the latch.
 
 I'm confused about what you did here and whether this isn't just pilot
 error.  If you run with -T then the postmaster will just SIGSTOP the
 remaining child processes, but then it will sit and wait for them to
 die, since the state machine expects them to react as though they'd been
 sent SIGQUIT.

The sequence of events is:
postmaster -T
crash a backend
SIGINT postmaster
SIGCONT all child processes

My expectation is that postmaster should exit normally after this.  What
happens instead is that all processes exit, except checkpointer.  And in
fact, postmaster is now in PM_WAIT_BACKENDS state, so sending SIGINT a
second time will not shutdown checkpointer either.

Maybe we can consider this to be just pilot error, but then why do all
other processes exit normally?  To me it just seems an oversight in
checkpointer shutdown handling in conjunction with -T.

 If you SIGCONT any of them then that process will resume,
 totally ignorant that it's supposed to die.  So kill -CONTd, of course
 makes no sense to me.  I tried killing one child with -KILL, then
 sending SIGINT to the postmaster, then killing the remaining
 already-stopped children, and the postmaster did exit as expected after
 the last child died.

Uhm, after you SIGINT'd the postmaster, didn't it shut down all children?  That
would be odd.

 So I don't see any bug here.  And, after closer inspection, your
 previous proposed patch is quite bogus because checkpointer is not
 supposed to stop yet when the other processes are being terminated
 normally.

Well, it does send the signal only when FatalError is set.  So it should
only affect -T behavior.

 Possibly it'd be useful to teach the postmaster more thoroughly about
 SIGSTOP and have a way for it to really kill the remaining children
 after you've finished investigating their state.  But frankly this
 is the first time I've heard of anybody using that feature at all;
 I always thought it was a vestigial hangover from days when the kernel
 was too stupid to write separate core dump files for each backend.
 I'd rather remove SendStop than add more complexity there.

Hah.  I've used it a few times, but I can see that multiple core files
are okay.  Maybe you're right and we should just remove it.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of Fri May 11 16:50:01 -0400 2012:
 I'm confused about what you did here and whether this isn't just pilot
 error.

 The sequence of events is:
 postmaster -T
 crash a backend
 SIGINT postmaster
 SIGCONT all child processes

 My expectation is that postmaster should exit normally after this.

Well, my expectation is that the postmaster should wait for the children
to finish dying, and then exit rather than respawn anything.  It is not
on the postmaster's head to make them die anymore, because it already
(thinks it) sent them SIGQUIT.  Using SIGCONT here is pilot error.

 Maybe we can consider this to be just pilot error, but then why do all
 other processes exit normally?

The reason for that is that the postmaster's SIGINT interrupt handler
(lines 2163ff) sent them SIGTERM, without bothering to notice that we'd
already sent them SIGQUIT/SIGSTOP; so once you CONT them they get the
SIGTERM and drop out normally.  That handler knows it should not signal
the checkpointer yet, so the checkpointer doesn't get the memo.  But the
lack of a FatalError check here is just a simplicity of implementation
thing; it should not be necessary to send any more signals once we are
in FatalError state.  Besides, this behavior is all wrong for a crash
recovery scenario: there is no guarantee that shared memory is in good
enough condition for SIGTERM shutdown to work.  And we *definitely*
don't want the checkpointer trying to write a shutdown checkpoint.

 So I don't see any bug here.  And, after closer inspection, your
 previous proposed patch is quite bogus because checkpointer is not
 supposed to stop yet when the other processes are being terminated
 normally.

 Well, it does send the signal only when FatalError is set.  So it should
 only affect -T behavior.

If FatalError is set, it should not be necessary to send any more
signals, period, because we already tried to kill every child.  If we
need to defend against somebody using SIGSTOP/SIGCONT inappropriately,
it would take a lot more thought (and code) than this, and it would
still be extremely fragile because a SIGCONT'd backend is going to be
executing against possibly-corrupt shared memory.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: parameterized function scan

2012-05-11 Thread Antonin Houska

Hello,
following this short discussion
http://archives.postgresql.org/message-id/4f5aa202.9020...@gmail.com
I gave it one more try and hacked the optimizer so that a function can
become an inner relation in a NL join, parametrized with values from the
outer relation.


I tried to explain my thoughts in comments. Other than that:

1. I haven't tried to use SpecialJoinInfo to constrain join order. Even
if the order matters in a query like

SELECT * FROM a, func(a.i)

it's still an inner join by nature. SpecialJoinInfo is rarely used for
INNER joins, but never stored in PlannerInfo. Doing so only for these
lateral functions would be rather disruptive.


2. A simple SQL function (i.e. one that gets pulled up into the main
query) is a special case. The query that results from such a pull-up no
longer contains any function (and thus is not affected by this patch),
but such cases need to be newly taken into account and examined / tested
(the patch unblocks them at the parsing stage too).


3. There might be some open questions about SQL conformance.

I've spent quite a while looking into the optimizer code, and in the end I
was surprised that it didn't require that many changes - at least to make a
few simple examples work. Am I ignoring any important fact(s)?


Thanks,
Tony.


diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 446319d..f30ae4e 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -85,6 +85,44 @@ add_paths_to_joinrel(PlannerInfo *root,
 	SemiAntiJoinFactors semifactors;
 	Relids		param_source_rels = NULL;
 	ListCell   *lc;
+	bool nest_loop_only = false;
+
+	/*
+	 * Should the outer path be parametrized by the inner, there's no method
+	 * to evaluate such a join.
+	 */
+	if (bms_overlap(outerrel->func_arg_relids, innerrel->relids)) {
+		return;
+	}
+
+	/*
+	 * Only nest-loop join is considered viable method for lateral function.
+	 *
+	 * Non-empty 'innerrel->func_arg_relids' isn't a sufficient condition for
+	 * NL join with parametrized function on the inner side:
+	 *
+	 * If there's no overlap, the function arguments have already been used
+	 * to construct the innerrel (no reason to use them again) or they are supplied
+	 * from higher level (and thus constant for the current join) .
+	 */
+	if (bms_overlap(innerrel->func_arg_relids, outerrel->relids)) {
+		/*
+		 * Merge join is the only method to evaluate full join, but merge join is a bad option
+		 * for lateral functions.
+		 */
+		if (jointype == JOIN_FULL) {
+			return;
+		}
+
+		/*
+		 * TODO
+		 * Probably not relevant. Verify.
+		 */
+		if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) {
+			return;
+		}
+		nest_loop_only = true;
+	}
 
 	/*
 	 * Find potential mergejoin clauses.  We can skip this if we are not
@@ -92,7 +130,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * way of implementing a full outer join, so override enable_mergejoin if
 	 * it's a full join.
 	 */
-	if (enable_mergejoin || jointype == JOIN_FULL)
+	if ((enable_mergejoin || jointype == JOIN_FULL) && !nest_loop_only)
 		mergeclause_list = select_mergejoin_clauses(root,
 	joinrel,
 	outerrel,
@@ -151,7 +189,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * 1. Consider mergejoin paths where both relations must be explicitly
 	 * sorted.	Skip this if we can't mergejoin.
 	 */
-	if (mergejoin_allowed)
+	if (mergejoin_allowed && !nest_loop_only)
 		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
 			 restrictlist, mergeclause_list, jointype,
 			 sjinfo, param_source_rels);
@@ -192,10 +230,13 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * before being joined.  As above, disregard enable_hashjoin for full
 	 * joins, because there may be no other alternative.
 	 */
-	if (enable_hashjoin || jointype == JOIN_FULL)
+	if ((enable_hashjoin || jointype == JOIN_FULL) && !nest_loop_only)
 		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
 			 restrictlist, jointype,
 			 sjinfo, semifactors, param_source_rels);
+
+	/* Prepare for check of function parametrization at the next higher level. */
+	joinrel->func_arg_relids = bms_union(outerrel->func_arg_relids, innerrel->func_arg_relids);
 }
 
 /*
@@ -654,6 +695,7 @@ match_unsorted_outer(PlannerInfo *root,
	Path	   *inner_cheapest_total = innerrel->cheapest_total_path;
 	Path	   *matpath = NULL;
 	ListCell   *lc1;
+	bool nest_loop_only = bms_overlap(innerrel->func_arg_relids, outerrel->relids);
 
 	/*
 	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
@@ -814,6 +856,9 @@ match_unsorted_outer(PlannerInfo *root,
 		if (save_jointype == JOIN_UNIQUE_OUTER)
 			continue;
 
+		if (nest_loop_only)
+			continue;
+
 		/* Look for useful mergeclauses (if any) */
 		mergeclauses = find_mergeclauses_for_pathkeys(root,
	  outerpath->pathkeys,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c34b9b8..5552fa8 100644
--- 

Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Tom Lane
Albe Laurenz laurenz.a...@wien.gv.at writes:
 Michael Nolan wrote:
 I see one potential difference between your results and mine.
 When I rebuild the tablespace, I wind up with the same filename/OID as
 before, I'm not sure you do.

 Right. That's strange.
 Usually OIDs get incremented, so you shouldn't end up with the same
 OID for the new tablespace.

I believe I see what's happening here, and the difference is that
Michael deleted the tablespace's directory while Albe only deleted the
files in it.

The former case causes destroy_tablespace_directories to exit early,
without throwing a hard error, and without having removed the symlink
for the tablespace OID in $PGDATA/pg_tblspc/.  This means that after
re-creating a new tablespace in the same directory location, that old
symlink works again, even though it no longer corresponds to any OID in
pg_tablespace.  Thus, Michael doesn't see an error in his REINDEX;
physical access to the index still works even though the index's
reltablespace is no longer really valid.  The reported symptom of \d
not showing the tablespace is because the code in psql's describe.c will
silently ignore a reltablespace entry that does not match any OID in
pg_tablespace.
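
(A minimal reproduction sketch, with hypothetical paths:

    CREATE TABLESPACE scratch LOCATION '/tmp/ts';
    -- externally: rm -rf /tmp/ts    (the directory itself, not just its contents)
    DROP TABLESPACE scratch;         -- row dropped, symlink in pg_tblspc/ survives

Re-creating a tablespace at the same location then makes the stale
symlink resolve again, which appears to be Michael's case.)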

We could prevent this scenario if we changed
destroy_tablespace_directories so that for any non-fatal-error
situation, it continues to march on and try to destroy the remaining
infrastructure, particularly the symlink.  I'm not sure that's really a
good idea, but it definitely seems to be a bad idea to leave the symlink
in place when we're removing the pg_tablespace row.  Alternatively we
could make more of those cases be errors rather than warnings, so that
the pg_tablespace row removal would be rolled back.

The comment in destroy_tablespace_directories indicates that what we're
trying to support by not throwing an error for missing directory is
cleaning up a dead pg_tablespace row, which suggests that removing the
symlink too would be reasonable.

A larger question is whether we should start making pg_shdepend entries
for table/index usage of non-default tablespaces, so that you couldn't
DROP a tablespace that the catalogs think still has tables/indexes in
it.  I'm not sure that that'd be particularly helpful though.  It
certainly wouldn't do anything to protect against the scenario discussed
here of an external agency zapping all the files.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 2:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The merit would be in keeping the function's definition simple.

True.  It's not *much* simpler, but it is simpler.

 Anyway, let's see if breaking this down by cases clarifies anything.
 As I see it, there are three possible cases:

 1. On master, xact already has an XID.  The longstanding behavior is
 to use that XID as reference.  The committed patch changes this to
 reference whatever is next-to-assign XID at first call of age(), but
 it's far from clear to me that that's better for this case in isolation.

 2. On master, xact does not (yet) have an XID.  The previous behavior
 is to force XID assignment at first call of age().  However, if we
 capture ReadNewTransactionId as per patch then we give the same answer
 as we would have done before, only without assigning the xact an XID.
 It could be argued that this can yield inconsistent results if the xact
 later does something that forces XID assignment anyway, but surely
 that's a pretty narrow corner case.

 3. On slave, so xact cannot have an XID.  Previous behavior is to fail
 which we all agree is unhelpful.  Capturing ReadNewTransactionId
 provides behavior somewhat similar to patched case #2, though it's
 unclear to me exactly how compatible it is given the likely skew between
 master and slave notions of the next XID.

 It's arguable that what we should do is use XID if on master, capture
 ReadNewTransactionId if on slave, which would avoid any backwards
 incompatibility for the first two cases while still fixing the case that
 everybody agrees is a problem.  Simon argues that this gives a weird
 variance in the master vs slave behavior, but I'm not sure I believe
 that's an issue.  In case 2, the only way that the user can tell the
 difference between force-XID-assignment and capture-ReadNewTransactionId
 is if the transaction later does something requiring an XID, which
 cannot happen anyway on the slave.  So from here the difference in these
 behaviors seems minimal and not worth creating incompatibility in the
 first two cases for.

Yeah.  I don't think I particularly care what we do in HEAD, but it
sure seems like it would be nice to change the back-branch behavior as
little as possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 A larger question is whether we should start making pg_shdepend entries
 for table/index usage of non-default tablespaces, so that you couldn't
 DROP a tablespace that the catalogs think still has tables/indexes in
 it.

I'm astonished we don't do that already.  Seems inconsistent with
other SQL object types - most obviously, schemas - and a potentially
giant foot-gun.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 2:03 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, May 10, 2012 at 10:44 AM, Bruce Momjian br...@momjian.us wrote:
 On Thu, May 10, 2012 at 01:11:54PM +0100, Peter Geoghegan wrote:

 Why can't we call group commit "group commit" (and for that matter,
 index-only scans "index-only scans"), so that people will understand
 that we are now competitive with other RDBMSs in this area? Improve
 performance of WAL writes when multiple transactions commit at the
 same time seems like a pretty bad description, since it doesn't make
 any reference to batching of commits.  Also, I don't think that the

 I didn't call it "group commit" because we have settings we used to
 regard as group commit:

 My understanding of that patch is that it does not cause group
 commit to happen, but rather that when a group commit does happen
 naturally, it causes all members of the group to awaken more
 quickly and efficiently than they previously would have.

Right.  It's not a new feature; it's a performance improvement.  We've
had group commit for a long time; it just didn't work very well
before.  And it's not batching the commits better; it's reducing the
lock contention around realizing that the batched commit has happened.
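
To make that concrete, here is a simplified paraphrase of the flush fast
path.  As I understand it, LWLockAcquireOrWait() is the real primitive the
9.2 work added, and WALWriteLock and LW_EXCLUSIVE are real; but
AlreadyFlushedUpTo() and WriteAndFlushUpTo() are invented stand-ins for the
LogwrtResult bookkeeping in xlog.c, so treat this as a sketch rather than
the actual code:

static void
FlushCommitRecord(XLogRecPtr myLsn)
{
    for (;;)
    {
        /*
         * Fast path: some other backend's batched flush already
         * covered our commit record, so we never touch WALWriteLock.
         */
        if (AlreadyFlushedUpTo(myLsn))
            return;

        /*
         * Either acquire the lock, or sleep until it is released and
         * return *without* acquiring it; a woken waiter then simply
         * re-checks whether the previous holder's batch did its work.
         */
        if (LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
        {
            /*
             * We hold the lock: flush up to and including our own
             * record, batching any commits queued up behind us.
             */
            WriteAndFlushUpTo(myLsn);
            LWLockRelease(WALWriteLock);
            return;
        }
    }
}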

 I updated the release docs to call the item "group commit" because I now
 don't see any reference to that term in our docs.

 I don't think I'd mention WAL writing at all, and just say that it
 improves the concurrency of locking around group commits.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Fri May 11 20:28:28 -0400 2012:
 On Fri, May 11, 2012 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  A larger question is whether we should start making pg_shdepend entries
  for table/index usage of non-default tablespaces, so that you couldn't
  DROP a tablespace that the catalogs think still has tables/indexes in
  it.
 
 I'm astonished we don't do that already.  Seems inconsistent with
 other SQL object types - most obviously, schemas - and a potentially
 giant foot-gun.

The original patch did contain tablespace tracking (though I don't
remember whether it distinguished default from non-default tablespaces),
but it got ripped out because during the subsequent discussion we
concluded that tracking wasn't necessary -- supposedly, whenever you
were going to delete a tablespace, the existing files in its directory
would be sufficient evidence to stop the deletion.  Evidently I failed
to consider the case at hand.

I don't think there's any particular reason we can't put it back.
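
If it does go back in, here is a minimal sketch of the create-time side,
assuming a hypothetical new SHARED_DEPENDENCY_TABLESPACE dependency type.
ObjectAddress, recordSharedDependencyOn(), MyDatabaseTableSpace,
RelationRelationId, and TableSpaceRelationId are existing internals;
relid and reltablespace stand for the values in hand at relation creation:

if (OidIsValid(reltablespace) && reltablespace != MyDatabaseTableSpace)
{
    ObjectAddress myself;
    ObjectAddress referenced;

    /* the new relation ... */
    myself.classId = RelationRelationId;
    myself.objectId = relid;
    myself.objectSubId = 0;

    /* ... depends on its non-default tablespace */
    referenced.classId = TableSpaceRelationId;
    referenced.objectId = reltablespace;
    referenced.objectSubId = 0;

    recordSharedDependencyOn(&myself, &referenced,
                             SHARED_DEPENDENCY_TABLESPACE);
}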

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of Fri May 11 20:28:28 -0400 2012:
 I'm astonished we don't do that already.  Seems inconsistent with
 other SQL object types - most obviously, schemas - and a potentially
 giant foot-gun.

 The original patch did contain tablespace tracking (though I don't
 remember considering whether they were default or not), but it got
 ripped out because during the subsequent discussion we considered that
 it wasn't necessary to keep track of it -- supposedly, whenever you were
 going to delete a tablespace, the existing files in the directory would
 be sufficient evidence to stop the deletion.  Evidently I failed to
 consider the case at hand.

Well, the question to me is exactly how much good it will do to stop
deletion of the pg_tablespace entry, if the underlying files are gone.
I'm having a hard time getting excited about expending cycles on that.

regards, tom lane


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Michael Nolan
On Fri, May 11, 2012 at 10:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:


 Well, the question to me is exactly how much good it will do to stop
 deletion of the pg_tablespace entry, if the underlying files are gone.
 I'm having a hard time getting excited about expending cycles on that.


There could be multiple reasons why the underlying files are not there,
such as a filesystem that isn't currently mounted.

It seems prudent to have DROP TABLESPACE throw an error if there are
still references to that tablespace in the catalogs, or perhaps to
require a 'force' clause to override the error, though it probably isn't
something most DBAs would run into very often.
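
For illustration, a rough sketch of what that check could look like in
DropTableSpace(), assuming the shared dependencies discussed above were
being recorded.  checkSharedDependencies() and the ereport() machinery are
existing internals (this mirrors the pattern dropdb() uses); the force
flag and the local variable names are hypothetical:

{
    char   *detail;
    char   *detail_log;

    /* Refuse the drop while catalog entries still depend on it. */
    if (!force &&
        checkSharedDependencies(TableSpaceRelationId, tablespaceoid,
                                &detail, &detail_log))
        ereport(ERROR,
                (errcode(ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST),
                 errmsg("tablespace \"%s\" cannot be dropped because other objects depend on it",
                        tablespacename)));
}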

Thanks for figuring it out, Tom.
--
Mike Nolan