Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Simon Riggs
On 14 May 2012 00:45, Robert Haas robertmh...@gmail.com wrote:
 On Sun, May 13, 2012 at 7:17 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 Have I missed something? Why do we keep around this foot-gun that now
 appears to me to be at best useless and at worst harmful? I can see
 why the temptation to keep this setting around used to exist, since it
 probably wasn't too hard to get good numbers from extremely synthetic
 pgbench runs, but I cannot see why the new adaptive implementation
 wouldn't entirely shadow the old one even in that situation.

 It seems that, with the new code, when there are a lot of people
 trying to commit very frequently, they tend to divide themselves into
 two gangs: everybody in one gang commits, then everyone in the other
 gang commits, then everyone in the first gang commits again, and so
 on.  Assuming that the transactions themselves require negligible
 processing time, this provides 50% of the theoretically optimum
 throughput.

Keeping a parameter without any clue as to whether it has benefit is
just wasting people's time.

We don't ADD parameters based on supposition, why should we avoid
removing parameters that have no measured benefit?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Robert Haas
On Mon, May 14, 2012 at 2:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Keeping a parameter without any clue as to whether it has benefit is
 just wasting people's time.

No, arguing that we should remove a parameter because it's useless
when you haven't bothered to test whether or not it actually is
useless is wasting people's time.

 We don't ADD parameters based on supposition, why should we avoid
 removing parameters that have no measured benefit?

If they have no actual benefit, of course we should remove them.  If
they have no measured benefit because no one has bothered to measure,
that's not a reason to remove them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Heikki Linnakangas

On 14.05.2012 02:17, Peter Geoghegan wrote:

One illuminating way of quickly explaining the new group commit code
is that it also inserts a delay at approximately the same place (well,
more places now, since the delay was previously inserted only at the
xact.c callsite of XLogFlush(), and there are plenty more sites than
that), only that delay is now just about perfectly adaptive. That
isn't quite the full truth, since we *also* have the benefit of
*knowing* that there is an active leader/flusher at all times we're
delayed. The unusual semantics of LWLockAcquireOrWait() result in the
new 9.2 code almost always just causing a delay for most backends when
group commit matters (that is, a delay before they reach their final,
productive iteration of the for(;;) loop for the commit, usually
without them ever actually acquiring the exclusive
lock/XlogWrite()'ing).


That doesn't seem like an accurate explanation of how the code works. It 
doesn't insert a deliberate delay anywhere. At a high level, the 
algorithm is exactly the same as before. However, the new code improves 
the concurrency of noticing that the WAL has been flushed. If you had a 
machine where context switches were infinitely fast and there was zero 
contention from accessing shared memory, the old and new code would 
behave the same.
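
To make the mechanism concrete, the 9.2 flush path looks roughly like this
(a simplified sketch from memory, not the literal xlog.c code; the spinlock
protecting the shared LogwrtResult copy is omitted):

    for (;;)
    {
        /* Did someone else already flush past the LSN we need? */
        if (XLByteLE(record, LogwrtResult.Flush))
            break;                      /* yes: fast path, nothing to do */

        /*
         * LWLockAcquireOrWait: if WALWriteLock is free, take it and return
         * true; otherwise sleep until the holder releases it and return
         * false *without* taking it, so we re-check the flush pointer.
         */
        if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
        {
            LogwrtResult = XLogCtl->LogwrtResult;   /* refresh local copy */
            continue;
        }

        /* We hold the lock: act as group leader and flush for everyone. */
        XLogWrite(WriteRqst, false, false);
        LWLockRelease(WALWriteLock);
        break;
    }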


It was an impressive improvement, but the mechanism is completely 
different from commit_delay and commit_siblings.


That said, I wouldn't mind removing commit_delay and commit_siblings. 
They're pretty much impossible to tune correctly, assuming they work as 
advertised. Some hard data would be nice, though, as Robert suggested.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Simon Riggs
On 14 May 2012 07:30, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 That said, I wouldn't mind removing commit_delay and commit_siblings.
 They're pretty much impossible to tune correctly, assuming they work as
 advertised. Some hard data would be nice, though, as Robert suggested.

Those parameters were already hard to get any benefit from, even in a
benchmark. In a wide range of cases/settings they produce clear
degradation.

Any thorough testing would involve a range of different hardware
types, so it's unlikely to ever occur. So let's just move on.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Magnus Hagander
On Mon, May 14, 2012 at 8:17 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 14, 2012 at 2:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Keeping a parameter without any clue as to whether it has benefit is
 just wasting people's time.

 No, arguing that we should remove a parameter because it's useless
 when you haven't bothered to test whether or not it actually is
 useless is wasting people's time.

It's most certainly not, IMHO. Discussing it here is *not* a waste of
time. Or if any, it's a waste of time for a couple of people. If we
leave it in, and it's useless, we waste the time of thousands of
users. The choice between those two should be obvious.


 We don't ADD parameters based on supposition, why should we avoid
 removing parameters that have no measured benefit?

 If they have no actual benefit, of course we should remove them.  If
 they have no measured benefit because no one has bothered to measure,
 that's not a reason to remove them.

Another option might be, as a first step, to remove them from the .conf
file or have a deprecated section, maybe? But if we do that, people
aren't likely to use them anyway...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Analyzing foreign tables memory problems

2012-05-14 Thread Albe Laurenz
Noah Misch wrote:
 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation
 so that the authors of foreign data wrappers are aware of the
 problem and can avoid it on their side.
 This would be quite simple.

 Seems reasonable.  How would the FDW return an indication that a
 value was non-NULL but removed due to excess width?

 The FDW would return a value of length WIDTH_THRESHOLD+1 that is
 long enough to be recognized as too long, but not long enough to
 cause a problem.

 Here is a simple patch for that.

 It feels to me like an undue hack to ask FDW authors to synthesize
 such values.  It's easy enough for data types such as text/bytea.
 In general, though, simple truncation may not produce a valid value
 of the type.  That shouldn't matter, since the next action taken on
 the value should be to discard it, but it's fragile.  Can we do
 better?

 Just thinking out loud, we could provide an extern Datum
 AnalyzeWideValue; and direct FDW authors to use that particular
 datum.  It could look like a toasted datum of external size
 WIDTH_THRESHOLD+1 but bear va_toastrelid == InvalidOid.  Then, if
 future code movement leads us to actually examine one of these
 values, we'll get an early, hard error.

That would be very convenient indeed.

Even better would be a function
extern Datum createAnalyzeWideValue(integer width)
so that row width calculations could be more accurate.
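
For context, under option 1 the FDW author would have to fabricate the
over-width value, along these lines (a rough sketch only, for a text column;
the function name is made up, and WIDTH_THRESHOLD is assumed to be exported
from commands/vacuum.h as proposed):

/* Sketch: build a text datum just over WIDTH_THRESHOLD so ANALYZE drops it. */
static Datum
fdw_make_overwidth_text(void)
{
	char	   *buf = palloc(WIDTH_THRESHOLD + 1);

	memset(buf, 'x', WIDTH_THRESHOLD + 1);
	return PointerGetDatum(cstring_to_text_with_len(buf, WIDTH_THRESHOLD + 1));
}

A ready-made AnalyzeWideValue datum, or a constructor like the one suggested
above, would let FDWs skip this dance entirely.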

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update comments for PGPROC/PGXACT split

2012-05-14 Thread Heikki Linnakangas

On 13.05.2012 23:52, Noah Misch wrote:

Many comment references to PGPROC and MyProc should now refer to PGXACT and
MyPgXact.  This patch attempts to cover all such cases.  In some places, a
comment refers collectively to all the xid-related fields, which span both
structures.  I variously changed those to refer to either or both structures
depending on how I judged the emphasis.

The PGXACT patch extracted the PGPROC struct formerly embedded in each
GlobalTransaction.  I updated a few comments accordingly.

The procarray.c leading comment described the procarray as unsorted, but it
is sorted by memory address; I removed that word.

I include two typo-only fixes on related but already-accurate lines.


Thanks, applied!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-14 Thread Andres Freund
On Friday, May 11, 2012 08:45:23 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Its the only place though which knows whether its actually sensible to
  wakeup the walsender. We could make it return whether it wrote anything
  and do the wakeup at the callers. I count 4 different callsites which
  would be an annoying duplication but I don't really see anything better
  right now.
 
 Another point here is that XLogWrite is not only normally called with
 the lock held, but inside a critical section.  I see no reason to take
 the risk of doing signal sending inside critical sections.

 BTW, a depressingly large fraction of the existing calls to WalSndWakeup
 are also inside critical sections, generally for no good reason that I
 can see.  For example, in EndPrepare(), why was the call placed where
 it is and not down beside SyncRepWaitForLSN?
Hm. While I see no real problem moving it out of the lock, I don't really see a
way to cleanly move it outside critical sections everywhere. The impact of doing
so seems rather big to me. The only externally visible places where it is
actually known whether we wrote out data, and thus whether to do the wakeup, are
XLogInsert, XLogFlush and XLogBackgroundFlush. The first two of those are
routinely called inside a critical section.

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-14 Thread Stephen Frost
* Qi Huang (huangq...@hotmail.com) wrote:
 Thanks, guys, for your hot discussion. I'll explore the ANALYZE command first 
 and try make SYSTEM sampling work. Other parts, I'll look at them later. 

That sounds like the right first steps to me.  Reviewing this
discussion, it was my thought to create a new node, ala seqscan, which
implemented analyze's algorithm for scanning the table.  The eventual
idea being that analyze would actually use it in the future.  There was
mention up-thread about just calling the analyze code.  Personally, I'd
rather we make analyze more SQL like (once we have this functionality
implemented) than make things which are supposed to be SQL call out into
analyze bits.

Thoughts?  Other opinions?

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-14 Thread chinnaobi
Hi All,

I have implemented hot standby for PostgreSQL with a pair of primary and
standby servers on Windows 2008.

Currently below are the settings:

1. Archiving is enabled on primary, stored on network storage.
2. Asynchronous Streaming replication from primary to standby.
wal-senders=5, wal_keep_segments=32.
3. I am checking the heartbeat of the primary from the standby; when it is
down, I restart the service after changing the postgresql.conf and
recovery.conf settings.

I take a base backup on the standby only the first time, when it starts
being replicated. Whenever the primary goes down, the standby becomes the
primary, and the old primary becomes a standby when it comes back up. When
the old primary becomes a standby, I restore data from the WAL archive and
start the postgres service with streaming replication connecting to the new
primary.

This setup is working.

I have tested for a few days on my network, with a large volume of data
going into the primary and the servers restarted multiple times. I have
observed the specific errors below on the standby server, after which the
standby stops replicating.

1. Invalid primary and secondary checkpoints. Very rare, but it happened.
2. contrecord is requested by 0/DF20 -- the major one.
3. Record with zero length at 0/DF78, invalid record length at
0/DF78.

These errors show up when the primary is switching to standby and recovering
data from the archive, and the standby then stops replicating. I am not able
to start the service; I have to kill all the postgres processes, take a new
base backup, and then start the service on the standby.

I don't have any prior experience working with PostgreSQL. Please tell me:

What are the most reliable settings for hot standby with streaming
replication to work, given two Windows 2008 servers and one network storage
device? How can the above errors be avoided?



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/hot-standby-PSQL-9-1-Windows-2008-Servers-tp5708637.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Jeff Janes
On Sun, May 13, 2012 at 4:17 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 This code is our pre-9.2 group commit implementation, pretty much in
 its entirety:

 if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);

A semantic issue, I guess, but I would say that all the code that
keeps the xlogctl->LogwrtRqst updated and reads from it is also part
of the group commit implementation.


 This code is placed directly before the RecordTransactionCommit() call
 of XLogFlush(). It seeks to create a delay of commit_delay immediately
 prior to flushing when MinimumActiveBackends(CommitSiblings), in the
 hope that that delay will be enough that when we do eventually reach
 XLogFlush(), we can fastpath out of it due to Postgres by then having
 already flushed up to the XLogRecPtr we need flushed.

I think it is more useful to put it the other way around.  The hope is
that when we eventually reach XLogFlush, we will find that other
people have added their commit records, so that we can fsync theirs as
well as ours, letting them fast-path out later on.

Really if someone else is already doing the sleep, there is no reason
for the current process to do so as well.  All that will do is delay
the time before the current process wakes up and realizes it has
already been fsynced.  Instead it should block on the other sleeping
process.  But that didn't help much if any when I tried it a couple
years ago.  It might work better now.
...

 The original group commit patch that Simon and I submitted deprecated
 these settings, and I fail to understand why the committed patch
 didn't do that too. These days, the only way this code could possibly
 result in a fastpath is here, at transam/xlog.c:2094, with the
 potentially stale (and likely too stale to be useful when new group
 commit is in play) LogwrtResult value:

                /* done already? */
                if (XLByteLE(record, LogwrtResult.Flush))
                        break;

Why is that likely too stale to be useful?  It was updated just 4
lines previous.



 Now, previously, we could also fastpath a bit later, at about
 transam/xlog.c:2114:

                /* Got the lock */
                LogwrtResult = XLogCtl->LogwrtResult;
                if (!XLByteLE(record, LogwrtResult.Flush))
                {
                        .
                }
                /* control might have fastpathed and missed the above block. 
 We're
 done now. */

 ...but now control won't even reach here unless we have the exclusive
 lock on WALWriteLock (i.e. we're the leader backend during 9.2's group
 commit), so fastpathing out of XLogFlush() becomes very rare in 9.2 .

 One illuminating way of quickly explaining the new group commit code
 is that it also inserts a delay at approximately the same place (well,
 more places now, since the delay was previously inserted only at the
 xact.c callsite of XLogFlush(), and there are plenty more sites than
 that), only that delay is now just about perfectly adaptive. That
 isn't quite the full truth, since we *also* have the benefit of
 *knowing* that there is an active leader/flusher at all times we're
 delayed.

I don't get what you are saying.  The new code does not have any more
delays than the old code did, there is still only one
pg_usleep(CommitDelay), and it is still in the same place.
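
For readers without the source at hand, the placement in question is roughly
(paraphrasing xact.c, not a verbatim quote):

    /* in RecordTransactionCommit(), immediately before flushing our record */
    if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);

    XLogFlush(XactLastRecEnd);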

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Jeff Janes
On Sun, May 13, 2012 at 11:07 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Keeping a parameter without any clue as to whether it has benefit is
 just wasting people's time.

 We don't ADD parameters based on supposition, why should we avoid
 removing parameters that have no measured benefit?

Using pgbench -T30 -c 2 -j 2 on a 2 core laptop system, with a scale
that fits in shared_buffers:

--commit-delay=2000 --commit-siblings=0
tps = 162.924783 (excluding connections establishing)

--commit-delay=0 --commit-siblings=0
tps = 89.237578 (excluding connections establishing)

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Robert Haas
On Mon, May 14, 2012 at 3:15 AM, Magnus Hagander mag...@hagander.net wrote:
 On Mon, May 14, 2012 at 8:17 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 14, 2012 at 2:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Keeping a parameter without any clue as to whether it has benefit is
 just wasting people's time.

 No, arguing that we should remove a parameter because it's useless
 when you haven't bothered to test whether or not it actually is
 useless is wasting people's time.

 It's most certainly not, IMHO. Discussing it here is *not* a waste of
 time. Or if any, it's a waste of time for a couple of people. If we
 leave it in, and it's useless, we waste the time of thousands of
 users. The choice between those two should be obvious.

Discussing it in general is not a waste of time, but the argument that
we should remove it because there's no evidence we should keep it is
completely backwards.  We should add OR remove things based on
evidence, not the absence of evidence.  There is certainly room for
discussion about what amount of evidence is adequate, but I do not
think zero is the right number.

Now, interestingly, Jeff Janes just did some testing, and it shows
almost a 2x speedup.  I think that's a much better starting point for
a productive discussion.  Does that change your mind at all?  Is it
too small a boost to be relevant?  Too artificial in some other way?
It doesn't seem impossible to me that the recent group commit changes
made it *easier* to get a benefit out of these settings than it was
before.  It may be that with the old implementation, it was hopeless
to get any kind of improvement out of these settings, but it no longer
is.  Or maybe they're still hopeless.  I don't have a strong opinion
about that, and welcome discussion.  But I'm always going to be
opposed to adding or removing things on the basis of what we didn't
test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we still have commit_delay and commit_siblings?

2012-05-14 Thread Peter Geoghegan
On 14 May 2012 15:09, Robert Haas robertmh...@gmail.com wrote:
 I don't have a strong opinion
 about that, and welcome discussion.  But I'm always going to be
 opposed to adding or removing things on the basis of what we didn't
 test.

The subject of the thread is "Why do we still have commit_delay and
commit_siblings?". I don't believe that anyone asserted that we should
remove the settings without some amount of due-diligence testing.
Simon said that thorough testing on many types of hardware was not
practical, which, considering that commit_delay is probably hardly
ever (never?) used in production, I'd have to agree with. With all due
respect, for someone that doesn't have a strong opinion on the
efficacy of commit_delay in 9.2, you seemed to have a strong opinion
on the standard that would have to be met in order to deprecate it.

I think we all could stand to give each other the benefit of the doubt more.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-14 Thread Alex Shulgin
Alex a...@commandprompt.com writes:

 Peter Eisentraut pete...@gmx.net writes:

 I have been reviewing how our new libpq URL syntax compares against
 existing implementations of URL syntaxes in other drivers or
 higher-level access libraries.  In the case of SQLAlchemy, there is an
 incompatibility regarding how Unix-domain sockets are specified.

 First, here is the documentation on that:
 http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html

 The recommended way to access a server over a Unix-domain socket is to
 leave off the host, as in:

 postgresql://user:password@/dbname

 In libpq, this is parsed as host='/dbname', no database.

 Ah, good catch: thanks for heads up.

 I believe this was introduced late in the dev cycle, when we noticed
 that users will have to specify some defaults explicitly to be able to
 override other defaults, while avoiding the whole ?keyword=value...
 business.

 I'll give this another look and will get back with a proposal to fix
 this in form of a patch.

Upon closer inspection of the issue I came to believe that the proper
fix is to drop support for special treatment of host part starting
with slash altogether.

Attached is a patch to do that.

While the following type of URI is still valid, the interpretation is
now different and standards-conforming: the part after the
double slash is treated as a path specification and not as an authority
(host).

  postgres:///path-spec

Since the path from a URI translates into dbname connection option, the
only meaningful use of this type of URIs is the following:

  postgres:///mydb

The host part in this case is empty (it is hidden between the // and
the following /,) thus local socket connection is employed for this
type of URIs.  To specify non-standard path to the local sockets
directory use the familiar URI parameter:

  postgres:///db?host=/path/to/socket/dir
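
For illustration, a client would use such a URI along these lines (a quick
sketch; "/tmp" merely stands in for whatever socket directory is in use):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* connect to database "mydb" over the Unix-domain socket in /tmp */
	PGconn	   *conn = PQconnectdb("postgres:///mydb?host=/tmp");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}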


Also, if my reading of the RFC is correct, the username, password and
port specifiers may be omitted even if the corresponding designators are
present in the URI, so we need to remove some checks on empty URI parts.

The test input and expected output files, the docs and code comments are
also updated, of course.

Finally, to complete the RFC compliance, I've added code to properly
handle percent-encoding in query parameter keywords.


At this point, I feel rather silly that we've produced and committed an
almost compliant version, which still requires quite a bit of
patching to become an RFC-conforming implementation...

--
Regards,
Alex



libpq-uri-rfc-compliance.patch
Description: libpq-uri-rfc-compliance.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-14 Thread Bruce Momjian
On Sun, May 13, 2012 at 09:01:03PM +0100, Peter Geoghegan wrote:
 On 12 May 2012 01:37, Robert Haas robertmh...@gmail.com wrote:
  Right.  It's not a new feature; it's a performance improvement.  We've
  had group commit for a long time; it just didn't work very well
  before.  And it's not batching the commits better; it's reducing the
  lock contention around realizing that the batched commit has happened.
 
 This understanding of group commit is technically accurate, but the
 practical implications of the new code are that lots of people benefit
 from group commit, potentially to rather a large degree, where before
 they had exactly no benefit from group commit. We never officially
 called group commit "group commit" outside of git commit messages
 before. Therefore, it is sort of like we didn't have group commit
 before but now we do, and it's an implementation that's probably as
 effective as that of any of our competitors. It is for that reason
 that I suggested group commit get a more prominent billing, and that
 it actually be officially referred to as group commit. I'm glad that
 the release notes now actually refer to group commit.
 
 Now, I realise that as one of the authors of the feature I am open to
 accusations of lacking objectivity - clearly it isn't really my place
 to try and influence the feature's placement, and this is the last I
 will say on the matter unless someone else brings it up again. I just
 think that pedantically characterising this as an improvement to our
 existing group commit implementation within a document that will be
 read like a press release is a bad decision, especially since our
 competitors never had a group commit implementation that was far
 inferior to their current implementation. The assumption will be that
 it's a small improvement that's hardly worth noticing at all.

Thanks for the summary. I know we talk about group commit, but I wasn't
aware that it had not been exposed to our general users.  I agree we
need to reword the item as you suggested.  So this group commit happens
even if users don't change these?

#commit_delay = 0   # range 0-10, in microseconds
#commit_siblings = 5# range 1-1000

So the new release item wording will be:

Add group commit capability for sessions that commit at the same
time

This is the git commit message:

Make group commit more effective.

When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.

This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.

Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.

(Heikki Linnakangas)

Is that commit message inaccurate?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-14 Thread Robert Haas
On May 14, 2012, at 9:06 AM, Bruce Momjian br...@momjian.us wrote:
 So the new release item wording will be:
 
Add group commit capability for sessions that commit at the same
time
 
 This is the git commit message:
 
Make group commit more effective.
 
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still 
 need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL 
 flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
 
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
 
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, 
 although
this patch as committed ended up being very different from that.
 
(Heikki Linnakangas)
 
 Is that commit message inaccurate?

No, I think it's actually more accurate than the proposed release note wording.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-14 Thread Jeff Janes
On Mon, May 14, 2012 at 9:06 AM, Bruce Momjian br...@momjian.us wrote:
 On Sun, May 13, 2012 at 09:01:03PM +0100, Peter Geoghegan wrote:
 On 12 May 2012 01:37, Robert Haas robertmh...@gmail.com wrote:
  Right.  It's not a new feature; it's a performance improvement.  We've
  had group commit for a long time; it just didn't work very well
  before.  And it's not batching the commits better; it's reducing the
  lock contention around realizing that the batched commit has happened.

 This understanding of group commit is technically accurate, but the
 practical implications of the new code are that lots of people benefit
 from group commit, potentially to rather a large degree, where before
 they had exactly no benefit from group commit. We never officially
 called group commit "group commit" outside of git commit messages
 before. Therefore, it is sort of like we didn't have group commit
 before but now we do, and it's an implementation that's probably as
 effective as that of any of our competitors. It is for that reason
 that I suggested group commit get a more prominent billing, and that
 it actually be officially referred to as group commit. I'm glad that
 the release notes now actually refer to group commit.

 Now, I realise that as one of the authors of the feature I am open to
 accusations of lacking objectivity - clearly it isn't really my place
 to try and influence the feature's placement, and this is the last I
 will say on the matter unless someone else brings it up again. I just
 think that pedantically characterising this as an improvement to our
 existing group commit implementation within a document that will be
 read like a press release is a bad decision, especially since our
 competitors never had a group commit implementation that was far
 inferior to their current implementation. The assumption will be that
 it's a small improvement that's hardly worth noticing at all.

 Thanks for the summary. I know we talk about group commit, but I wasn't
 aware that it had not been exposed to our general users.  I agree we
 need to reword the item as you suggested.  So this group commit happens
 even if users don't change these?

        #commit_delay = 0           # range 0-10, in microseconds
        #commit_siblings = 5            # range 1-1000

If a bunch of people are standing around waiting for a door to unlock
and they are mutually aware of each other, it has never been the case
(or at least not for years) that the first person through the door
would systematically slam it in everyone else's face.  Is this enough
to qualify as group commit?  If so, group commit has always
(again, at least for years) been there.  The new code simply makes it
less likely that the group will trip over each other's feet as they all
stream through the door.

The commit_delay settings cover the case where the door unlocks, and
you open it, but then perhaps you stand there for a few minutes
holding it open in case someone else happens to show up.  This is
pretty much orthogonal to the prior case.  You can not wait for new
people to show up, but trip over the feet of the people already there.
 Or you can wait for new people to show up, then trip over them.  Or
not trip over them, with or without waiting for new arrivals.

(For the analogy to work, this particular door refuses to unlock more
than once every 5 minutes.  Maybe it is for a very slow elevator)

 This is the git commit message:

    Make group commit more effective.

    When a backend needs to flush the WAL, and someone else is already flushing
    the WAL, wait until it releases the WALInsertLock and check if we still 
 need
    to do the flush or if the other backend already did the work for us, before
    acquiring WALInsertLock. This helps group commit, because when the WAL 
 flush
    finishes, all the backends that were waiting for it can be woken up in one
    go, and the can all concurrently observe that they're done, rather than
    waking them up one by one in a cascading fashion.

    This is based on a new LWLock function, LWLockWaitUntilFree(), which has
    peculiar semantics. If the lock is immediately free, it grabs the lock and
    returns true. If it's not free, it waits until it is released, but then
    returns false without grabbing the lock. This is used in XLogFlush(), so
    that when the lock is acquired, the backend flushes the WAL, but if it's
    not, the backend first checks the current flush location before retrying.

    Original patch and benchmarking by Peter Geoghegan and Simon Riggs, 
 although
    this patch as committed ended up being very different from that.

    (Heikki Linnakangas)

 Is that commit message inaccurate?

I think the commit message is accurate, other than saying
WALInsertLock where it meant WALWriteLock.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-14 Thread karavelov
- Quote from Alex Shulgin (a...@commandprompt.com), on 14.05.2012 at 18:16
-

 Alex a...@commandprompt.com writes:
 
 
 The host part in this case is empty (it is hidden between the // and
 the following /,) thus local socket connection is employed for this
 type of URIs.  To specify non-standard path to the local sockets
 directory use the familiar URI parameter:
 
   postgres:///db?host=/path/to/socket/dir
 

And why are we calling "host" the parameter that specifies the path to the
socket dir? It is not a host, and could be confused with the host part of the
URI (the part between // and /). Why not call it "path"? So it would become:

postgres:///db?path=/path/to/socket/dir

Best regards

--
Luben Karavelov

Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-14 Thread Fujii Masao
On Mon, May 14, 2012 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote:
 On Friday, May 11, 2012 08:45:23 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Its the only place though which knows whether its actually sensible to
  wakeup the walsender. We could make it return whether it wrote anything
  and do the wakeup at the callers. I count 4 different callsites which
  would be an annoying duplication but I don't really see anything better
  right now.

 Another point here is that XLogWrite is not only normally called with
 the lock held, but inside a critical section.  I see no reason to take
 the risk of doing signal sending inside critical sections.

 BTW, a depressingly large fraction of the existing calls to WalSndWakeup
 are also inside critical sections, generally for no good reason that I
 can see.  For example, in EndPrepare(), why was the call placed where
 it is and not down beside SyncRepWaitForLSN?
 Hm. While I see no real problem moving it out of the lock, I don't really see a
 way to cleanly move it outside critical sections everywhere. The impact of doing
 so seems rather big to me. The only externally visible places where it is
 actually known whether we wrote out data, and thus whether to do the wakeup, are
 XLogInsert, XLogFlush and XLogBackgroundFlush. The first two of those are
 routinely called inside a critical section.

So what about moving the existing calls of WalSndWakeup() out of a critical
section and adding a new call of WalSndWakeup() into XLogBackgroundFlush()?
Then all WalSndWakeup()s are called outside a critical section and after
releasing WALWriteLock. I attached the patch.
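
In outline, the change has roughly this shape (a sketch of the idea only; the
attached patch is authoritative and may differ in detail):

	/* near the end of XLogBackgroundFlush(), after writing/flushing WAL */
	LWLockRelease(WALWriteLock);

	END_CRIT_SECTION();

	/* wake walsenders outside the lock and outside the critical section */
	if (wrote_something)		/* hypothetical flag: did we actually write? */
		WalSndWakeup();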

Regards,

-- 
Fujii Masao


move_walsndwakeup_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-14 Thread Alex

karave...@mail.bg writes:

 - Quote from Alex Shulgin (a...@commandprompt.com), on 14.05.2012 at 18:16
 -

 Alex a...@commandprompt.com writes:
 
 
 The host part in this case is empty (it is hidden between the // and
 the following /,) thus local socket connection is employed for this
 type of URIs.  To specify non-standard path to the local sockets
 directory use the familiar URI parameter:
 
   postgres:///db?host=/path/to/socket/dir
 

 And why are we calling "host" the parameter that specifies the path to the
 socket dir? It is not a host, and could be confused with the host part of the
 URI (the part between // and /). Why not call it "path"? So it would become:

 postgres:///db?path=/path/to/socket/dir

We call it that way since we rely on existing libpq code to interpret
the value of every parameter in the URI (well, almost: with the notable
exception of translating ssl=true for JDBC compatibility).

I don't think anyone would confuse the host part of the URI with the URI
parameter ?host=... if we care to express things clearly in the
documentation (which we do, I believe).

Existing implementations, like that mentioned by Peter in the top
message of this thread (SQLAlchemy or was it psycopg2?) already use this
notation, so I don't think we can or should do anything about this,
i.e. there's little point in renaming to path or merely supporting it
as an alternative syntax.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-14 Thread Fujii Masao
On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander mag...@hagander.net wrote:
 Should we go down the easy way and just reject connections when the flag is
 mismatching between the client and the server (trivial to do - see the
 attached patch)?

+   char   *tmpparam;

You forgot to add const before char, which causes a compile-time warning.
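
That is, presumably the declaration just needs to become:

+   const char *tmpparam;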

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issues with 9.2 pg_basebackup replication

2012-05-14 Thread Jim Nasby
On May 13, 2012, at 3:08 PM, Josh Berkus wrote:
 More issues: promoting intermediate standby breaks replication.
 
 To be a bit blunt here, has anyone tested cascading replication *at all*
 before this?

Josh, do you have scripts that you're using to do this testing? If so can you 
post them somewhere?

AFAIK we don't have any regression tests for all this replication stuff, but 
ISTM that we need some...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-14 Thread Peter Eisentraut
On Sat, 2012-05-12 at 12:59 -0400, Tom Lane wrote:
 Now it's entirely likely that there is nobody out there relying on
 such a thing, but nonetheless this is a compatibility break, and an
 unnecessary one IMO.  You haven't shown any convincing reason why we
 need to change the behavior of age() on master servers at all.

Recall that this thread originally arose out of age() being called by a
monitoring tool.  It would be nice if repeatedly calling age() on an
otherwise idle database would not change the result.  Currently, you
would never get a stable state on such a check, and moreover, you
would not only get different results but different long-term behavior
between master and standby.  Now this is not that important and can be
accommodated in the respective tools, but it is kind of weird.  It would
be like a check for disk space losing one byte at every check, even if
you got it back later.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-14 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Sat, 2012-05-12 at 12:59 -0400, Tom Lane wrote:
 Now it's entirely likely that there is nobody out there relying on
 such a thing, but nonetheless this is a compatibility break, and an
 unnecessary one IMO.  You haven't shown any convincing reason why we
 need to change the behavior of age() on master servers at all.

 Recall that this thread originally arose out of age() being called by a
 monitoring tool.  It would be nice if repeatedly calling age() on an
 otherwise idle database would not change the result.  Currently, you
 would never get a stable state on such a check, and moreover, you
 would not only get different results but different long-term behavior
 between master and standby.

Hm.  Interesting argument, but why exactly would you expect that age()
would work differently from, say, wall clock time?  And how likely is it
that a database that requires monitoring is going to have exactly zero
transactions over a significant length of time?

(In any case, my primary beef at the moment is not with whether it's a
good idea to change age()'s behavior going forward, but rather with
having back-patched such a change.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-14 Thread Peter Geoghegan
On 14 May 2012 17:06, Bruce Momjian br...@momjian.us wrote:
 So this group commit happens
 even if users don't change these?

        #commit_delay = 0           # range 0-10, in microseconds
        #commit_siblings = 5            # range 1-1000

Yes, that's right - the new group commit is not configurable, and is
used all the time. I believe that very few people ever change these
other settings in production, because it is very risky to tweak them.
Read the description of them in Greg Smith's book for a good example
of what I mean, or read the docs - it is rare that adding delay by
increasing this parameter will actually improve performance. There
may actually be no one that's set commit_delay > 0 at all. All these
settings do is insert a sleep of commit_delay microseconds, in the
hope that the call to XLogFlush() will then find that doing work is
unnecessary due to some other session having got there first.

 So the new release item wording will be:

        Add group commit capability for sessions that commit at the same
        time

I'd say that's almost what I'd like to see, because commit_delay falls
far short of the definition of group commit established by other
RDBMSs, which is, I assume, why the term was never used for advocacy
purposes. The old facility could result in batching of commits
specifically by artificially adding latency so that after the delay
the sessions found they could fastpath. While it worked under
artificial conditions, I think it resulted in relatively small
improvements next to new group commit's order-of-magnitude increase in
throughput that was demonstrated for a maximally commit-bound
workload.

The mere ability to notice that an XLogFlush() call is unnecessary and
fastpath out could be argued to be an aboriginal group commit,
predating even commit_delay, as could skipping duplicate fsync()
requests in XLogWrite(), which I think Jeff pointed out, but I don't
think anyone actually takes this position.

What I'd suggest is something like:


Add group commit capability so that when a session commits, other,
concurrent sessions with pending commits will later automatically
check if their commit has been satisfied by the original request (or
even some other, intervening request) before actually proceeding.

This results in a large increase in transaction throughput for
commit-bound workloads, as under these circumstances the actual number
of flush requests will be considerably lower than that of earlier
versions of PostgreSQL.


Now, granted, the number of actual flush requests being so high wasn't
a problem because of any actual number of follow-through duplicate
fsync() system calls (or writes) - XLogWrite() notices them, and
probably has forever (although I read a suggestion that some MySQL
flavours might have actually been stupid enough to do so pre group
commit). But in order to be able to notice this, backends previously
had to get the WALWriteLock exclusively. I don't think that these are
the kind of qualifications that belong here though. People rightfully
only care about the practical implications, and whether or not we're
competitive in various respects, such as whether or not we tick the
group commit box, and whether or not we have pretty graphs that are
comparable to those of other database systems.

 This is the git commit message:

    Make group commit more effective.

    When a backend needs to flush the WAL, and someone else is already flushing
    the WAL, wait until it releases the WALInsertLock and check if we still 
 need
    to do the flush or if the other backend already did the work for us, before
    acquiring WALInsertLock. This helps group commit, because when the WAL 
 flush
    finishes, all the backends that were waiting for it can be woken up in one
    go, and the can all concurrently observe that they're done, rather than
    waking them up one by one in a cascading fashion.

    This is based on a new LWLock function, LWLockWaitUntilFree(), which has
    peculiar semantics. If the lock is immediately free, it grabs the lock and
    returns true. If it's not free, it waits until it is released, but then
    returns false without grabbing the lock. This is used in XLogFlush(), so
    that when the lock is acquired, the backend flushes the WAL, but if it's
    not, the backend first checks the current flush location before retrying.

    Original patch and benchmarking by Peter Geoghegan and Simon Riggs, 
 although
    this patch as committed ended up being very different from that.

    (Heikki Linnakangas)

 Is that commit message inaccurate?

I wouldn't say that it's inaccurate. However, if we were to deprecate
commit_delay and commit_siblings, that would be a mere matter of
removing this code (and the GUC stuff itself):

if (CommitDelay > 0 && enableFsync &&
   MinimumActiveBackends(CommitSiblings))
   pg_usleep(CommitDelay);

This code is just before one call (of several) to XLogFlush(). The
guts of the new group commit are in that 

Re: [HACKERS] Draft release notes complete

2012-05-14 Thread Jeff Janes
On Mon, May 14, 2012 at 9:56 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, May 14, 2012 at 9:06 AM, Bruce Momjian br...@momjian.us wrote:

 This is the git commit message:

    Make group commit more effective.

    When a backend needs to flush the WAL, and someone else is already 
 flushing
    the WAL, wait until it releases the WALInsertLock and check if we still 
 need
    to do the flush or if the other backend already did the work for us, 
 before
    acquiring WALInsertLock. This helps group commit, because when the WAL 
 flush
    finishes, all the backends that were waiting for it can be woken up in one
    go, and the can all concurrently observe that they're done, rather than
    waking them up one by one in a cascading fashion.

    This is based on a new LWLock function, LWLockWaitUntilFree(), which has
    peculiar semantics. If the lock is immediately free, it grabs the lock and
    returns true. If it's not free, it waits until it is released, but then
    returns false without grabbing the lock. This is used in XLogFlush(), so
    that when the lock is acquired, the backend flushes the WAL, but if it's
    not, the backend first checks the current flush location before retrying.

    Original patch and benchmarking by Peter Geoghegan and Simon Riggs, 
 although
    this patch as committed ended up being very different from that.

    (Heikki Linnakangas)

 Is that commit message inaccurate?

 I think the commit message is accurate, other than saying
 WALInsertLock where it meant WALWriteLock.

Sorry, wrong number of negations.  I think the commit message is
accurate, other than

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-14 Thread Peter Eisentraut
On Mon, 2012-05-14 at 15:11 -0400, Tom Lane wrote:
 Hm.  Interesting argument, but why exactly would you expect that age()
 would work differently from, say, wall clock time?  And how likely is
 it that a database that requires monitoring is going to have exactly
 zero transactions over a significant length of time?

Yes, it will be a marginal case in practice, but it's something that a
curious DBA might wonder about.  But I think your example of how age()
behaves relative to an INSERT statement is more important.
 
 (In any case, my primary beef at the moment is not with whether it's a
 good idea to change age()'s behavior going forward, but rather with
 having back-patched such a change.)

Certainly we should leave it alone there.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-14 Thread Merlin Moncure
On Wed, May 9, 2012 at 10:11 PM, Bruce Momjian br...@momjian.us wrote:
 I have completed my draft of the 9.2 release notes, and committed it to
 git.

The beta release announcement is on postgresql.org with a direct link
to the release notes.  The notes lead off with:

NARRATIVE HERE. Major enhancements include:

MAJOR LIST HERE

That looks a little unfinished -- this is exactly where many people
land to see what's coming with the new release.  Need some help
putting together the major feature list?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-14 Thread Simon Riggs
On 14 May 2012 20:05, Peter Eisentraut pete...@gmx.net wrote:
 On Sat, 2012-05-12 at 12:59 -0400, Tom Lane wrote:
 Now it's entirely likely that there is nobody out there relying on
 such a thing, but nonetheless this is a compatibility break, and an
 unnecessary one IMO.  You haven't shown any convincing reason why we
 need to change the behavior of age() on master servers at all.

 Recall that this thread originally arose out of age() being called by a
 monitoring tool.  It would be nice if repeatedly calling age() on an
 otherwise idle database would not change the result.  Currently, you
 would never get a stable state on such a check, and moreover, you
 would not only get different results but different long-term behavior
 between master and standby.

That's how it works now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-05-14 Thread Gilles Darold
Sorry for the double post, but it seems that my previous reply
didn't reach the pgsql-hackers list. So here are the new patches that
limit lines to 80 characters.

Regards,

On 02/05/2012 19:53, Gabriele Bartolini wrote:
 Hi Gilles,

Sorry for the delay.

 On 03/04/12 14:21, Gilles Darold wrote:
 +1, this is also my point of view.

I have looked at the patch that contains both pg_is_in_backup() and
 pg_backup_start_time().

From a functional point of view it looks fine to me. I was thinking
 of adding the BackupInProgress() at the beginning of
 pg_backup_start_time(), but the AllocateFile() function already makes
 sure the file exists.

I have performed some basic testing of both functions and tried to
 inject invalid characters in the start time field of the backup_label
 file and it is handled (with an exception) by the server. Cool.

I spotted though some formatting issues, in particular indentation
 and multi-line comments. Some rows are longer than 80 chars.

Please resubmit with these cosmetic changes and it is fine with me.
 Thank you.

 Cheers,
 Gabriele



-- 
Gilles Darold
Database Administrator
http://dalibo.com - http://dalibo.org

diff -ru postgresql-head/doc/src/sgml/func.sgml postgresql/doc/src/sgml/func.sgml
--- postgresql-head/doc/src/sgml/func.sgml	2012-04-03 10:44:43.979702643 +0200
+++ postgresql/doc/src/sgml/func.sgml	2012-04-03 11:21:15.075701574 +0200
@@ -14467,6 +14467,9 @@
     <primary>pg_stop_backup</primary>
    </indexterm>
    <indexterm>
+    <primary>pg_backup_start_time</primary>
+   </indexterm>
+   <indexterm>
     <primary>pg_switch_xlog</primary>
    </indexterm>
    <indexterm>
@@ -14532,6 +14535,13 @@
   </row>
   <row>
    <entry>
+    <literal><function>pg_backup_start_time()</function></literal>
+    </entry>
+   <entry><type>timestamp with time zone</type></entry>
+   <entry>Get start time of an online exclusive backup in progress.</entry>
+  </row>
+  <row>
+   <entry>
     <literal><function>pg_switch_xlog()</function></literal>
     </entry>
    <entry><type>text</type></entry>
diff -ru postgresql-head/src/backend/access/transam/xlogfuncs.c postgresql/src/backend/access/transam/xlogfuncs.c
--- postgresql-head/src/backend/access/transam/xlogfuncs.c	2012-04-03 10:44:44.227702645 +0200
+++ postgresql/src/backend/access/transam/xlogfuncs.c	2012-04-03 11:21:15.075701574 +0200
@@ -29,6 +29,7 @@
 #include "utils/numeric.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
+#include "storage/fd.h"
 
 
 static void validate_xlog_location(char *str);
@@ -563,3 +564,67 @@
 
 	PG_RETURN_NUMERIC(result);
 }
+
+/*
+ * Returns start time of an online exclusive backup.
+ *
+ * When there's no exclusive backup in progress, the function
+ * returns NULL.
+ */
+Datum
+pg_backup_start_time(PG_FUNCTION_ARGS)
+{
+	TimestampTz xtime;
+	FILE	   *lfp;
+	char		fline[MAXPGPATH];
+	char		backup_start_time[30];
+
+	/*
+	* See if label file is present
+	*/
+	lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
+	if (lfp == NULL)
+	{
+	   if (errno != ENOENT)
+		   ereport(ERROR,
+			   (errcode_for_file_access(),
+				errmsg("could not read file \"%s\": %m",
+					   BACKUP_LABEL_FILE)));
+	   PG_RETURN_NULL();
+	}
+
+	/*
+	 * Parse the file to find the START TIME line.
+	 */
+	backup_start_time[0] = '\0';
+	while (fgets(fline, sizeof(fline), lfp) != NULL)
+	{
+	   if (sscanf(fline, "START TIME: %25[^\n]\n",
+		backup_start_time) == 1)
+			break;
+	}
+
+	/*
+	* Close the backup label file.
+	*/
+	if (ferror(lfp) || FreeFile(lfp)) {
+		ereport(ERROR,
+			(errcode_for_file_access(),
+			errmsg("could not read file \"%s\": %m",
+			BACKUP_LABEL_FILE)));
+	}
+	
+	if (strlen(backup_start_time) == 0) {
+		ereport(ERROR,
+			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			errmsg("invalid data in file \"%s\"",
+			BACKUP_LABEL_FILE)));
+	}
+
+	/*
+	* Convert the time string read from file to TimestampTz form.
+	*/
+	xtime = DatumGetTimestampTz(
+		DirectFunctionCall3(timestamptz_in,
+			CStringGetDatum(backup_start_time),
+			ObjectIdGetDatum(InvalidOid),Int32GetDatum(-1))
+	);
+
+	PG_RETURN_TIMESTAMPTZ(xtime);
+}
+
diff -ru postgresql-head/src/include/access/xlog_internal.h postgresql/src/include/access/xlog_internal.h
--- postgresql-head/src/include/access/xlog_internal.h	2012-04-03 10:44:45.051702642 +0200
+++ postgresql/src/include/access/xlog_internal.h	2012-04-03 11:21:15.075701574 +0200
@@ -282,5 +282,6 @@
 extern Datum pg_xlog_replay_resume(PG_FUNCTION_ARGS);
 extern Datum pg_is_xlog_replay_paused(PG_FUNCTION_ARGS);
 extern Datum pg_xlog_location_diff(PG_FUNCTION_ARGS);
+extern Datum pg_backup_start_time(PG_FUNCTION_ARGS);
 
 #endif   /* XLOG_INTERNAL_H */
diff -ru postgresql-head/src/include/catalog/pg_proc.h postgresql/src/include/catalog/pg_proc.h
--- postgresql-head/src/include/catalog/pg_proc.h	2012-04-03 10:44:45.051702642 +0200
+++ postgresql/src/include/catalog/pg_proc.h	2012-04-03 11:21:15.075701574 +0200

Re: [HACKERS] Bugs in our Windows socket code

2012-05-14 Thread Tom Lane
I wrote:
 I've been trying to figure out why my recent attempt to latch-ify the
 stats collector didn't work well on the Windows buildfarm machines.
 ...
 What I intend to try doing about this is making
 pgwin32_waitforsinglesocket detach its event object from the socket before
 returning, ie WSAEventSelect(socket, NULL, 0).  This will hopefully
 decouple it a bit better from WaitLatchOrSocket.

Well, I tried that, and it made things better on some of the Windows
buildfarm critters, but not all.  I then added some debug printouts
to pgstat.c, and after a few go-rounds with that I have learned the
following things:

* In the original state of the latch-ified stats collector patch,
control never actually reaches the WaitLatchOrSocket call at all,
until the end of the regression test run when the postmaster sends
a shutdown signal.  pgstat.c only iterates the inner loop containing
the recv() call, because the forced blocking behavior in pgwin32_recv
causes it to wait at the recv() call until data is available.  However,
on the machines where the regression tests fails, somewhere during
the proceedings recv() just stops returning at all.  This can be seen
for example at
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhaldt=2012-05-14%2004%3A00%3A06
Now the really interesting point here is that given that
WaitLatchOrSocket isn't being reached, there is hardly any difference
between the pre-patched code and the patched code: before, we called 
pgwin32_waitforsinglesocket from pgstat.c and then did a recv() when
it said OK, and in the patched code a similar call happens internally
to pgwin32_recv.  As far as I can see, the only material difference
there is that pgstat.c specified a finite timeout when calling
pgwin32_waitforsinglesocket while pgwin32_recv uses an infinite timeout.
I now believe that the only reason the pre-patched code appeared to work
is that sometimes it would time out and retry.  With the infinite
timeout in place, after whatever goes wrong goes wrong, it just hangs up
until a signal event arrives.

* I then modified pgstat.c to force pgwin32_noblock on while calling
pgwin32_recv().  This caused pgstat's new loop logic to start behaving
as intended, that is it would loop calling recv() as long as data was
available and then fall out to the WaitLatchOrSocket call.  However, it
was still capable of randomly freezing at the WaitLatchOrSocket call,
as in
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhaldt=2012-05-14%2016%3A00%3A05
So whatever is wrong affects both the code in
pgwin32_waitforsinglesocket and that in WaitLatchOrSocket.

* Getting desperate, I then changed the WaitLatchOrSocket call to
specify a two-second timeout, which is the same as we had been using in
the pgwin32_waitforsinglesocket call before all this started.  And
behold, narwhal passes the regression tests again:
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=narwhaldt=2012-05-14%2022%3A00%3A05stg=check
This trace shows positively that the stats collector occasionally fails
to receive incoming data, causing it to stop dead until a timeout occurs
(look for "pgstat: wait result 0x8" log entries).  After that it's again
able to receive data.

My conclusion is that the WSAEventSelect coding pattern we're using
tickles something seriously broken in Windows Server 2003, causing it
to sometimes lose FD_READ events.  The other flavors of Windows we have
in the buildfarm seem to perform more nearly according to Microsoft's
documentation.

Having already spent way more time on this than I cared to, my intention
is to use the hack solution of making pgstat.c call WaitLatchOrSocket
with a two-second timeout on Windows only, while using an infinite
timeout elsewhere.  I don't feel too awful about this because anybody
who's expecting minimal power consumption needs to be using something
other than Windows anyway.  However, I'm now more convinced than ever
that our Windows socket code is rickety to the point that it will fall
over anytime somebody looks at it cross-eyed.  I think that we need to
get rid of the technique of detaching and re-attaching event objects to
sockets; at least some versions of Windows just don't work with that.
And we definitely need to get rid of that global variable
pgwin32_noblock.
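
For anyone not steeped in the Windows APIs, the attach/wait/detach pattern in
question looks roughly like this (a simplified sketch, not the actual
pgwin32_waitforsinglesocket code; "s" and "timeout_ms" stand for the socket
and timeout in play, and error handling is omitted):

	WSAEVENT	ev = WSACreateEvent();

	WSAEventSelect(s, ev, FD_READ | FD_CLOSE);	/* attach event to socket */
	WaitForMultipleObjectsEx(1, &ev, FALSE, timeout_ms, TRUE);
	WSAEventSelect(s, NULL, 0);			/* detach again before returning */
	WSACloseEvent(ev);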

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers