Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Michael Nolan
On Fri, May 11, 2012 at 10:03 PM, Tom Lane  wrote:

>
> Well, the question to me is exactly how much good it will do to stop
> deletion of the pg_tablespace entry, if the underlying files are gone.
> I'm having a hard time getting excited about expending cycles on that.
>

There could be multiple reasons why the underlying files are not there,
such as a filesystem that isn't currently mounted for some reason.

It seems prudent to throw an error on DROP TABLESPACE if there are
references to that tablespace in the catalog, or perhaps to require a 'force'
clause to override any errors, but it probably isn't something most DBAs
would run into very often.
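
Something along these lines, as purely hypothetical syntax (no such
option exists today):

DROP TABLESPACE myspace;        -- would error while catalog references remain
DROP TABLESPACE myspace FORCE;  -- would override the error and clean up anyway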

Thanks for figuring it out, Tom.
--
Mike Nolan


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Robert Haas's message of Fri May 11 20:28:28 -0400 2012:
>> I'm astonished we don't do that already.  Seems inconsistent with
>> other SQL object types - most obviously, schemas - and a potentially
>> giant foot-gun.

> The original patch did contain tablespace tracking (though I don't
> remember considering whether they were default or not), but it got
> ripped out because during the subsequent discussion we considered that
> it wasn't necessary to keep track of it -- supposedly, whenever you were
> going to delete a tablespace, the existing files in the directory would
> be sufficient evidence to stop the deletion.  Evidently I failed to
> consider the case at hand.

Well, the question to me is exactly how much good it will do to stop
deletion of the pg_tablespace entry, if the underlying files are gone.
I'm having a hard time getting excited about expending cycles on that.

regards, tom lane



Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Fri May 11 20:28:28 -0400 2012:
> On Fri, May 11, 2012 at 7:55 PM, Tom Lane  wrote:
> > A larger question is whether we should start making pg_shdepend entries
> > for table/index usage of non-default tablespaces, so that you couldn't
> > DROP a tablespace that the catalogs think still has tables/indexes in
> > it.
> 
> I'm astonished we don't do that already.  Seems inconsistent with
> other SQL object types - most obviously, schemas - and a potentially
> giant foot-gun.

The original patch did contain tablespace tracking (though I don't
remember considering whether they were default or not), but it got
ripped out because during the subsequent discussion we considered that
it wasn't necessary to keep track of it -- supposedly, whenever you were
going to delete a tablespace, the existing files in the directory would
be sufficient evidence to stop the deletion.  Evidently I failed to
consider the case at hand.

I don't think there's any particular reason we can't put it back.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 2:03 PM, Jeff Janes  wrote:
> On Thu, May 10, 2012 at 10:44 AM, Bruce Momjian  wrote:
>> On Thu, May 10, 2012 at 01:11:54PM +0100, Peter Geoghegan wrote:
>>
>>> Why can't we call group commit group commit (and for that matter,
>>> index-only scans index-only scans), so that people will understand
>>> that we are now competitive with other RDBMSs in this area? "Improve
>>> performance of WAL writes when multiple transactions commit at the
>>> same time" seems like a pretty bad description, since it doesn't make
>>> any reference to batching of commits.  Also, I don't think that the
>>
>> I didn't call it "group commit" because we have settings we used to
>> regard as group commit:
>
> My understanding of that patch is that it does not cause "group
> commit" to happen, but rather when a group commit does happen
> "naturally" it causes all members of the group to awaken more
> quickly/efficiently than they previously would have.

Right.  It's not a new feature; it's a performance improvement.  We've
had group commit for a long time; it just didn't work very well
before.  And it's not batching the commits better; it's reducing the
lock contention around realizing that the batched commit has happened.

>> I updated the release docs to call the item "group commit" because I now
>> don't see any reference to that term in our docs.
>
> I don't think I'd mention WAL writing at all, and just say that it
> improves the concurrency of locking around group commits.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 7:55 PM, Tom Lane  wrote:
> A larger question is whether we should start making pg_shdepend entries
> for table/index usage of non-default tablespaces, so that you couldn't
> DROP a tablespace that the catalogs think still has tables/indexes in
> it.

I'm astonished we don't do that already.  Seems inconsistent with
other SQL object types - most obviously, schemas - and a potentially
giant foot-gun.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 2:17 PM, Tom Lane  wrote:
> The merit would be in keeping the function's definition simple.

True.  It's not *much* simpler, but it is simpler.

> Anyway, let's see if breaking this down by cases clarifies anything.
> As I see it, there are three possible cases:
>
> 1. On master, xact already has an XID.  The longstanding behavior is
> to use that XID as reference.  The committed patch changes this to
> reference whatever is next-to-assign XID at first call of age(), but
> it's far from clear to me that that's better for this case in isolation.
>
> 2. On master, xact does not (yet) have an XID.  The previous behavior
> is to force XID assignment at first call of age().  However, if we
> capture ReadNewTransactionId as per patch then we give the same answer
> as we would have done before, only without assigning the xact an XID.
> It could be argued that this can yield inconsistent results if the xact
> later does something that forces XID assignment anyway, but surely
> that's a pretty narrow corner case.
>
> 3. On slave, so xact cannot have an XID.  Previous behavior is to fail
> which we all agree is unhelpful.  Capturing ReadNewTransactionId
> provides behavior somewhat similar to patched case #2, though it's
> unclear to me exactly how compatible it is given the likely skew between
> master and slave notions of the next XID.
>
> It's arguable that what we should do is "use XID if on master, capture
> ReadNewTransactionId if on slave", which would avoid any backwards
> incompatibility for the first two cases while still fixing the case that
> everybody agrees is a problem.  Simon argues that this gives a weird
> variance in the master vs slave behavior, but I'm not sure I believe
> that's an issue.  In case 2, the only way that the user can tell the
> difference between force-XID-assignment and capture-ReadNewTransactionId
> is if the transaction later does something requiring an XID, which
> cannot happen anyway on the slave.  So from here the difference in these
> behaviors seems minimal and not worth creating incompatibility in the
> first two cases for.

Yeah.  I don't think I particularly care what we do in HEAD, but it
sure seems like it would be nice to change the back-branch behavior as
little as possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Tom Lane
"Albe Laurenz"  writes:
> Michael Nolan wrote:
>> I see one potential difference between your results and mine.
>> When I rebuild the tablespace, I wind up with the same filename/OID as
>> before, I'm not sure you do.

> Right. That's strange.
> Usually OIDs get incremented, so you shouldn't end up with the same
> OID for the new tablespace.

I believe I see what's happening here, and the difference is that
Michael deleted the tablespace's directory while Albe only deleted the
files in it.

The former case causes destroy_tablespace_directories to exit early,
without throwing a hard error, and without having removed the symlink
for the tablespace OID in $PGDATA/pg_tblspc/.  This means that after
re-creating a new tablespace in the same directory location, that old
symlink works again, even though it no longer corresponds to any OID in
pg_tablespace.  Thus, Michael doesn't see an error in his REINDEX;
physical access to the index still works even though the index's
reltablespace is no longer really valid.  The reported symptom of \d
not showing the tablespace is because the code in psql's describe.c will
silently ignore a reltablespace entry that does not match any OID in
pg_tablespace.
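
To make the sequence concrete, here is roughly the reproduction as I
understand it (paths and object names invented for illustration):

CREATE TABLESPACE ts LOCATION '/mnt/data/ts';  -- creates $PGDATA/pg_tblspc/<oid> symlink
CREATE INDEX i ON t (c) TABLESPACE ts;
-- external agency: rm -rf /mnt/data/ts        -- removes the directory itself, not just files
DROP TABLESPACE ts;     -- destroy_tablespace_directories exits early;
                        -- pg_tablespace row removed, but the symlink survives
CREATE TABLESPACE ts LOCATION '/mnt/data/ts';  -- in Michael's case this reused the same OID,
                                               -- so the stale symlink "works" again
REINDEX INDEX i;        -- succeeds even though reltablespace is no longer valid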

We could prevent this scenario if we changed
destroy_tablespace_directories so that for any non-fatal-error
situation, it continues to march on and try to destroy the remaining
infrastructure, particularly the symlink.  I'm not sure that's really a
good idea, but it definitely seems to be a bad idea to leave the symlink
in place when we're removing the pg_tablespace row.  Alternatively we
could make more of those cases be errors rather than warnings, so that
the pg_tablespace row removal would be rolled back.

The comment in destroy_tablespace_directories indicates that what we're
trying to support by not throwing an error for missing directory is
cleaning up a dead pg_tablespace row, which suggests that removing the
symlink too would be reasonable.

A larger question is whether we should start making pg_shdepend entries
for table/index usage of non-default tablespaces, so that you couldn't
DROP a tablespace that the catalogs think still has tables/indexes in
it.  I'm not sure that that'd be particularly helpful though.  It
certainly wouldn't do anything to protect against the scenario discussed
here of an external agency zapping all the files.

regards, tom lane



[HACKERS] WIP: parameterized function scan

2012-05-11 Thread Antonin Houska

Hello,
following this short discussion
http://archives.postgresql.org/message-id/4f5aa202.9020...@gmail.com
I gave it one more try and hacked the optimizer so that a function can
become an inner relation in a NL join, parameterized with values from the
outer relation.


I tried to explain my thoughts in comments. Other than that:

1. I haven't tried to use SpecialJoinInfo to constrain join order. Even
if the order matters in a query like

SELECT * FROM a, func(a.i)

it's still an inner join by nature. SpecialJoinInfo is rarely used for
INNER joins, and is never stored in PlannerInfo. Doing so only for these
lateral functions would be rather disruptive.


2. A simple SQL function (i.e. one that gets pulled up into the main
query) is a special case. The query that results from such a pull-up no
longer contains any function (and thus is not affected by this patch),
but such cases need to be newly taken into account and examined/tested
(the patch unblocks them at the parsing stage too).


3. There might be some open questions about SQL conformance.

I've spent quite a while looking into the optimizer code, and in the end
I was surprised that it didn't require that many changes, at least to
make a few simple examples work. Am I ignoring any important fact(s)?
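
To illustrate the sort of query the patch is meant to make plannable, a
trivial example (table and function invented just for this message):

CREATE TABLE a (i int);
CREATE FUNCTION func(int) RETURNS SETOF int
    AS 'SELECT generate_series(1, $1)' LANGUAGE sql;

-- With the patch, func() can sit on the inner side of a nest loop and be
-- re-evaluated with a.i taken from each outer row:
SELECT * FROM a, func(a.i);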


Thanks,
Tony.


diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 446319d..f30ae4e 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -85,6 +85,44 @@ add_paths_to_joinrel(PlannerInfo *root,
 	SemiAntiJoinFactors semifactors;
 	Relids		param_source_rels = NULL;
 	ListCell   *lc;
+	bool nest_loop_only = false;
+
+	/*
+	 * Should the outer path be parametrized by the inner, there's no method
+	 * to evaluate such a join.
+	 */
+	if (bms_overlap(outerrel->func_arg_relids, innerrel->relids)) {
+		return;
+	}
+
+	/*
+	 * Only nest-loop join is considered viable method for lateral function.
+	 *
+	 * Non-empty 'innerrel->func_arg_relids' isn't a sufficient condition for
+	 * NL join with parametrized function on the inner side:
+	 *
+	 * If there's no overlap, the function arguments have already been used
+	 * to construct the innerrel (no reason to use them again) or they are supplied
+	 * from higher level (and thus constant for the current join) .
+	 */
+	if (bms_overlap(innerrel->func_arg_relids, outerrel->relids)) {
+		/*
+		 * Merge join is the only method to evaluate full join, but merge join is a bad option
+		 * for lateral functions.
+		 */
+		if (jointype == JOIN_FULL) {
+			return;
+		}
+
+		/*
+		 * TODO
+		 * Probably not relevant. Verify.
+		 */
+		if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) {
+			return;
+		}
+		nest_loop_only = true;
+	}
 
 	/*
 	 * Find potential mergejoin clauses.  We can skip this if we are not
@@ -92,7 +130,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * way of implementing a full outer join, so override enable_mergejoin if
 	 * it's a full join.
 	 */
-	if (enable_mergejoin || jointype == JOIN_FULL)
+	if ((enable_mergejoin || jointype == JOIN_FULL) && !nest_loop_only)
 		mergeclause_list = select_mergejoin_clauses(root,
 	joinrel,
 	outerrel,
@@ -151,7 +189,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * 1. Consider mergejoin paths where both relations must be explicitly
 	 * sorted.	Skip this if we can't mergejoin.
 	 */
-	if (mergejoin_allowed)
+	if (mergejoin_allowed && !nest_loop_only)
 		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
 			 restrictlist, mergeclause_list, jointype,
 			 sjinfo, param_source_rels);
@@ -192,10 +230,13 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * before being joined.  As above, disregard enable_hashjoin for full
 	 * joins, because there may be no other alternative.
 	 */
-	if (enable_hashjoin || jointype == JOIN_FULL)
+	if ((enable_hashjoin || jointype == JOIN_FULL) && !nest_loop_only)
 		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
 			 restrictlist, jointype,
 			 sjinfo, &semifactors, param_source_rels);
+
+	/* Prepare for check of function parametrization at the next higher level. */
+	joinrel->func_arg_relids = bms_union(outerrel->func_arg_relids, innerrel->func_arg_relids);
 }
 
 /*
@@ -654,6 +695,7 @@ match_unsorted_outer(PlannerInfo *root,
 	Path	   *inner_cheapest_total = innerrel->cheapest_total_path;
 	Path	   *matpath = NULL;
 	ListCell   *lc1;
+	bool nest_loop_only = bms_overlap(innerrel->func_arg_relids, outerrel->relids);
 
 	/*
 	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
@@ -814,6 +856,9 @@ match_unsorted_outer(PlannerInfo *root,
 		if (save_jointype == JOIN_UNIQUE_OUTER)
 			continue;
 
+		if (nest_loop_only)
+			continue;
+
 		/* Look for useful mergeclauses (if any) */
 		mergeclauses = find_mergeclauses_for_pathkeys(root,
 	  outerpath->pathkeys,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c34b9b8..5552fa8 100644
---

Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Tom Lane's message of Fri May 11 16:50:01 -0400 2012:
>> I'm confused about what you did here and whether this isn't just pilot
>> error.

> The sequence of events is:
> postmaster -T
> crash a backend
> SIGINT postmaster
> SIGCONT all child processes

> My expectation is that postmaster should exit normally after this.

Well, my expectation is that the postmaster should wait for the children
to finish dying, and then exit rather than respawn anything.  It is not
on the postmaster's head to make them die anymore, because it already
(thinks it) sent them SIGQUIT.  Using SIGCONT here is pilot error.

> Maybe we can consider this to be just pilot error, but then why do all
> other processes exit normally?

The reason for that is that the postmaster's SIGINT interrupt handler
(lines 2163ff) sent them SIGTERM, without bothering to notice that we'd
already sent them SIGQUIT/SIGSTOP; so once you CONT them they get the
SIGTERM and drop out normally.  That handler knows it should not signal
the checkpointer yet, so the checkpointer doesn't get the memo.  But the
lack of a FatalError check here is just a simplicity of implementation
thing; it should not be necessary to send any more signals once we are
in FatalError state.  Besides, this behavior is all wrong for a crash
recovery scenario: there is no guarantee that shared memory is in good
enough condition for SIGTERM shutdown to work.  And we *definitely*
don't want the checkpointer trying to write a shutdown checkpoint.

>> So I don't see any bug here.  And, after closer inspection, your
>> previous proposed patch is quite bogus because checkpointer is not
>> supposed to stop yet when the other processes are being terminated
>> normally.

> Well, it does send the signal only when FatalError is set.  So it should
> only affect -T behavior.

If FatalError is set, it should not be necessary to send any more
signals, period, because we already tried to kill every child.  If we
need to defend against somebody using SIGSTOP/SIGCONT inappropriately,
it would take a lot more thought (and code) than this, and it would
still be extremely fragile because a SIGCONT'd backend is going to be
executing against possibly-corrupt shared memory.

regards, tom lane



Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Fri May 11 16:50:01 -0400 2012:

> > Yep, it's still there as far as I can tell.  A backtrace from the
> > checkpointer shows it's waiting on the latch.
> 
> I'm confused about what you did here and whether this isn't just pilot
> error.  If you run with -T then the postmaster will just SIGSTOP the
> remaining child processes, but then it will sit and wait for them to
> die, since the state machine expects them to react as though they'd been
> sent SIGQUIT.

The sequence of events is:
postmaster -T
crash a backend
SIGINT postmaster
SIGCONT all child processes

My expectation is that postmaster should exit normally after this.  What
happens instead is that all processes exit, except checkpointer.  And in
fact, postmaster is now in PM_WAIT_BACKENDS state, so sending SIGINT a
second time will not shut down checkpointer either.

Maybe we can consider this to be just pilot error, but then why do all
other processes exit normally?  To me it just seems an oversight in
checkpointer shutdown handling in conjunction with -T.

> If you SIGCONT any of them then that process will resume,
> totally ignorant that it's supposed to die.  So "kill -CONTd, of course"
> makes no sense to me.  I tried killing one child with -KILL, then
> sending SIGINT to the postmaster, then killing the remaining
> already-stopped children, and the postmaster did exit as expected after
> the last child died.

Uhm, after you SIGINTed the postmaster, didn't it shut down all children?  That
would be odd.

> So I don't see any bug here.  And, after closer inspection, your
> previous proposed patch is quite bogus because checkpointer is not
> supposed to stop yet when the other processes are being terminated
> normally.

Well, it does send the signal only when FatalError is set.  So it should
only affect -T behavior.

> Possibly it'd be useful to teach the postmaster more thoroughly about
> SIGSTOP and have a way for it to really kill the remaining children
> after you've finished investigating their state.  But frankly this
> is the first time I've heard of anybody using that feature at all;
> I always thought it was a vestigial hangover from days when the kernel
> was too stupid to write separate core dump files for each backend.
> I'd rather remove SendStop than add more complexity there.

Hah.  I've used it a few times, but I can see that multiple core files
are okay.  Maybe you're right and we should just remove it.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Tom Lane's message of Thu May 10 02:27:32 -0400 2012:
>> Alvaro Herrera  writes:
>>> I noticed while doing some tests that the checkpointer process does not
>>> recover very nicely after a backend crashes under postmaster -T (after
>>> all processes have been kill -CONTd, of course, and postmaster told to
>>> shut down via Ctrl-C on its console).  For some reason it seems to get
>>> stuck in a loop doing sleep(0.5s).  In another case I caught it trying to
>>> do a checkpoint, but it was progressing a single page each time and then
>>> sleeping.  In that condition, the checkpoint took a very long time to
>>> finish.

>> Is this still a problem as of HEAD?  I think I've fixed some issues in
>> the checkpointer's outer loop logic, but not sure if what you saw is
>> still there.

> Yep, it's still there as far as I can tell.  A backtrace from the
> checkpointer shows it's waiting on the latch.

I'm confused about what you did here and whether this isn't just pilot
error.  If you run with -T then the postmaster will just SIGSTOP the
remaining child processes, but then it will sit and wait for them to
die, since the state machine expects them to react as though they'd been
sent SIGQUIT.  If you SIGCONT any of them then that process will resume,
totally ignorant that it's supposed to die.  So "kill -CONTd, of course"
makes no sense to me.  I tried killing one child with -KILL, then
sending SIGINT to the postmaster, then killing the remaining
already-stopped children, and the postmaster did exit as expected after
the last child died.

So I don't see any bug here.  And, after closer inspection, your
previous proposed patch is quite bogus because checkpointer is not
supposed to stop yet when the other processes are being terminated
normally.

Possibly it'd be useful to teach the postmaster more thoroughly about
SIGSTOP and have a way for it to really kill the remaining children
after you've finished investigating their state.  But frankly this
is the first time I've heard of anybody using that feature at all;
I always thought it was a vestigial hangover from days when the kernel
was too stupid to write separate core dump files for each backend.
I'd rather remove SendStop than add more complexity there.

regards, tom lane



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Simon Riggs
On 11 May 2012 19:45, Tom Lane  wrote:
> Andres Freund  writes:
>> It's the only place, though, which knows whether it's actually sensible to wake up
>> the walsender. We could make it return whether it wrote anything and do the
>> wakeup at the callers. I count 4 different call sites, which would be an
>> annoying duplication, but I don't really see anything better right now.
>
> Another point here is that XLogWrite is not only normally called with
> the lock held, but inside a critical section.  I see no reason to take
> the risk of doing signal sending inside critical sections.
>
> BTW, a depressingly large fraction of the existing calls to WalSndWakeup
> are also inside critical sections, generally for no good reason that I
> can see.  For example, in EndPrepare(), why was the call placed where
> it is and not down beside SyncRepWaitForLSN?

I think because nobody thought of that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Tom Lane
Andres Freund  writes:
> It's the only place, though, which knows whether it's actually sensible to wake up
> the walsender. We could make it return whether it wrote anything and do the
> wakeup at the callers. I count 4 different call sites, which would be an
> annoying duplication, but I don't really see anything better right now.

Another point here is that XLogWrite is not only normally called with
the lock held, but inside a critical section.  I see no reason to take
the risk of doing signal sending inside critical sections.

BTW, a depressingly large fraction of the existing calls to WalSndWakeup
are also inside critical sections, generally for no good reason that I
can see.  For example, in EndPrepare(), why was the call placed where
it is and not down beside SyncRepWaitForLSN?

regards, tom lane



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Andres Freund
On Friday, May 11, 2012 08:36:24 PM Tom Lane wrote:
> Robert Haas  writes:
> > That definitely doesn't seem ideal - a lot of things can pile up
> > behind WALWriteLock.  I'm not sure how big a problem it would be in
> > practice, but we generally make a practice of avoiding sending signals
> > while holding LWLocks whenever possible...
> 
> There's a good reason for that, which is that the scheduler might well
> decide to go run the wakened process instead of you.  Admittedly this
> tends to not be a problem on machines with $bignum CPUs, but on
> single-CPU machines I've seen it happen a lot.
> 
> Refactoring so that the signal is sent only after lock release seems
> like a good idea to me.
Will send a patch later on; duplication seems to be manageable.

Andres



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Tom Lane
Robert Haas  writes:
> That definitely doesn't seem ideal - a lot of things can pile up
> behind WALWriteLock.  I'm not sure how big a problem it would be in
> practice, but we generally make a practice of avoiding sending signals
> while holding LWLocks whenever possible...

There's a good reason for that, which is that the scheduler might well
decide to go run the wakened process instead of you.  Admittedly this
tends to not be a problem on machines with $bignum CPUs, but on
single-CPU machines I've seen it happen a lot.

Refactoring so that the signal is sent only after lock release seems
like a good idea to me.

regards, tom lane



[HACKERS] Agenda For 3rd Cluster Hackers Summit, May 15th in Ottawa

2012-05-11 Thread Josh Berkus
Hackers, Replicators, Clusterers:

The agenda and the list of attendees for the cluster-hackers summit has
been set:

http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

Please send me any corrections, changes or additions ASAP.

Thank you!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Tom Lane
Robert Haas  writes:
> ... I don't really see any particular merit in removing our own
> XID from the picture entirely: that changes the behavior more
> significantly for no particular benefit.

The merit would be in keeping the function's definition simple.

Anyway, let's see if breaking this down by cases clarifies anything.
As I see it, there are three possible cases:

1. On master, xact already has an XID.  The longstanding behavior is
to use that XID as reference.  The committed patch changes this to
reference whatever is next-to-assign XID at first call of age(), but
it's far from clear to me that that's better for this case in isolation.

2. On master, xact does not (yet) have an XID.  The previous behavior
is to force XID assignment at first call of age().  However, if we
capture ReadNewTransactionId as per patch then we give the same answer
as we would have done before, only without assigning the xact an XID.
It could be argued that this can yield inconsistent results if the xact
later does something that forces XID assignment anyway, but surely
that's a pretty narrow corner case.

3. On slave, so xact cannot have an XID.  Previous behavior is to fail
which we all agree is unhelpful.  Capturing ReadNewTransactionId
provides behavior somewhat similar to patched case #2, though it's
unclear to me exactly how compatible it is given the likely skew between
master and slave notions of the next XID.

It's arguable that what we should do is "use XID if on master, capture
ReadNewTransactionId if on slave", which would avoid any backwards
incompatibility for the first two cases while still fixing the case that
everybody agrees is a problem.  Simon argues that this gives a weird
variance in the master vs slave behavior, but I'm not sure I believe
that's an issue.  In case 2, the only way that the user can tell the
difference between force-XID-assignment and capture-ReadNewTransactionId
is if the transaction later does something requiring an XID, which
cannot happen anyway on the slave.  So from here the difference in these
behaviors seems minimal and not worth creating incompatibility in the
first two cases for.
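
For reference, a sketch of how the cases map onto SQL (the xid literal is
arbitrary; this is illustration, not a test case):

BEGIN;
SELECT age('1000'::xid);  -- case 2: no XID yet; the old behavior forced XID
                          -- assignment, the patch captures the next XID instead
SELECT txid_current();    -- forces XID assignment
SELECT age('1000'::xid);  -- case 1: xact has an XID; longstanding behavior
                          -- measures relative to that XID
COMMIT;

-- Case 3 is the same age() call issued on a hot-standby server, where the
-- xact can never have an XID; that is the case that used to fail outright.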

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Peter Eisentraut
On fre, 2012-05-11 at 11:32 +0200, Magnus Hagander wrote:
> > You are misinterpreting this.  The reason Bruce's link was removed was
> > that the other (official) build was set to run at the same frequency, so
> > Bruce's build was exactly redundant.  The requirement/aspiration to have
> > a few minutes turnaround time continued.
> 
> But the other (official) build was *not* set to run at the same
> frequency. It was set, according to that mail, to run frequently
> enough, but it did not run every 5 minutes; at least not the only
> cronjob I found back then.

I would love to see what cron job you are referring to.
Unfortunately, I don't have a backup, but I'm pretty sure at one point
it ran every three minutes or so.

> But in the interest of actually being productive - what *is* the
> use case for needing a 5-minute turnaround time?

I don't know exactly; it was done at the request of users.  A lot of
people wanted to see the documentation of new checkins, just to learn
about how the new features worked.

As a general point, any delay time is going to raise questions, because
it usually won't be easy to find out when things will happen.  So the
"human" maintenance effort will be lower if it runs as soon as possible.





Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Jeff Janes
On Thu, May 10, 2012 at 10:44 AM, Bruce Momjian  wrote:
> On Thu, May 10, 2012 at 01:11:54PM +0100, Peter Geoghegan wrote:
>
>> Why can't we call group commit group commit (and for that matter,
>> index-only scans index-only scans), so that people will understand
>> that we are now competitive with other RDBMSs in this area? "Improve
>> performance of WAL writes when multiple transactions commit at the
>> same time" seems like a pretty bad description, since it doesn't make
>> any reference to batching of commits.  Also, I don't think that the
>
> I didn't call it "group commit" because we have settings we used to
> regard as group commit:

My understanding of that patch is that it does not cause "group
commit" to happen, but rather when a group commit does happen
"naturally" it causes all members of the group to awaken more
quickly/efficiently than they previously would have.

>
>        #commit_delay = 0           # range 0-10, in microseconds
>        #commit_siblings = 5            # range 1-1000
>
> These are still there.  Should they be removed?

The new locking around releasing group commit waiters has, if
anything, made these two more effective than before.  But that isn't
really saying much.  It seems like these knobs are (and were)
primarily useful for doing "stupid benchmark tricks" of little
practical value.  If there is an argument for removing them, I think
it would revolve around either "They never really should have been
there anyway", or "These days when people need far more commits per
second than they have revolutions per second, they buy BBU or NVRAM".

> I updated the release docs to call the item "group commit" because I now
> don't see any reference to that term in our docs.

I don't think I'd mention WAL writing at all, and just say that it
improves the concurrency of locking around group commits.

Cheers,

Jeff



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Andres Freund
On Friday, May 11, 2012 07:20:26 PM Robert Haas wrote:
> On Fri, May 11, 2012 at 1:09 PM, Fujii Masao  wrote:
> > Calling WalSndWakeup() while WALWriteLock is being held might cause
> > another performance degradation. No?
> 
> That definitely doesn't seem ideal - a lot of things can pile up
> behind WALWriteLock.  I'm not sure how big a problem it would be in
> practice, but we generally make a practice of avoiding sending signals
> while holding LWLocks whenever possible...
In my measurements on moderately powerful hardware I couldn't see any
degradation on the primary - in fact the contrary - but the improvements were
around 0.4% and I only tested 10 min, so it's not exactly hard evidence. But I
agree it's not ideal.
It's the only place, though, which knows whether it's actually sensible to wake up
the walsender. We could make it return whether it wrote anything and do the
wakeup at the callers. I count 4 different call sites, which would be an
annoying duplication, but I don't really see anything better right now.

Better ideas?

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Greg Stark
On Fri, May 11, 2012 at 6:16 PM, Kevin Grittner
 wrote:
>> MaxHeapTuplesPerPage?
>
> What about dead line pointers without corresponding tuples?

Actually we don't allow there to be more than MaxHeapTuplesPerPage
line pointers even if some of them are dead line pointers.

I think the argument then was that there could be bugs in code that
isn't expecting more, so perhaps that doesn't justify baking that
assumption into more places when we could instead be removing it.

Also, if I absorbed enough of this conversation skimming backwards, it
seems the algorithm would be very inefficient with such a conservative
estimate. On a table with normal-width rows it would mean usually
picking tuples that don't exist and having to try again. As Tom
pointed out, that would mean even a small sample would have to read
nearly the whole table to find the sample.


-- 
greg



Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 1:28 PM, Tom Lane  wrote:
> I'm not convinced this is the best thing, and I'm definitely not happy
> with changing the behavior of working cases (ie, behavior on the master)
> in the back branches.
>
> Anybody else have an opinion on this?

I agree with you.  Essentially, if we want age() to be stable, and
nobody's argued against that, we have to fix a point in the XID space
and do all of our computations relative to that point.  The original
coding did that by using our XID, and I think what we ought to do is
use either (1) our XID or (2) the next XID as of the first point in
time at which age() is called, if we don't have an XID yet by then.
That's a slight behavior change even when not in HS mode, because a
read-only transaction won't acquire an XID just by virtue of using
age(), so I would probably have chosen to fix this only in master and
to not back-patch anything, but in practice I think the downside of
that behavior change is very minimal (and it might even be an equally
minimal improvement for some people) so I think a back-patch is fine.
However, I don't really see any particular merit in removing our own
XID from the picture entirely: that changes the behavior more
significantly for no particular benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Tom Lane
Simon Riggs  writes:
> On 11 May 2012 17:13, Tom Lane  wrote:
>> Hm.  This fixes the stability-within-transaction problem, but it's also
>> introduced a change in the definition of age(), no?

> Yeh, I thought about that.

> The difference between the old and the new result is likely to be
> small, especially in the main intended use case. The previous
> definition was fairly weird, since if you executed it in a long
> running transaction it would give a false reading of the actual age,
> which ISTM was a bug in itself.

Well, this definition could also give unexpected readings, depending on
when the first call to age() occurred within the long-running xact.
I'm not sure there's any fix for that unless you want to redefine age()
as a volatile function that uses ReadNewTransactionId on *each* call;
and that doesn't sound like a more useful definition to me.

> What would be more confusing would be to have age() return a different
> result on standby and master.

But surely you can get that anyway, since ReadNewTransactionId is
unlikely to give the same results on standby and master.

> At present the back branches just throw ERROR, so some change is
> needed there at least, given our earlier policy of keeping that ERROR
> as a backstop rather than as an acceptable return (re: SQLcode
> discussions).

> I've no objection to further change, but I think I've done the best
> thing out of the various options.

I'm not convinced this is the best thing, and I'm definitely not happy
with changing the behavior of working cases (ie, behavior on the master)
in the back branches.

Anybody else have an opinion on this?

regards, tom lane



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 1:09 PM, Fujii Masao  wrote:
> Calling WalSndWakeup() while WALWriteLock is being held might cause another
> performance degradation. No?

That definitely doesn't seem ideal - a lot of things can pile up
behind WALWriteLock.  I'm not sure how big a problem it would be in
practice, but we generally make a practice of avoiding sending signals
while holding LWLocks whenever possible...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread Tom Lane
"David E. Wheeler"  writes:
> On May 11, 2012, at 9:51 AM, David E. Wheeler wrote:
>> \set ON_ERROR_ROLLBACK interactive
>> 
>> So I guess it transparently rolls back to a savepoint for the previous
>> statement. I had forgotten I turned that on. Apologies for the noise.

Ah-hah.

> OTOH, might it be useful to have psql show some sort of status message when 
> it does this?

The same occurred to me, but after more thought I think it'd just be
horribly annoying after a short time.  People who have this set will
(or should, anyway) be expecting errors to be nonfatal to their
transactions.

regards, tom lane



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Robert Haas  wrote:
> On Fri, May 11, 2012 at 11:36 AM, Kevin Grittner
>  wrote:
>> Robert Haas  wrote:
>>> The trouble is, AFAICS, that you can't bound M very well without
>>> scanning the whole table.  I mean, it's bounded by theoretical
>>> limit, but that's it.
>>
>> What would the theoretical limit be?
> 
> MaxHeapTuplesPerPage?
 
What about dead line pointers without corresponding tuples?
 
-Kevin



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread David E. Wheeler
On May 11, 2012, at 9:51 AM, David E. Wheeler wrote:

>\set ON_ERROR_ROLLBACK interactive
> 
> So I guess it transparently rolls back to a savepoint for the previous
> statement. I had forgotten I turned that on. Apologies for the noise.

OTOH, might it be useful to have psql show some sort of status message when it 
does this?

David




Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Fujii Masao
On Fri, May 11, 2012 at 4:51 AM, Andres Freund  wrote:
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index ecb71b6..7a3224b 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -1906,6 +1906,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
>            xlogctl->LogwrtRqst.Flush = LogwrtResult.Flush;
>        SpinLockRelease(&xlogctl->info_lck);
>    }
> +
> +   /* the walsender wasn't woken up in xact.c */
> +   if(max_wal_senders > 1 && synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
> +       WalSndWakeup();
>  }

Calling WalSndWakeup() while WALWriteLock is being held might cause another
performance degradation. No?

Regards,

-- 
Fujii Masao



Re: [HACKERS] unite recovery.conf and postgresql.conf

2012-05-11 Thread Josh Berkus

> A key requirement is to be able to drop in new config files without
> needing to $EDIT anything.

Yes, absolutely.  I want to move towards the idea that the majority of
our users never edit postgresql.conf by hand.

> OK, its possible to put in lots of includeifexists for non-existent
> files just in case you need one, but that sucks.

Yeah, seems like we need something more elegant.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 11:36 AM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>> The trouble is, AFAICS, that you can't bound M very well without
>> scanning the whole table.  I mean, it's bounded by theoretical
>> limit, but that's it.
>
> What would the theoretical limit be?

MaxHeapTuplesPerPage?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread David E. Wheeler
On May 11, 2012, at 9:39 AM, Tom Lane wrote:

>> Hrm…I've also just replicated it on CentOS 6.2 with Perl 5.10.1:
> 
> Interesting.

Ah, it’s a psql configuration issue. I had replicated it on that box by 
connecting with psql on my Mac. When I SSHed to the box and used the psql 
there, I was *not* able to replicate it. 

So then I tried psql from my Mac again, this time with -X, and the transaction 
was properly cancelled. Some more fiddling, and I find that this is the culprit:

\set ON_ERROR_ROLLBACK interactive

So I guess it transparently rolls back to a savepoint for the previous statement.
I had forgotten I turned that on. Apologies for the noise.
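
For anyone else bitten by this, a minimal session showing the effect (the
erroring statement is arbitrary):

\set ON_ERROR_ROLLBACK interactive
BEGIN;
SELECT 1/0;   -- ERROR, but psql quietly rolls back to an implicit savepoint
SELECT 'still usable';
COMMIT;       -- commits normally instead of reporting an aborted transaction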

David




[HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Simon Riggs
On 11 May 2012 17:13, Tom Lane  wrote:
> Simon Riggs  writes:
>> Ensure age() returns a stable value rather than the latest value
>
> Hm.  This fixes the stability-within-transaction problem, but it's also
> introduced a change in the definition of age(), no?  Previously, in an
> xact that had an XID, the age was measured relative to that XID.
> I'm not sure that we should lightly abandon that correspondence.
> At the very least we would need to update the user-facing documentation,
> not only the function's header comment.  So far as I can find, the only
> such documentation is the pg_description entry:
> DESCR("age of a transaction ID, in transactions before current transaction");
> but that's still wrong now.
>
> The definition I was thinking of was "if xact has an XID use that, else
> do ReadNewTransactionId, and in either case save the value for later
> calls during the current virtual xact."  This is more complicated than
> what you implemented, and maybe we shouldn't be quite that tense about
> backwards-compatibility.  But I don't think we should be changing the
> function's definition like you've done in back branches.

Yeh, I thought about that.

The difference between the old and the new result is likely to be
small, especially in the main intended use case. The previous
definition was fairly weird, since if you executed it in a long
running transaction it would give a false reading of the actual age,
which ISTM was a bug in itself.

What would be more confusing would be to have age() return a different
result on standby and master.

At present the back branches just throw ERROR, so some change is
needed there at least, given our earlier policy of keeping that ERROR
as a backstop rather than as an acceptable return (re: SQLcode
discussions).

I've no objection to further change, but I think I've done the best
thing out of the various options.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread Tom Lane
"David E. Wheeler"  writes:
> Hrm…I've also just replicated it on CentOS 6.2 with Perl 5.10.1:

Interesting.

> Perhaps there is something funky in my configuration, though I tried a
> few different things and couldn't get it to change.

Yeah.  If Bruce and I don't see it on a couple of configurations apiece,
and you do see it on two different machines, personal configuration
choices start to sound like a likely contributing factor.  Could be
either PG configuration or Perl configuration (if there is any such
thing beyond installation-time decisions).

As for me, I was testing assert-enabled builds that were otherwise
entirely out-of-the-box, and I don't even know how to configure Perl.

regards, tom lane



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-11 Thread David E. Wheeler
On May 10, 2012, at 6:18 PM, Tom Lane wrote:

> I also tried this on a Fedora 16 box, which has
> 
> $ perl -v
> This is perl 5, version 14, subversion 2 (v5.14.2) built for 
> x86_64-linux-thread-multi
> 
> Works fine there too...

Hrm…I've also just replicated it on CentOS 6.2 with Perl 5.10.1:

reputation=# begin;
BEGIN
Time: 22.330 ms
reputation=# do language plperlu $$ elog(ERROR, 'foo')$$;
ERROR:  foo at line 1.
CONTEXT:  PL/Perl anonymous code block
reputation=# select true;
 bool 
------
 t
(1 row)

Time: 18.392 ms
reputation=# rollback;
ROLLBACK
Time: 19.854 ms
reputation=# do language plperlu $$elog(NOTICE, $^V)$$;
NOTICE:  v5.10.1
CONTEXT:  PL/Perl anonymous code block
DO
Time: 20.459 ms
reputation=# select version();
                                              version
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit
(1 row)

Perhaps there is something funky in my configuration, though I tried a few 
different things and couldn't get it to change. I can show you on my laptop 
next week if you want to see it in person. Not sure if that will help…

David




Re: [HACKERS] [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value

2012-05-11 Thread Tom Lane
Simon Riggs  writes:
> Ensure age() returns a stable value rather than the latest value

Hm.  This fixes the stability-within-transaction problem, but it's also
introduced a change in the definition of age(), no?  Previously, in an
xact that had an XID, the age was measured relative to that XID.
I'm not sure that we should lightly abandon that correspondence.
At the very least we would need to update the user-facing documentation,
not only the function's header comment.  So far as I can find, the only
such documentation is the pg_description entry:
DESCR("age of a transaction ID, in transactions before current transaction");
but that's still wrong now.

The definition I was thinking of was "if xact has an XID use that, else
do ReadNewTransactionId, and in either case save the value for later
calls during the current virtual xact."  This is more complicated than
what you implemented, and maybe we shouldn't be quite that tense about
backwards-compatibility.  But I don't think we should be changing the
function's definition like you've done in back branches.

regards, tom lane



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-11 Thread Fujii Masao
On Sat, May 12, 2012 at 12:44 AM, Tom Lane  wrote:
> Magnus Hagander  writes:
>> How common *is* it to have a build that doesn't have integer timestamps
>> these days? Does any of the binary builds do that at all, for example? If
>> it's uncommon enough, I think we should just go with the easy way out...
>
> +1 for just rejecting a mismatch.

Agreed.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
On May11, 2012, at 16:03 , Kevin Grittner wrote:
>> [more complex alternatives]
> 
> I really think your first suggestion covers it perfectly; these more
> complex techniques don't seem necessary to me.

The point of the more complex techniques (especially the algorithm in
my second mail, the "reply to self") was simply to optimize the generation
of a random, uniformly distributed, unique and sorted list of TIDs.

The basic idea is to make sure we generate the TIDs in physical order,
and thus automatically ensure that they are unique. This reduces the memory
(or disk) requirement to O(1) instead of O(n), and (more importantly)
makes the actual implementation much simpler.
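
(In executor terms each generated TID is then just a single probe, e.g.:

SELECT * FROM t WHERE ctid = '(0,1)';  -- nonexistent or invisible TIDs simply
                                       -- return no row

with t standing in for whatever table is being sampled.)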

best regards,
Florian Pflug




Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-11 Thread Fujii Masao
On Fri, May 11, 2012 at 4:11 AM, Tom Lane  wrote:
> Fujii Masao  writes:
>> On Fri, May 11, 2012 at 12:07 AM, MauMau  wrote:
>>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>>> consider what you taught. And I'll consider if the tentative measure of
>>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>>> against Japanese text.
>
>> In Japanese, it's common to do a text search with two characters keyword.
>> But since pg_trgm is 3-gram, you basically would not be able to use index
>> for such text search. So you might need something like pg_bigm or pg_unigm
>> for Japanese text search.

Even if an index can be used for a two-character text search, a bitmap index
scan picks up all rows, so it's too slow.

> I believe the trigrams are three *bytes* not three characters.  So a
> couple of kanji should work just fine for this.

Really? As far as I read the code of pg_trgm, the trigram is three characters
and its CRC32 is used as an index key if its size is more than three bytes.
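
That's easy to check from SQL, for what it's worth; show_trgm() shows exactly
what gets extracted (requires the pg_trgm extension; input strings arbitrary):

CREATE EXTENSION pg_trgm;
SELECT show_trgm('postgres');    -- plain three-character trigrams for ASCII
SELECT show_trgm('データベース');  -- see what comes out for multibyte text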

Regards,

-- 
Fujii Masao



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Tom Lane  wrote:
> "Kevin Grittner"  writes:
>> [ uniformly sample the TID space defined as (1..P, 1..M) ]
> 
>> Shouldn't that get us the randomly chosen sample we're looking
>> for?  Is there a problem you think this ignores?
> 
> Not sure.  The issue that I'm wondering about is that the line
> number part of the space is not uniformly populated, ie, small
> line numbers are much more likely to exist than large ones.  (In
> the limit that density goes to zero, when you pick M much too
> large.)  It's not clear to me whether this gives an unbiased
> probability of picking real tuples, as opposed to hypothetical
> TIDs.
 
I'm convinced it generates a random sample: since every tuple in the
*possible* space has an equal chance of selection, and we ignore the
ones which don't exist or aren't visible, each of the remaining
(existing, visible) tuples is as likely to be chosen as any other.
I know that's not a formal proof, but I'm sure I could work one up.
 
> Another issue is efficiency.  In practical cases you'll have to
> greatly overestimate M compared to the typical actual-number-of-
> tuples-per-page, which will lead to a number of target TIDs N
> that's much larger than necessary, which will make the scan slow
> --- I think in practice you'll end up doing a seqscan or something
> that might as well be one, because unless S is *really* tiny it'll
> hit just about every page.  We can have that today without months
> worth of development effort, using the "WHERE random() < S"
> technique.
 
I think you've probably got me there.  The technique you describe
should be equally random, and, thinking through the work involved in
each, the random() < S approach seems almost certain to be cheaper.
Is it worth supporting the TABLESAMPLE BERNOULLI option as syntactic
sugar for that?
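
(For reference, the technique in question is just, say, the following;
the table name and the 1% fraction are made up:)

-- Approximately a 1% Bernoulli sample with today's tools:
SELECT * FROM mytable WHERE random() < 0.01;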
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
On May11, 2012, at 17:20 , Robert Haas wrote:
> On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner wrote:
>> Since T cancels out of that equation, we don't need to worry about
>> estimating it.  Results will be correct for any value of M which is
>> *at least* as large as the maximum tuple index in the table,
>> although values of M larger than that will degrade performance.  The
>> same holds true for the number of pages.
> 
> The trouble is, AFAICS, that you can't bound M very well without
> scanning the whole table.  I mean, it's bounded by theoretical limit,
> but that's it.

Hm, but maybe Kevin's observation that the actual value of M shouldn't
matter as long as it's large enough helps here. What if you start out
with M=1 and generate your first TID. After reading the page, but before
returning a tuple, you check if M is indeed an upper bound on the tuple
indices. If it isn't, you increase M, recompute N (the number of probes),
determine a new random tuple index, and return the tuple (if it is live).
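
(Verifying the bound against a page you've just read is cheap. As a
one-off sanity check of the largest tuple index actually in use,
something like this works today; the table name is assumed, and it
scans the whole table, which is of course Robert's point:)

-- ctid's text form is '(page,index)', which point can parse:
SELECT max((ctid::text::point)[1])::int AS max_tuple_index
FROM mytable;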

Would that introduce bias? I'd think not, because scaling up N shouldn't,
per Kevin's argument, change the probability of a TID being picked. So
increasing N in the middle of a scan should neither penalize nor favour
tuples which were already returned compared to those which will follow,
no?

(And yes, even if there don't turn out to be any obvious holes in this
argument, it requires a more formal proof than I was able to give here
before being turned into code. Or at the very least extensive testing
with all kinds of data.)

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Andrew Dunstan



On 05/11/2012 05:32 AM, Magnus Hagander wrote:


But in the interest of actually being productive - what *is* the
usecase for needing a 5 minute turnaround time? I don't buy the "check
what a patch looks like", because that should be done *before* the
commit, not after - so it's best verified by a local docs build anyway
(which will also be faster).

I'm sure we can put something in with a pretty quick turnaround again
without too much strain on the system, but it does, as I mentioned
before, require decoupling it from the buildfarm which means it's not
just tweaking a config file.


If it's of any use to you I have made some adjustments to the buildfarm 
code which would let you do *just* the docs build (and dist make if you 
want). It would still pull from git, and only do anything if there's a 
(relevant) change. So using that to set up a machine that would run 
every few minutes might work. Of course, building the docs can itself be 
fairly compute intensive, so you still might not want to run every few 
minutes if that's a limiting factor.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-11 Thread Tom Lane
Magnus Hagander  writes:
> How common *is* it to have a build that doesn't have integer timestamps
> these days? Do any of the binary builds do that at all, for example? If
> it's uncommon enough, I think we should just go with the easy way out...

+1 for just rejecting a mismatch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
"Kevin Grittner"  writes:
> Robert Haas  wrote:
>> The trouble is, AFAICS, that you can't bound M very well without
>> scanning the whole table.  I mean, it's bounded by the theoretical
>> limit, but that's it.
 
> What would the theoretical limit be?  (block size - page header size
> - minimum size of one tuple) / item pointer size?  So, on an 8KB
> page, somewhere in the neighborhood of 1350?

Your math is off --- I get something less than 300, even if the tuples
are assumed to be empty of data.  (Header size 24 bytes, plus 4-byte
line pointer, so at least 28 bytes per tuple, so at most 292 on an 8K
page.)  But you still end up probing just about every page for a 1%
sample.
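
(The same arithmetic, pastable into psql; the page header itself is
ignored here:)

-- 24-byte minimal tuple header plus 4-byte line pointer:
SELECT floor(8192.0 / (24 + 4));   -- 292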

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Robert Haas  wrote:
 
> The trouble is, AFAICS, that you can't bound M very well without
> scanning the whole table.  I mean, it's bounded by the theoretical
> limit, but that's it.
 
What would the theoretical limit be?  (block size - page header size
- minimum size of one tuple) / item pointer size?  So, on an 8KB
page, somewhere in the neighborhood of 1350?  Hmm.  If that's right,
that would mean a 1% random sample would need 13.5 probes per page,
meaning there wouldn't tend to be a lot of pages missed.  Still, the
technique for getting a random sample seems sound, unless someone
suggests something better.  Maybe we just want to go straight to a
seqscan to get to the pages we want to probe rather than reading
just the ones on the "probe list" in physical order?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
"Kevin Grittner"  writes:
> [ uniformly sample the TID space defined as (1..P, 1..M) ]

> Shouldn't that get us the randomly chosen sample we're looking for? 
> Is there a problem you think this ignores?

Not sure.  The issue that I'm wondering about is that the line number
part of the space is not uniformly populated, ie, small line numbers
are much more likely to exist than large ones.  (In the limit that
density goes to zero, when you pick M much too large.)  It's not clear
to me whether this gives an unbiased probability of picking real tuples,
as opposed to hypothetical TIDs.

Another issue is efficiency.  In practical cases you'll have to greatly
overestimate M compared to the typical actual-number-of-tuples-per-page,
which will lead to a number of target TIDs N that's much larger than
necessary, which will make the scan slow --- I think in practice you'll
end up doing a seqscan or something that might as well be one, because
unless S is *really* tiny it'll hit just about every page.  We can have
that today without months worth of development effort, using the "WHERE
random() < S" technique.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Robert Haas
On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
 wrote:
> We
> will be probing random page numbers 1..P for random tuple indexes
> 1..M.  So how many random probes by ctid does that require? The
> chance of a hit on each probe is ((T/P)/M) -- the average number of
> tuples per page divided by the number of tuple index values allowed.
> So we need (S*T)/((T/P)/M) probes.  Simplifying that winds up being
> S*M*P, the product of the sample size as a percentage, the maximum
> tuple index on a page, and the number of pages.  (A calculation some
> may have jumped to as intuitively obvious.)
>
> So let's call the number of probes N.  We randomly generate N
> distinct ctid values, where each is a random page number 1..P and a
> random index 1..M.  We attempt to read each of these in block number
> order, not following HOT chains.  For each, if the tuple exists and
> is visible, it is part of our result set.
>
> Since T cancels out of that equation, we don't need to worry about
> estimating it.  Results will be correct for any value of M which is
> *at least* as large as the maximum tuple index in the table,
> although values of M larger than that will degrade performance.  The
> same holds true for the number of pages.

The trouble is, AFAICS, that you can't bound M very well without
scanning the whole table.  I mean, it's bounded by the theoretical limit,
but that's it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Tom Lane  wrote:
 
> If you're willing to accept that the quality of the results
> depends on having up-to-date stats, then I'd suggest (1) use the
> planner's existing technology to estimate the number of rows in
> the table; (2) multiply by sampling factor you want to get a
> desired number of sample rows; (3) use ANALYZE's existing
> technology to acquire that many sample rows.  While the ANALYZE
> code isn't perfect with respect to the problem of nonuniform TID
> density, it certainly will be a lot better than pretending that
> that problem doesn't exist.
 
That is basically what we've been talking about for SYSTEM samples,
which I think we've agreed should be finished and committed before
programming on the BERNOULLI option.  Nobody is pretending that the
problem doesn't exist.  Let me restate the option I'm thinking
solves the problem well, since it would be easy to have missed what
I was responding to with all the various suggestions on the thread.
 
The user has asked for a BERNOULLI sample S of some percent of the
table.  We determine the maximum tuple index M which could be used
for any ctid in the table.  We get an estimate of the number of
pages P in the table using whatever is the best available technique.
If you assume that the number of tuples (visible or not) is T, the
approximate number of tuples we want to randomly select is S*T.  We
will be probing random page numbers 1..P for random tuple indexes
1..M.  So how many random probes by ctid does that require? The
chance of a hit on each probe is ((T/P)/M) -- the average number of
tuples per page divided by the number of tuple index values allowed.
So we need (S*T)/((T/P)/M) probes.  Simplifying that winds up being
S*M*P, the product of the sample size as a percentage, the maximum
tuple index on a page, and the number of pages.  (A calculation some
may have jumped to as intuitively obvious.)
 
So let's call the number of probes N.  We randomly generate N
distinct ctid values, where each is a random page number 1..P and a
random index 1..M.  We attempt to read each of these in block number
order, not following HOT chains.  For each, if the tuple exists and
is visible, it is part of our result set.
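
(A sketch of the generation step in plain SQL, with assumed values
S = 1%, P = 10000 and M = 290, hence N = S*M*P = 29000; duplicate
elimination and the existence/visibility checks are left out:)

-- Generate ~N candidate ctids and emit them in block number order:
SELECT format('(%s,%s)', page, idx)::tid AS candidate
FROM (SELECT (random() * 9999)::int AS page,
             1 + (random() * 289)::int AS idx
      FROM generate_series(1, 29000)) s
ORDER BY page, idx;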
 
Since T cancels out of that equation, we don't need to worry about
estimating it.  Results will be correct for any value of M which is
*at least* as large as the maximum tuple index in the table,
although values of M larger than that will degrade performance.  The
same holds true for the number of pages.
 
Shouldn't that get us the randomly chosen sample we're looking for? 
Is there a problem you think this ignores?
 
Credit where credit is due: I *think* this is merely my restatement
of something Florian was suggesting, building on a suggestion from
Robert.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-11 Thread Magnus Hagander
On Thursday, May 10, 2012, Fujii Masao wrote:

> On Thu, May 10, 2012 at 11:51 PM, Magnus Hagander wrote:
> > And taking this a step further - we *already* send these GUCs.
> > Previous references to us not doing that were incorrect :-)
> >
> > So this should be a much easier fix than we thought. And can be done
> > entirely in pg_basebackup, meaning we don't need to worry about
> > beta...
>
> Sounds good!
>
>
Should we go down the easy way and just reject connections when the flag
mismatches between the client and the server (trivial to do - see the
attached patch)? Or should we try to implement both floating point and
integer in pg_basebackup, and make it work in either case? (We'd still have
to reject it if pg_basebackup was compiled without support for int64 at
all, of course, but the large majority of builds have integer timestamps
these days, and pg_basebackup could be made to support both integer and
float for the trivial operations it actually performs.)

How common *is* it to have a build that doesn't have integer timestamps
these days? Do any of the binary builds do that at all, for example? If
it's uncommon enough, I think we should just go with the easy way out...
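
(For context: the server already reports this setting to clients, which
is what the attached patch relies on, and it is also visible at the SQL
level:)

SHOW integer_datetimes;   -- read-only preset GUC; "on" for int64 builds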

//Magnus



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index b12932f..3a3217d 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -75,6 +75,7 @@ GetConnection(void)
 	const char **keywords;
 	const char **values;
 	char	   *password = NULL;
+	char	   *tmpparam;
 
 	if (dbhost)
 		argcount++;
@@ -157,6 +158,29 @@ GetConnection(void)
 		free(values);
 		free(keywords);
 
+		/*
+		 * Ensure we have the same value of integer timestamps as the
+		 * server we are connecting to.
+		 */
+		tmpparam = PQparameterStatus(tmpconn, "integer_datetimes");
+		if (!tmpparam)
+		{
+			fprintf(stderr, _("%s: could not determine server setting for integer_datetimes\n"),
+	progname);
+			exit(1);
+		}
+
+#ifdef HAVE_INT64_TIMESTAMP
+		if (strcmp(tmpparam, "on") != 0)
+#else
+		if (strcmp(tmpparam, "off") != 0)
+#endif
+		{
+			fprintf(stderr, _("%s: integer_datetimes compile flag mismatch with server\n"),
+	progname);
+			exit(1);
+		}
+
 		/* Store the password for next run */
 		if (password)
 			dbpassword = password;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Tom Lane
Bruce Momjian  writes:
> Let me add that I am concerned about the lack of objectivity in many of
> the suggestions in this thread.  This has prompted me to think that the
> temptation of having names on these release note items is just too
> great, and that the names should be removed.

Er, what?  I have not seen anything in this thread that merits such
an accusation.

> Let me put it this way --- the release notes are read by thousands of
> people.  The benefit individuals gather from their names in the release
> notes is a small part of the overall value provided by the release notes
> to users.  There was a practical need to have names on items in the past
> --- that need is no longer present.

My recollection is that we have been putting the names on the items to
(a) give credit where credit is due, and (b) show that Postgres has a
large and growing development community.  I had never heard the argument
"remember whom to blame for a broken feature" until you raised it
yesterday --- personally, I've always looked in the commit logs if I want
to know something like that.  So I don't see that there's really any
change in the terms of discussion.  It may be that the release notes
aren't the best place for doing either (a) or (b), but I don't agree
that we simply don't need to worry about either anymore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Andrew Dunstan



On 05/11/2012 10:15 AM, Bruce Momjian wrote:

On Fri, May 11, 2012 at 10:01:32AM -0400, Bruce Momjian wrote:

On Fri, May 11, 2012 at 09:51:49AM -0400, Andrew Dunstan wrote:


On 05/11/2012 08:56 AM, Bruce Momjian wrote:

On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:

On May 10, 2012, at 4:19 PM, Andrew Dunstan   wrote:

On 05/10/2012 06:15 PM, Tom Lane wrote:

How about a hybrid: we continue to identify patch authors as now, that is with names 
attached to the feature/bugfix descriptions, and then have a separate section "Other 
Contributors" to recognize patch reviewers and other helpers?

works for me.

Me, too.

That does not work for me.  There is no practical reason for a list of
names to appear in the release notes.  I suggest if we want to do that
that we remove all names from the release notes (as Tom suggested), and
create a wiki for credit, and link to that from the release
announcement.  That would allow us to put company names in there too.


I gave you a reason. You might not agree with it but saying that
it's no reason doesn't make it so. A wiki page will just be
duplication, IMNSHO.

I mean a reason from the reader/development-process perspective, not
from the perspective of giving some benefit to contributors.

Let me add that I am concerned about the lack of objectivity in many of
the suggestions in this thread.  This has prompted me to think that the
temptation of having names on these release note items is just too
great, and that the names should be removed.

Let me put it this way --- the release notes are read by thousands of
people.  The benefit individuals gather from their names in the release
notes is a small part of the overall value provided by the release notes
to users.  There was a practical need to have names on items in the past
--- that need is no longer present.

I predict that if we twist the release notes to have PR value for
contributors, it will become a perpetual problem and will diminish the
cohesiveness of our group.  I am already personally upset by a few of
the things I have seen on this thread.



Well, I don't know what has changed that made it imperative in the past 
to have the names and makes it now redundant, nor what could possibly 
have upset you so much. Maybe I'm dense, but that's the truth of it.


Now if someone is going to volunteer to build *AND* *MAINTAIN* a Credits 
page, that will be good. It would be even better if they would go back 
and do it historically. But just hoping that will happen and meantime 
removing the names from the notes seems to me a retrograde step.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
"Kevin Grittner"  writes:
> Florian Pflug  wrote:
>> Maybe one can get rid of these sorts of problems by factoring in
>> the expected density of the table beforehand and simply accepting
>> that the results will be inaccurate if the statistics are
>> outdated?
 
> Unless I'm missing something, I think that works for percentage
> selection, which is what the standard talks about, without any need
> to iterate through additional samples.  Good idea!  We don't need to
> do any second pass to pare down initial results, either.  This
> greatly simplifies coding while providing exactly what the standard
> requires.
 
>> I'm not totally sure whether this approach is sensible to
>> non-uniformity in the tuple to line-pointer assignment, though.

If you're willing to accept that the quality of the results depends on
having up-to-date stats, then I'd suggest (1) use the planner's existing
technology to estimate the number of rows in the table; (2) multiply
by sampling factor you want to get a desired number of sample rows;
(3) use ANALYZE's existing technology to acquire that many sample rows.
While the ANALYZE code isn't perfect with respect to the problem of
nonuniform TID density, it certainly will be a lot better than
pretending that that problem doesn't exist.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Michael Nolan
On 5/11/12, Michael Nolan  wrote:
> On 5/11/12, Albe Laurenz  wrote:
>> Michael Nolan wrote:
>>> I see one potential difference between your results and mine.
>>>
>>> When I rebuild the tablespace, I wind up with the same filename/OID as
>>> before, I'm not sure you do.
>>
>> Right. That's strange.
>> Usually OIDs get incremented, so you shouldn't end up with the same
>> OID for the new tablespace.
>>
>> Can you provide a complete testcase?
>
> I thought I had, until you were unable to reproduce it. :-)
> --
> Mike Nolan
>

My plan at this point is to wait until beta 1 of 9.2 is out, then see
if I can reproduce the problem there.
--
Mike Nolan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Michael Nolan
On 5/11/12, Albe Laurenz  wrote:
> Michael Nolan wrote:
>> I see one potential difference between your results and mine.
>>
>> When I rebuild the tablespace, I wind up with the same filename/OID as
>> before, I'm not sure you do.
>
> Right. That's strange.
> Usually OIDs get incremented, so you shouldn't end up with the same
> OID for the new tablespace.
>
> Can you provide a complete testcase?

I thought I had, until you were unable to reproduce it. :-)
--
Mike Nolan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Bruce Momjian
On Fri, May 11, 2012 at 10:01:32AM -0400, Bruce Momjian wrote:
> On Fri, May 11, 2012 at 09:51:49AM -0400, Andrew Dunstan wrote:
> > 
> > 
> > On 05/11/2012 08:56 AM, Bruce Momjian wrote:
> > >On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:
> > >>On May 10, 2012, at 4:19 PM, Andrew Dunstan  wrote:
> > >>>On 05/10/2012 06:15 PM, Tom Lane wrote:
> > How about a hybrid: we continue to identify patch authors as now, that 
> > is with names attached to the feature/bugfix descriptions, and then 
> > have a separate section "Other Contributors" to recognize patch 
> > reviewers and other helpers?
> > >>>works for me.
> > >>Me, too.
> > >That does not work for me.  There is no practical reason for a list of
> > >names to appear in the release notes.  I suggest if we want to do that
> > >that we remove all names from the release notes (as Tom suggested), and
> > >create a wiki for credit, and link to that from the release
> > >announcement.  That would allow us to put company names in there too.
> > >
> > 
> > I gave you a reason. You might not agree with it but saying that
> > it's no reason doesn't make it so. A wiki page will just be
> > duplication, IMNSHO.
> 
> I mean a reason from the reader/development-process perspective, not
> from the perspective of giving some benefit to contributors.

Let me add that I am concerned about the lack of objectivity in many of
the suggestions in this thread.  This has prompted me to think that the
temptation of having names on these release note items is just too
great, and that the names should be removed.

Let me put it this way --- the release notes are read by thousands of
people.  The benefit individuals gather from their names in the release
notes is a small part of the overall value provided by the release notes
to users.  There was a practical need to have names on items in the past
--- that need is no longer present.

I predict that if we twist the release notes to have PR value for
contributors, it will become a perpetual problem and will diminish the
cohesiveness of our group.  I am already personally upset by a few of
the things I have seen on this thread.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Tom Lane
Florian Pflug  writes:
> This all hinges on the ability to produce a sufficiently accurate estimate
> of the TID density p_tup/p_tid, of course.

I think that's the least of its problems.  AFAICS this analysis ignores
(1) the problem that the TID space is nonuniform, ie we don't know how
many tuples in each page until we look;
(2) the problem that we don't know the overall number of tuples
beforehand.

I'm not sure that there is any way to deal with (1) fully without
examining every single page, but algorithms that assume that the TIDs
are numbered linearly are broken before they start.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Kevin Grittner
Florian Pflug  wrote:
 
> Maybe one can get rid of these sorts of problems by factoring in
> the expected density of the table beforehand and simply accepting
> that the results will be inaccurate if the statistics are
> outdated?
> 
> One could, for example, simply pick
> 
>   N := SamplingPercentage * MaxTuplesPerPage /
>AvgLiveTuplesPerPage
> 
> where
> 
>  AvgLiveTuplesPerPage := #Tuples / #Pages
> 
> random TIDs, fetch the live ones, and return them.
 
To clarify, I read this as using reltuples and relpages for the
table, and returning only tuples which are visible according to the
query's snapshot.  (i.e., I think you used "live" to mean two
different things there.)
 
Unless I'm missing something, I think that works for percentage
selection, which is what the standard talks about, without any need
to iterate through additional samples.  Good idea!  We don't need to
do any second pass to pare down initial results, either.  This
greatly simplifies coding while providing exactly what the standard
requires.
 
>> I'm not totally sure whether this approach is sensitive to
> non-uniformity in the tuple to line-pointer assignment, though.
 
I think we can solve that by going high enough with tuple numbers to
reach the highest tuple ID that might be in use in the table, and
*not* following HOT chains.  (If we follow HOT chains, we could have
several distinct ctid values which returned the same tuple.)  Or am
I thinking about this incorrectly?
 
> [more complex alternatives]
 
I really think your first suggestion covers it perfectly; these more
complex techniques don't seem necessary to me.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Bruce Momjian
On Fri, May 11, 2012 at 09:51:49AM -0400, Andrew Dunstan wrote:
> 
> 
> On 05/11/2012 08:56 AM, Bruce Momjian wrote:
> >On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:
> >>On May 10, 2012, at 4:19 PM, Andrew Dunstan  wrote:
> >>>On 05/10/2012 06:15 PM, Tom Lane wrote:
> How about a hybrid: we continue to identify patch authors as now, that is 
> with names attached to the feature/bugfix descriptions, and then have a 
> separate section "Other Contributors" to recognize patch reviewers and 
> other helpers?
> >>>works for me.
> >>Me, too.
> >That does not work for me.  There is no practical reason for a list of
> >names to appear in the release notes.  I suggest if we want to do that
> >that we remove all names from the release notes (as Tom suggested), and
> >create a wiki for credit, and link to that from the release
> >announcement.  That would allow us to put company names in there too.
> >
> 
> I gave you a reason. You might not agree with it but saying that
> it's no reason doesn't make it so. A wiki page will just be
> duplication, IMNSHO.

I mean a reason from the reader/development-process perspective, not
from the perspective of giving some benefit to contributors.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Sandro Santilli
On Thu, May 10, 2012 at 02:30:35PM -0400, Robert Haas wrote:
> On Thu, May 10, 2012 at 2:07 PM, Kevin Grittner
>  wrote:
> > Ants Aasma  wrote:
> >> It seems to me that the simplest thing to do would be to lift the
> >> sampling done in analyze.c (acquire_sample_rows) and use that to
> >> implement the SYSTEM sampling method.
> >
> > Definitely.  I thought we had all agreed on that ages ago.
> 
> Right, and I don't think we should be considering any of this other
> stuff until that basic thing is implemented and working.

Agreed. That's what I'd love to see as well, for the GIS part.

--strk; 

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
[ Sorry for the self-reply ]

On May11, 2012, at 13:17 , Florian Pflug wrote:
> Actually, thinking more about this, if the approach sketched above
> turns out to work, then one might be able to get away without remembering
> previously computed TIDs, thus removing a lot of complexity.
> 
> Say, for example, you simply start out with a single random TID tid[0].
> Then you produce the sequence of "random" TIDs by setting
> 
>  tid[n+1] = tid[n] + random(1 <= x <= 2*D-1)
> 
> where D is the expected distance between one TID from the sample set
> and the next higher one, i.e. D = #TIDs / N. (You'd simply stop once
> tid[n] > #TIDs.) This will give you approximately uniformly distributed
> TIDs, I think, and will even return them in physical order. The 100$ question
> is, by how much does this violate the independence requirement, i.e. how
> far are P(tuple X picked) and P(tuple X picked | tuple Y picked) apart?
> Some search through the literature should be able to answer that.

Actually, one can easily do better then that. Say you've determined that
to pick each tuple with probability p_tup, you need to pick each TID with
probability p_tid (where p_tup/p_tid is the TID density, i.e. the probability
of a single TID being live). Now, looping over all TIDs and picking each with
probability p_tid is, of course, just a another (more error-prone) way of
iterating over all tuples and picking each with probability p_tup. But instead
of looping, you can jump from from picked TID to the next. Observe that, after
having picked tid X, the probability of picking X+i as the next tid is
(1 - p_tid)^(i-1) * p_tid, because to pick X+i next you have to skip (i-1) TIDs
and then pick the i-th one. Thus, the following algorithm returns a sorted list
of TIDs which should be uniformly distributed and independent (or so I think,
at least)

  1) Starting with a random tid tid[0], chosen such that
   P(tid[0] = i) = (1 - p_tid)^i * p_tid

  2) Setting tid[n+1] = tid[n] + d, with d > 0 chosen such that
   P(d = i) = (1 - p_tid)^(i-1) * p_tid

  3) Abort if max(tid) is >= #TIDs.

This all hinges on the ability to produce a sufficiently accurate estimate of
the TID density p_tup/p_tid, of course.
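
(A minimal sketch of step 2, assuming p_tid = 0.05 and using
inverse-transform sampling of the geometric distribution:)

-- P(d = i) = (1 - p)^(i-1) * p; greatest(1, ...) guards the
-- measure-zero case where random() returns exactly 0:
SELECT greatest(1, ceil(ln(1.0 - random()) / ln(1.0 - 0.05)))::int AS d
FROM generate_series(1, 10);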

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Andrew Dunstan



On 05/11/2012 08:56 AM, Bruce Momjian wrote:

On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:

On May 10, 2012, at 4:19 PM, Andrew Dunstan  wrote:

On 05/10/2012 06:15 PM, Tom Lane wrote:

How about a hybrid: we continue to identify patch authors as now, that is with names 
attached to the feature/bugfix descriptions, and then have a separate section "Other 
Contributors" to recognize patch reviewers and other helpers?

works for me.

Me, too.

That does not work for me.  There is no practical reason for a list of
names to appear in the release notes.  I suggest if we want to do that
that we remove all names from the release notes (as Tom suggested), and
create a wiki for credit, and link to that from the release
announcement.  That would allow us to put company names in there too.



I gave you a reason. You might not agree with it but saying that it's no 
reason doesn't make it so. A wiki page will just be duplication, IMNSHO.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem/bug in drop tablespace?

2012-05-11 Thread Albe Laurenz
Michael Nolan wrote:
> I see one potential difference between your results and mine.
> 
> When I rebuild the tablespace, I wind up with the same filename/OID as
> before, I'm not sure you do.

Right. That's strange.
Usually OIDs get incremented, so you shouldn't end up with the same
OID for the new tablespace.

Can you provide a complete testcase?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "pgstat wait timeout" just got a lot more common on Windows

2012-05-11 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, May 10, 2012 at 6:52 PM, Tom Lane  wrote:
>> Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
>> nasty bug: it does not reset ImmediateInterruptOK before returning.
>> How is it that Windows machines aren't falling over constantly?

> Hmm. the commit you made to fix it says it changes how
> ImmediateInterruptOK is handled, but there was not a single line of
> code that actually changed that? Or am I misreading this completely?

Exit is now out the bottom of the loop, not by a raw "return;".

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-11 Thread MauMau

From: "Kevin Grittner" 

"MauMau"  wrote:

For information, what kind of breakage would occur?



I imagined removing KEEPONLYALNUM would just accept
non-alphanumeric characters and cause no harm to those who use
only alphanumeric characters.


This would break our current usages because of the handling of
trigrams at the "edges" of groups of qualifying characters.  It
would make similarity (and distance) values less useful for our
current name searches using it.  To simulate the effect, I used an
'8' in place of a comma instead of recompiling with the suggested
change.

test=# select show_trgm('smith,john');
show_trgm
---
{"  j","  s"," jo"," sm","hn ",ith,joh,mit,ohn,smi,"th "}
(1 row)

test=# select show_trgm('smith8john');
 show_trgm
-
{"  s"," sm",8jo,h8j,"hn ",ith,joh,mit,ohn,smi,th8}
(1 row)

test=# select similarity('smith,john', 'jon smith');
similarity

  0.615385
(1 row)

test=# select similarity('smith8john', 'jon smith');
similarity

0.3125
(1 row)

So making the proposed change unconditionally could indeed hurt
current users of the technique.  On the other hand, if there was
fine-grained control of this, it might make trigrams useful for
searching statute cites (using all characters) as well as names
(using the current character set); so I wouldn't want it to just be
controlled by a global GUC.


Thank you for your concise examples. I probably got it.

From your examples, I thought KEEPONLYALNUM controls whether 
non-alphanumeric characters are included in trigrams, though I haven't read 
the code of pg_trgm. So, removing the KEEPONLYALNUM definition produces trigrams
that are unnecessary for users who handle only alphanumeric text. That would lead to
undesirable query results.


Then, I wonder what the ideal specification would be... add an
alphanumeric/non-alphanumeric boolean switch to the similarity() function, add
non-alphanumeric versions of the operators (ex. %* and <->*) and non-alphanumeric
versions of the operator classes (ex. gin_allchars_trgm_ops)? At least, I
understood the fix is not appropriate for minor releases.
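
(Purely hypothetical usage of those names, just to make the idea
concrete; none of these operators or opclasses exist today:)

-- Hypothetical "all characters" trigram opclass and operator:
CREATE INDEX statutes_cite_idx ON statutes
    USING gin (cite gin_allchars_trgm_ops);
SELECT * FROM statutes WHERE cite %* '18 U.S.C. 922(g)';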


Regards
MauMau


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Bruce Momjian
On Thu, May 10, 2012 at 08:46:56PM -0700, Robert Haas wrote:
> On May 10, 2012, at 4:19 PM, Andrew Dunstan  wrote:
> > On 05/10/2012 06:15 PM, Tom Lane wrote:
> >> How about a hybrid: we continue to identify patch authors as now, that is 
> >> with names attached to the feature/bugfix descriptions, and then have a 
> >> separate section "Other Contributors" to recognize patch reviewers and 
> >> other helpers?
> > 
> > works for me.
> 
> Me, too.

That does not work for me.  There is no practical reason for a list of
names to appear in the release notes.  I suggest if we want to do that
that we remove all names from the release notes (as Tom suggested), and
create a wiki for credit, and link to that from the release
announcement.  That would allow us to put company names in there too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-11 Thread Florian Pflug
> Florian Pflug  wrote:
>> On May10, 2012, at 18:36 , Kevin Grittner wrote:
>>> Robert Haas  wrote:
>>> 
 I wonder if you could do this with something akin to the Bitmap
 Heap Scan machinery. Populate a TID bitmap with a bunch of
 randomly chosen TIDs, fetch them all in physical order
 and if you don't get as many rows as you need, rinse and repeat
 until you do.
>>> 
>>> If you get too many, it is important that you read all the way to
>>> the end and then randomly omit some of them.  While a bit of a
>>> bother, that's pretty straightforward and should be pretty fast,
>>> assuming you're not, like, an order of magnitude high.
>> 
>> Why is that? From a statistical point of view it shouldn't matter
>> whether you pick N random samples, or pick M >= N random samples and
>> then randomly pick N from M. (random implying uniformly distributed
>> here).
> 
> That sounds to me like exactly what what Robert and I both said.
> While passing the heap with the bitmap, if you get to the number you
> want you don't stop there -- you read all of them ("M" in your
> parlance) and randomly drop M minus N of them.  Or, if you prefer, I
> guess you could *pick* N of them.  I don't see a logical difference.

What I meant to say was "and drop the last M minus N", not "and randomly
drop M minus N". Which, of course, makes my statement utterly wrong since
the tuples are sorted by TID so you'd penalize tuples with a larger TID.
Duh! Sorry for the noise, guys...

>>> But falling short is tougher; making up the difference could be an
>>> iterative process, which could always wind up with having you read
>>> all tuples in the table without filling your sample.
>> 
>> But the likelihood of that happening is extremely low, no?
> 
> That depends.  What if someone just did a mass delete and your
> statistics aren't yet up to date when they ask to pick a relatively
> large percentage of the rows.
> 
>> Unless the sampling percentage is very high
> 
> Or the statistics are not current.  I agree, this shouldn't happen
> often, but we can never know, going in, whether it *is* the case.
> You *could* always wind up needing to read the entire table, and
> still not hit the initially-targeted number of rows.  Now, arguably
> you could use data gleaned from each pass to adjust the target or
> inform the size of the next pass.  My point is that "we selected too
> few" is a lot more complicated that the "we selected too many" case.
> 
>> but that case isn't of much practical importance anyway.
> 
> It's important that it's handled in some sane way when it happens.
> And it will happen.

Hm. Maybe one can get rid of these sorts of problems by factoring in
the expected density of the table beforehand and simply accepting that
the results will be inaccurate if the statistics are outdated?

One could, for example, simply pick

  N := SamplingPercentage * MaxTuplesPerPage / AvgLiveTuplesPerPage

where

 AvgLiveTuplesPerPage := #Tuples / #Pages

random TIDs, fetch the live ones, and return them. I'm not totally sure
whether this approach is sensitive to non-uniformity in the tuple to
line-pointer assignment, though.
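
(A sketch of that computation from planner stats; the 1% target, the
MaxTuplesPerPage value of 290, and the table name are assumptions:)

-- N as a fraction of the TID space, from pg_class estimates:
SELECT 0.01 * 290 * relpages / nullif(reltuples, 0) AS adjusted_fraction
FROM pg_class
WHERE relname = 'mytable';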

>> But something else comes to mind. Does the standard permit samples
>> taken with the BERNOULLI method to contain the same tuple multiple
>> times?
> 
> I'm pretty sure not.  That would be nonsensical.
> 
>> If not, any kind of TID-based approach will have to remember all previously
>> fetched TIDs, which seems doable but unfortunate...
> 
> Right.  You would always need to ignore a duplicate random choice in
> any one cycle of generating ctid values; and if you are iterating
> because you fell short, you would also need to ignore values from
> previous iterations.  And OR your new bitmap against the previous
> ones to save for the next iteration, if needed.  I never said it
> couldn't be done; it's just very fussy, and you want to avoid a very
> large number of iterations in the case that someone deleted 99.99% of
> your rows right before you ran your sample and before autovacuum
> caught up.

Actually, thinking more about this, if the approach sketched above
turns out to work, then one might be able to get away without remembering
previously computed TIDs, thus removing a lot of complexity.

Say, for example, you simply start out with a single random TID tid[0].
Then you produce the sequence of "random" TIDs by setting

  tid[n+1] = tid[n] + random(1 <= x <= 2*D-1)

where D is the expected distance between one TID from the sample set
and the next higher one, i.e. D = #TIDs / N. (You'd simply stop once
tid[n] > #TIDs.) This will give you approximately uniformly distributed
TIDs, I think, and will even return them in physical order. The 100$ question
is, by how much does this violate the independence requirement, i.e. how
far are P(tuple X picked) and P(tuple X picked | tuple Y picked) apart?
Some search through the literature should be able to answer that.

Should the dependency turn out to be too high, one might be able

Re: [HACKERS] unite recovery.conf and postgresql.conf

2012-05-11 Thread Simon Riggs
On 10 May 2012 05:44, Josh Berkus  wrote:
>
>> I expect to revisit config directories before the first 9.3 CF, it will
>> help multiple things I'd like to see happen.  Then we can circle back to
>> the main unification job with a fairly clear path forward from there.
>
> Yeah, let's discuss this next week.  "Easier configuration" is one
> demand I'm hearing from developers in general, and I don't think that's
> nearly as hard a feature as, say, parallel query.  We can do it.

A key requirement is to be able to drop in new config files without
needing to $EDIT anything.

OK, it's possible to put in lots of include_if_exists entries for non-existent
files just in case you need one, but that sucks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Cédric Villemain
Le jeudi 10 mai 2012 22:18:30, Alvaro Herrera a écrit :
> It's been said elsewhere that adding all this to the release notes as
> found on the official docs would be too bulky.  How about having a
> second copy of the release notes that contains authorship info as
> proposed by Andrew?  Then the docs could have no names at all, and
> credit would be given by some other page in the website (to which the
> release notes would link).

Maybe we can update/upgrade the page [1] on the website and add per release 
authors/rewviewers/companies/... ?

[1] http://www.postgresql.org/community/contributors/
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-11 Thread Simon Riggs
On 10 May 2012 16:14, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
>>> Alvaro Herrera  writes:
>> I noticed while doing some tests that the checkpointer process does not
>> recover very nicely after a backend crashes under postmaster -T
>
>> It seems to me that the bug is in the postmaster state machine rather
>> than checkpointer itself.  After a few false starts, this seems to fix
>> it:
>
>> --- a/src/backend/postmaster/postmaster.c
>> +++ b/src/backend/postmaster/postmaster.c
>> @@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
>>                     signal_child(WalWriterPID, SIGTERM);
>>                 if (BgWriterPID != 0)
>>                     signal_child(BgWriterPID, SIGTERM);
>> +               if (FatalError && CheckpointerPID != 0)
>> +                   signal_child(CheckpointerPID, SIGUSR2);
>
> Surely we do not want the checkpointer doing a shutdown checkpoint here.
> If we need it to die immediately, SIGQUIT is the way.  If we want a
> shutdown checkpoint, that has to wait till after everything else is
> known dead.  So while I agree this may be a state machine bug, that
> doesn't look like a good fix.

Is this now fixed? You've made a few changes so I'm confused. Thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "pgstat wait timeout" just got a lot more common on Windows

2012-05-11 Thread Magnus Hagander
On Thu, May 10, 2012 at 6:52 PM, Tom Lane  wrote:
> I wrote:
>> Hence I think we oughta swap the order of those two array
>> elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
>> pgwin32_select.)
>
> Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
> nasty bug: it does not reset ImmediateInterruptOK before returning.
> How is it that Windows machines aren't falling over constantly?

Hmm. the commit you made to fix it says it changes how
ImmediateInterruptOK is handled, but there was not a single line of
code that actually changed that? Or am I misreading this completely?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-11 Thread Simon Riggs
On 10 May 2012 20:51, Andres Freund  wrote:

> I noticed that when synchronous_commit=off we're not waking up the wal sender
> latch in xact.c:RecordTransactionCommit, which leads to ugly delays of approx 7
> seconds (1 + replication_timeout/10) with default settings.
> Given that we're flushing the wal to disk much sooner this appears to be a bad
> idea - especially as this may happen even under load if we ever reach the
> 'caughtup' state.

Sounds like a problem. I'll have a look.

> I wonder why the WalSndWakeup isn't done like:
>
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> index ecb71b6..7a3224b 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -1906,6 +1906,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool
> xlog_switch)
>            xlogctl->LogwrtRqst.Flush = LogwrtResult.Flush;
>        SpinLockRelease(&xlogctl->info_lck);
>    }
> +
> +   /* the walsender wasn't woken up in xact.c */
> +   if(max_wal_senders > 1 && synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
> +       WalSndWakeup();
>  }
>
> Doing that for the synchronous_commit=off case can imo be considered a bugfix,
> but I wonder why we ever wake the senders somewhere else?
> The only argument I can see for doing it at places like StartTransactionCommit
> is that that's the place after which the data will be visible on the client. I
> think that's a non-argument though because if wal is flushed to disk outside of
> a commit there normally is enough data to make it worthwhile.
>
> Doing the above results in a very noticeable reduction in lagginess and even a
> noticeable reduction in cpu-usage spikes on a busy replication test setup.
>
> Greetings,
>
> Andres
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Magnus Hagander
On Fri, May 11, 2012 at 9:55 AM, Peter Eisentraut  wrote:
> On fre, 2012-05-11 at 09:26 +0200, Magnus Hagander wrote:
>> On Thu, May 10, 2012 at 6:28 PM, Peter Eisentraut  wrote:
>> > On tor, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
>> >> If people want the main docs building more often that's not really a
>> >> problem other than time - we just need to decouple it from the
>> >> buildfarm and run a separate job for it. It's not rocket science..
>> >
>> > Many years ago, Bruce and myself in particular put in a lot of work to
>> > make the turnaround time on the docs build less than 5 minutes, based on
>> > various requests.  I'm disappointed to learn that that was abandoned
>> > without discussion.  We might as well just put the old job back.
>>
>> It was not "abandoned without discussion" in any way.
>>
>> First of all, the docs still build in 5 minutes.
>
> That is different from the turnaround time from the commit.
>
>> Second, the "5 minutes docs build" link on the website was removed in
>> *2007*. At the request of Bruce, who maintained it. This request was
>> (at least according to the commit message and from what I can
>> remember) made in public on pgsql-www, and thus clearly open for
>> discussion. At http://archives.postgresql.org/pgsql-www/2007-12/msg00212.php.
>> Where Bruce claims the other one runs often enough, and nobody
>> objects.
>
> You are misinterpreting this.  The reason Bruce's link was removed was
> that the other (official) build was set to run at the same frequency, so
> Bruce's build was exactly redundant.  The requirement/aspiration to have
> a few minutes turnaround time continued.

But the other (official) build was *not* set to run at the same
frequency. It was set, according to that mail, to run frequently
enough, but it did not run every 5 minutes, at least not via the only
cronjob I found back then.


>> Third, the regular docs build on the developer box (which I think ran
>> once / hour?) *did not work* (prior to that it kind of work but often
>> hung and failed, but at least it tried to run - at this point it
>> stopped even trying).
>
> If you had any problems with how well they worked, we could have
> discussed this.  It's fine if you want to change how they run, and I
> have no problem with how they are run now, but I just want to make clear
> what requirements led to the setup at the time.

The entire machine they ran on *died*, because it had been
unmaintained for many years, and parts were silently upgraded whereas
other, incompatible, parts were not. We did actually leave the script
around. It ran for months, failing at step one, and pretty much nobody
complained.

The docs build was *entirely* undocumented (other than the official
cronjob which did *not* run every 5 minutes, but I guess you are
saying there was a second, undocumented, cronjob that ran more often).


But in the interest of actually being productive - what *is* the
usecase for needing a 5 minute turnaround time? I don't buy the "check
what a patch looks like", because that should be done *before* the
commit, not after - so it's best verified by a local docs build anyway
(which will also be faster).

I'm sure we can put something in with a pretty quick turnaround again
without too much strain on the system, but it does, as I mentioned
before, require decoupling it from the buildfarm which means it's not
just tweaking a config file.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python result set slicing broken in Python 3

2012-05-11 Thread Jan Urbański

On 10/05/12 19:45, Peter Eisentraut wrote:

On lör, 2012-05-05 at 22:45 +0200, Jan Urbański wrote:

Apparently once you implement PyMappingMethods.mp_subscript you can
drop PySequenceMethods.sq_slice, but I guess there's no harm in
keeping it (and I'm not sure it'd work on Python 2.3 with only
mp_subscript implemented).


Committed this now.

 From test coverage reports, I now see that PLy_result_ass_item() is no
longer called.  That's probably OK, if assignments are now handled
through the mapping methods.  But should we remove the function then?


Have you tried on Python 2.3 as well? People on #python said that if you 
implement the mapping functions, the sequence slicing functions are no 
longer used, but maybe we should revisit for the next release, rather 
than risk introducing a regression for the benefit of removing a few 
dead lines.


Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Peter Eisentraut
On fre, 2012-05-11 at 09:26 +0200, Magnus Hagander wrote:
> On Thu, May 10, 2012 at 6:28 PM, Peter Eisentraut  wrote:
> > On tor, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
> >> If people want the main docs building more often that's not really a
> >> problem other than time - we just need to decouple it from the
> >> buildfarm and run a separate job for it. It's not rocket science..
> >
> > Many years ago, Bruce and myself in particular put in a lot of work to
> > make the turnaround time on the docs build less than 5 minutes, based on
> > various requests.  I'm disappointed to learn that that was abandoned
> > without discussion.  We might as well just put the old job back.
> 
> It was not "abandoned without discussion" in any way.
> 
> First of all, the docs still build in 5 minutes.

That is different from the turnaround time after a commit.

> Second, the "5 minutes docs build" link on the website was removed in
> *2007*. At the request of Bruce, who maintained it. This request was
> (at least according to the commit message and from what I can
> remember) made in public on pgsql-www, and thus clearly open for
> discussion. At http://archives.postgresql.org/pgsql-www/2007-12/msg00212.php.
> Where Bruce claims the other one runs often enough, and nobody
> objects.

You are misinterpreting this.  The reason Bruce's link was removed was
that the other (official) build was set to run at the same frequency, so
Bruce's build was exactly redundant.  The requirement/aspiration to have
a few minutes' turnaround time continued.

> Third, the regular docs build on the developer box (which I think ran
> once / hour?) *did not work* (prior to that it kind of worked but often
> hung and failed, but at least it tried to run - at this point it
> stopped even trying).

If you had any problems with how well they worked, we could have
discussed this.  It's fine if you want to change how they run, and I
have no problem with how they are run now, but I just want to make clear
what requirements led to the setup at the time.

> So where in all this was anything "abandoned"?

The ability to get a docs build in less than 5 minutes after commit.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-11 Thread Magnus Hagander
On Thu, May 10, 2012 at 6:28 PM, Peter Eisentraut  wrote:
> On Thu, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
>> If people want the main docs building more often that's not really a
>> problem other than time - we just need to decouple it from the
>> buildfarm and run a separate job for it. It's not rocket science..
>
> Many years ago, Bruce and myself in particular put in a lot of work to
> make the turnaround time on the docs build less than 5 minutes, based on
> various requests.  I'm disappointed to learn that that was abandoned
> without discussion.  We might as well just put the old job back.

It was not "abandoned without discussion" in any way.

First of all, the docs still build in 5 minutes.

Second, the "5 minutes docs build" link on the website was removed in
*2007*. At the request of Bruce, who maintained it. This request was
(at least according to the commit message and from what I can
remember) made in public on pgsql-www, and thus clearly open for
discussion. At http://archives.postgresql.org/pgsql-www/2007-12/msg00212.php.
Where Bruce claims the other one runs often enough, and nobody
objects.

Third, the regular docs build on the developer box (which I think ran
once / hour?) *did not work* (prior to that it kind of worked but often
hung and failed, but at least it tried to run - at this point it
stopped even trying). The current docs build replaced a situation where
we had *no developer docs updates at all*, by taking the reasonably
quick and easy route of running it as part of an existing buildfarm
animal and uploading the results.


So where in all this was anything "abandoned"?


Bruce already suggested putting the old job back on his box, which
he's of course free to do. But since it took almost 5 years before
anybody actually complained about that, maybe it's not really that big
a problem?

And it's already been agreed that increasing the dev docs build
frequency back up to maybe once an hour would make sense.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers