Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Daniel Farina
On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 2. Synchronous replication.  Splitting up this patch has allowed some
 On top of 4 listed reviewers I know Dan Farina is poking at the last update,
 so we may see one more larger report on top of what's already shown up.  And
 Jaime keeps kicking the tires too.  What Simon was hoping is that a week of
 others looking at this would produce enough feedback that it might be
 possible to sweep the remaining issues up soon after he's back.  It looks to
 me like that's about when everything else that's still open will probably
 settle too.

Besides some of the fixable issues, I am going to have to echo
Robert's sentiments about a few kinks that go beyond mechanism in the
syncrep patch: in particular, it will *almost* solve the use case I
care about: a way to cleanly perform planned switchovers between
machines with minimal downtime and no lost data. But there are a
couple of holes I have thought of so far:

1. The 2-safe methodology supported is not really compatible with
performing planned-HA-switchover of a cluster with its own syncrep
guarantees on top of that. For example:

Server A syncreps to Server B

Now I want to provision server A-prime, which will eventually take the
place of A.

Server A syncreps to Server B
Server A syncreps to Server A-prime

Right now, as it stands, the syncrep patch will be happy as soon as
the data has been fsynced to either B or A-prime; I don't think we can
guarantee at any point that A-prime can become the leader, and feed B.

2. The unprivileged user can disable syncrep, in any situation. This
flexibility is *great*, but you don't really want people doing it
while a switchover is in progress. Rather, in a magical world we'd
hope that disabling syncrep would just result in not having to
synchronously commit to B (but, in this case, still synchronously
commit to A-prime).
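For concreteness, the opt-out I'm talking about is the per-session
setting. A minimal sketch (assuming the knob ends up spelled like the
9.1-era synchronous_commit; the patch as posted may name it
differently):

    -- an ordinary, unprivileged session can simply do this, after
    -- which its commits no longer wait for any standby at all
    SET synchronous_commit = local;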

In other words, to my mind, you can use syncrep as-is to provide
2-safe durability xor a scheduled switchover: as soon as someone wants
both, I think they'll have some trouble. I do want both, though.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 3:14 AM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 2. Synchronous replication.  Splitting up this patch has allowed some
 On top of 4 listed reviewers I know Dan Farina is poking at the last update,
 so we may see one more larger report on top of what's already shown up.  And
 Jaime keeps kicking the tires too.  What Simon was hoping is that a week of
 others looking at this would produce enough feedback that it might be
 possible to sweep the remaining issues up soon after he's back.  It looks to
 me like that's about when everything else that's still open will probably
 settle too.

 Besides some of the fixable issues, I am going to have to echo
 Robert's sentiments about a few kinks that go beyond mechanism in the
 syncrep patch: in particular, it will *almost* solve the use case I
 care about: a way to cleanly perform planned switchovers between
 machines with minimal downtime and no lost data. But there are a
 couple of holes I have thought of so far:

Well, just because the patch doesn't solve every use case isn't a
reason not to go forward with it - we can always add more options
later - but I have to admit that I'm kind of alarmed about the number
of bugs reported so far.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian br...@momjian.us wrote:
 Josh Berkus wrote:
 On 2/23/11 7:10 AM, Robert Haas wrote:
  IME, most bad query plans are caused by either incorrect
  estimates of selectivity, or wrongheaded notions about what's likely
  to be cached.  If we could find a way, automated or manual, of
  providing the planner some better information about the facts of life
  in those areas, I think we'd be way better off.  I'm open to ideas
  about what the best way to do that is.

 As previously discussed, I'm fine with approaches which involve
 modifying database objects.  These are auditable and centrally manageable
 and aren't devastating to upgrades.

 So things like the proposed CREATE SELECTIVITY would be OK in a way
 that decorating queries would not.

 Similarly, I would love to be able to set cache % on a per-relation
 basis, and override the whole dubious calculation involving
 random_page_cost for scans of that table.

 We should just find a way of checking what percentage of a table is
 already in the shared buffers.  That doesn't help us with the kernel
 cache, but it would be a good start and something that doesn't require
 user tuning.

You're reinventing a wheel that's already been discarded multiple
times.  There are at least four separate problems:

1. The percentage of the table which is cached in shared_buffers at
plan time need not match the percentage that is cached at execution
time.  A delay of even a few seconds between planning and execution
could make the numbers totally different, and plans can be cached for
much longer than that.

2. Because shared_buffers can turn over quite quickly, planning the
statement multiple times in relatively quick succession could give
different results each time.  Previous discussions on this topic have
concluded that DBAs hate plan instability, and hate GEQO because it
causes plan instability, and this would inject plan instability into
the main planner.

3. The percentage of the table which is cached in shared_buffers is
not necessarily representative of the percentage which is cached in
general.  On a large machine, shared_buffers may be less than 10% of
the total cache.  It would be unwise to make guesses about what is and
is not cached based on a small percentage of the cache.

4. Even if we could accurately estimate the percentage of the table
that is cached, what then?  For example, suppose that a user issues a
query which retrieves 1% of a table, and we know that 1% of that table
is cached.  How much of the data that the user asked for is cached?
Hard to say, right?  It could be none of it or all of it.  The second
scenario is easy to imagine - just suppose the query's been executed
twice.  The first scenario isn't hard to imagine either.

One idea Tom and I kicked around previously is to set an assumed
caching percentage for each table based on its size relative to
effective_cache_size - in other words, assume that the smaller a table
is, the more of it will be cached.  Consider a system with 8GB of RAM,
and a table which is 64kB.  It is probably unwise to make any plan
based on the assumption that that table is less than fully cached.  If
it isn't before the query executes, it soon will be.  Going to any
amount of work elsewhere in the plan to avoid the work of reading that
table in from disk is probably a dumb idea.  Of course, one downside
of this approach is that it doesn't know which tables are hot and
which tables are cold, but it would probably still be an improvement
over the status quo.
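In rough form, such a rule could look like (purely illustrative, not a
worked-out proposal):

    assumed_cached_fraction = max(0, 1 - table_size / effective_cache_size)

which comes out at essentially 1.0 for the 64kB table on the 8GB
machine, and at 0 for anything larger than effective_cache_size.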

All that having been said, I think that while Josh is thinking fuzzily
about the mathematics of his proposal, the basic idea is pretty
sensible.  It is not easy - likely not possible - for the system to
have a good idea which things will be in some kind of cache at the
time the query is executed; it could even change mid-query.  The
execution of one part of the query could evict from the cache data
which some other part of the plan assumed would be cached.  But DBAs
frequently have a very good idea of which stuff is in cache - they can
make observations over a period of time and then adjust settings and
then observe some more and adjust some more.

PostgreSQL is extremely easy to administer compared with some of its
competitors, and it's frequently necessary to change very little.  But
there's a difference between what you absolutely have to change to
make it work and what you have the option to change when necessary.
We need to decrease the amount of stuff in the first category (as we
recently did with wal_buffers) and increase the amount of stuff in the
second category.  People coming from Oracle are not favorably
impressed either by the amount of monitoring data PostgreSQL can
gather or by the number of knobs that are available to fix problems
when they occur.  We don't need to have as many knobs as Oracle and we
probably don't want to, and for that matter we probably couldn't if we
did want to for lack of manpower, but 

Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread marcin mank
On Fri, Feb 25, 2011 at 9:14 AM, Daniel Farina dan...@heroku.com wrote:

 Right now, as it stands, the syncrep patch will be happy as soon as
 the data has been fsynced to either B or A-prime; I don't think we can
 guarantee at any point that A-prime can become the leader, and feed B.


- start A` up, replicating from A
- shutdown B (now A and A` are synchronous)
now real quick:
- shut down A
- shut down A`
- change configuration
- start up A`
- start up B

Doesn't this work?

Greetings
Marcin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
I had what seems to me a remarkably good idea, though maybe someone else
can spot a problem with it.  Given that we've decided to run the
modifying sub-queries all with the same command counter ID, they are
logically executing in parallel.  The current implementation takes no
advantage of that fact, though: it's based around the idea of running
the updates strictly sequentially.  I think we should change it so that
the updates happen physically, not only logically, concurrently.
Specifically, I'm imagining getting rid of the patch's additions to
InitPlan and ExecutePlan that find all the modifying sub-queries and
force them to be cycled to completion before the main plan runs.
Just run the main plan and let it pull tuples from the CTEs as needed.
Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
completion before shutting down the plan.  (In the event of an error,
we'd never get to ExecutorEnd, but it doesn't matter since whatever
updates we did apply are nullified anyhow.)
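For concreteness, consider a hypothetical query like

    WITH deleted AS (DELETE FROM foo RETURNING id)
    SELECT count(*) FROM bar;

The main plan never pulls from "deleted" at all, so under this scheme
the DELETE's ModifyTable node would simply be run to completion during
ExecutorEnd instead of being forced to finish up front.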

This has a number of immediate and future implementation benefits:

1. RETURNING tuples that aren't actually needed by the main plan
don't need to be buffered anywhere.  (ExecutorEnd would just pull
directly from the ModifyTable nodes, ignoring their parent CTE
nodes, in all cases.)

2. In principle, in many common cases the RETURNING tuples wouldn't have
to be buffered at all, but could be consumed on-the-fly.  I think that
right now the CTEScan nodes might still buffer the tuples so they can
regurgitate them in case of being rescanned, but it's not hard to see
how that could be improved later if it doesn't work immediately.

3. The code could be significantly simpler.  Instead of that rather
complex and fragile logic in InitPlan to try to locate all the
ModifyTable nodes and their CTEScan parents, we could just have
ModifyTable nodes add themselves to a list in the EState during
ExecInitNode.  Then ExecutorEnd just traverses that list.

However, the real reason for doing it isn't any of those, but rather
to establish the principle that the executions of the modifying
sub-queries are interleaved not sequential.  We're never going to be
able to do any significant optimization of such queries if we have to
preserve the behavior that the sub-queries execute sequentially.
And I think it's inevitable that users will manage to build such an
assumption into their queries if the first release with the feature
behaves that way.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 9:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I had what seems to me a remarkably good idea, though maybe someone else
 can spot a problem with it.  Given that we've decided to run the
 modifying sub-queries all with the same command counter ID, they are
 logically executing in parallel.  The current implementation takes no
 advantage of that fact, though: it's based around the idea of running
 the updates strictly sequentially.  I think we should change it so that
 the updates happen physically, not only logically, concurrently.
 Specifically, I'm imagining getting rid of the patch's additions to
 InitPlan and ExecutePlan that find all the modifying sub-queries and
 force them to be cycled to completion before the main plan runs.
 Just run the main plan and let it pull tuples from the CTEs as needed.
 Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
 completion before shutting down the plan.  (In the event of an error,
 we'd never get to ExecutorEnd, but it doesn't matter since whatever
 updates we did apply are nullified anyhow.)

 This has a number of immediate and future implementation benefits:

 1. RETURNING tuples that aren't actually needed by the main plan
 don't need to be buffered anywhere.  (ExecutorEnd would just pull
 directly from the ModifyTable nodes, ignoring their parent CTE
 nodes, in all cases.)

 2. In principle, in many common cases the RETURNING tuples wouldn't have
 to be buffered at all, but could be consumed on-the-fly.  I think that
 right now the CTEScan nodes might still buffer the tuples so they can
 regurgitate them in case of being rescanned, but it's not hard to see
 how that could be improved later if it doesn't work immediately.

 3. The code could be significantly simpler.  Instead of that rather
 complex and fragile logic in InitPlan to try to locate all the
 ModifyTable nodes and their CTEScan parents, we could just have
 ModifyTable nodes add themselves to a list in the EState during
 ExecInitNode.  Then ExecutorEnd just traverses that list.

 However, the real reason for doing it isn't any of those, but rather
 to establish the principle that the executions of the modifying
 sub-queries are interleaved not sequential.  We're never going to be
 able to do any significant optimization of such queries if we have to
 preserve the behavior that the sub-queries execute sequentially.
 And I think it's inevitable that users will manage to build such an
 assumption into their queries if the first release with the feature
 behaves that way.

 Comments?

I completely agree.  Actually, I thought we had already agreed on the
design you just proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Given that we've decided to run the modifying sub-queries all with
 the same command counter ID, they are logically executing in
 parallel.
 
 Just run the main plan and let it pull tuples from the CTEs as
 needed.
 
On the face of it, that sounds like it has another benefit you
didn't mention -- it sounds like it's much more conducive to
allowing parallel processing, if (when?) we eventually move in that
direction.  It might even be a good case for an initial, limited
implementation.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Yeb Havinga

On 2011-02-22 20:43, Jaime Casanova wrote:


you can make this happen more easily: I just ran pgbench -n -c10 -j10
test and got that warning, and sometimes a segmentation fault and
sometimes a failed assertion

and the problematic code starts at
src/backend/replication/syncrep.c:277; here are my suggestions on that
code.
still I get a failed assertion because of the second Assert (I think
we should just remove that one)

*** SyncRepRemoveFromQueue(void)
*** 288,299 ****

                if (proc->lwWaitLink == NULL)
                        elog(WARNING, "could not locate ourselves on wait queue");
!               proc = proc->lwWaitLink;
        }

        if (proc->lwWaitLink == NULL)   /* At tail */
        {
!               Assert(proc == MyProc);
                /* Remove ourselves from tail of queue */
                Assert(queue->tail == MyProc);
                queue->tail = proc;
--- 288,300 ----

                if (proc->lwWaitLink == NULL)
                        elog(WARNING, "could not locate ourselves on wait queue");
!               else
!                       proc = proc->lwWaitLink;
        }

        if (proc->lwWaitLink == NULL)   /* At tail */
        {
!               Assert(proc != MyProc);
                /* Remove ourselves from tail of queue */
                Assert(queue->tail == MyProc);
                queue->tail = proc;

I also did some initial testing on this patch and got the queue-related
errors with > 1 clients. With the code change from Jaime above I still
got a lot of 'not on queue' warnings.


I tried to understand how the queue was supposed to work - resulting in
the changes below, which also incorporate a suggestion from Fujii
upthread, to exit early when MyProc is found.


With the changes below all seems to work without warnings. I now see
that the note about the list invariant is too short; better would be: if
queue length = 1 then head = tail.


--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
        }
        else
        {
+               bool found = false;
+
                while (proc->lwWaitLink != NULL)
                {
                        /* Are we the next proc in our traversal of the queue? */

@@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
                         * No need to touch head or tail.
                         */
                        proc->lwWaitLink = MyProc->lwWaitLink;
+                       found = true;
+                       break;
                }

-               if (proc->lwWaitLink == NULL)
-                       elog(WARNING, "could not locate ourselves on wait queue");

                proc = proc->lwWaitLink;
        }
+       if (!found)
+               elog(WARNING, "could not locate ourselves on wait queue");

-       if (proc->lwWaitLink == NULL)   /* At tail */
+       /* If MyProc was removed from the tail, maintain list invariant head==tail */
+       if (proc->lwWaitLink == NULL)
        {
-               Assert(proc == MyProc);
-               /* Remove ourselves from tail of queue */
+               Assert(proc != MyProc); /* impossible since that is the head=MyProc branch above */
                Assert(queue->tail == MyProc);
                queue->tail = proc;
                proc->lwWaitLink = NULL;

I needed to add this to make the documentation compile

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
         You should also consider setting <varname>hot_standby_feedback</>
         as an alternative to using this parameter.
        </para>
+      </listitem>
+     </varlistentry>
+    </variablelist></sect2>

    <sect2 id="runtime-config-sync-rep">

regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Given that we've decided to run the modifying sub-queries all with
 the same command counter ID, they are logically executing in
 parallel.
 
 Just run the main plan and let it pull tuples from the CTEs as
 needed.
 
 On the face of it, that sounds like it has another benefit you
 didn't mention -- it sounds like it's much more conducive to
 allowing parallel processing, if (when?) we eventually move in that
 direction.  It might even be a good case for an initial, limited
 implementation.

Yeah.  Most of the executor is in principle parallelizable at the
plan-node level (ignoring the obvious and severe implementation
problems with parallelizing *anything* in the backend).  It's not
good for wCTE to be creating a user-visible assumption that certain
things will happen in a predefined order.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Marko Tiikkaja

On 2011-02-25 4:58 PM, Tom Lane wrote:

Specifically, I'm imagining getting rid of the patch's additions to
InitPlan and ExecutePlan that find all the modifying sub-queries and
force them to be cycled to completion before the main plan runs.
Just run the main plan and let it pull tuples from the CTEs as needed.
Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
completion before shutting down the plan.  (In the event of an error,
we'd never get to ExecutorEnd, but it doesn't matter since whatever
updates we did apply are nullified anyhow.)


This idea has actually been discussed before when we talked about 
optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a 
bit ugly.  But if you can write this idea down in a way that makes you 
happy with the implementation, I think it's a huge benefit and we should 
definitely do it.



This has a number of immediate and future implementation benefits:



3. The code could be significantly simpler.  Instead of that rather
complex and fragile logic in InitPlan to try to locate all the
ModifyTable nodes and their CTEScan parents, we could just have
ModifyTable nodes add themselves to a list in the EState during
ExecInitNode.  Then ExecutorEnd just traverses that list.


Sounds good to me.


However, the real reason for doing it isn't any of those, but rather
to establish the principle that the executions of the modifying
sub-queries are interleaved not sequential.  We're never going to be
able to do any significant optimization of such queries if we have to
preserve the behavior that the sub-queries execute sequentially.
And I think it's inevitable that users will manage to build such an
assumption into their queries if the first release with the feature
behaves that way.


Yeah, you might be right.


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2011-02-25 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-25 1:36 AM, Tom Lane wrote:
 Why is it necessary to hack the portal logic at all?  The patch seems to
 work for me without that.  (I've fixed quite a few bugs though, so maybe
 what this is really doing is masking a problem elsewhere.)

 Without hacking it broke when PQdescribePrepared was called on a 
 prepared query like:

 WITH t AS (DELETE FROM foo)
 SELECT 1;

 Not sure if that's an actual problem, but it seemed like something worth
 fixing.

I can't replicate such a problem here --- do you have a concrete test
case?  ISTM the issue would only have been a problem back when you
were trying to generate multiple PlannedStmts from a query like the
above.  The current implementation with everything in one plantree
really ought to look just like a SELECT so far as the portal code
is concerned.

 Also, why are we forbidding wCTEs in cursors?  Given the current
 definitions, that case seems to work fine too: the wCTEs will be
 executed as soon as you fetch something from the cursor.  Are you
 just worried about not allowing a case that might be hard to support
 later?

 Honestly, I have no idea.  It might be a leftover from the previous 
 design.  If it looks like it's easy to support, then go for it.

Right now I'm thinking that it is best to continue to forbid it.
If we go over to the less-sequential implementation that I'm advocating
in another thread, the timing of the updates would become a lot less
predictable than I say above.  If we refuse it for now, we can always
remove the restriction later, but the other way is more painful.
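For reference, the sort of thing that would continue to be refused is
(a hypothetical example, not one from the regression tests):

    DECLARE c CURSOR FOR
        WITH t AS (DELETE FROM foo RETURNING *)
        SELECT * FROM t;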

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2011-02-25 Thread Marko Tiikkaja

On 2011-02-25 6:12 PM, Tom Lane wrote:

Marko Tiikkajamarko.tiikk...@cs.helsinki.fi  writes:

Without hacking it broke when PQdescribePrepared was called on a
prepared query like:



WITH t AS (DELETE FROM foo)
SELECT 1;



Not sure if that's an actual problem, but it seemed like something worth
fixing.


I can't replicate such a problem here --- do you have a concrete test
case?  ISTM the issue would only have been a problem back when you
were trying to generate multiple PlannedStmts from a query like the
above.


I don't have one right now (I lost the one I had because of a hardware 
failure in a virtual machine), but I can write you one if you want to. 
But see below.



The current implementation with everything in one plantree
really ought to look just like a SELECT so far as the portal code
is concerned.


The problem was that the old code was using PORTAL_MULTI_QUERY whenever 
a wCTE was present.  Are you saying that you are using 
PORTAL_ONE_SELECT?  Doesn't that have problems with triggers, for example?



Also, why are we forbidding wCTEs in cursors?  Given the current
definitions, that case seems to work fine too: the wCTEs will be
executed as soon as you fetch something from the cursor.  Are you
just worried about not allowing a case that might be hard to support
later?



Honestly, I have no idea.  It might be a leftover from the previous
design.  If it looks like it's easy to support, then go for it.


Right now I'm thinking that it is best to continue to forbid it.
If we go over to the less-sequential implementation that I'm advocating
in another thread, the timing of the updates would become a lot less
predictable than I say above.  If we refuse it for now, we can always
remove the restriction later, but the other way is more painful.


Fair enough.


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sync rep design architecture (was disposition of remaining patches)

2011-02-25 Thread Greg Smith

Daniel Farina wrote:

Server A syncreps to Server B

Now I want to provision server A-prime, which will eventually take the
place of A.

Server A syncreps to Server B
Server A syncreps to Server A-prime

Right now, as it stands, the syncrep patch will be happy as soon as
the data has been fsynced to either B or A-prime; I don't think we can
guarantee at any point that A-prime can become the leader, and feed B.
  


One of the very fundamental disconnects between how this patch 
implements sync rep and what some people might expect is this concern.  
Having such tight control over the exact order of failover isn't quite 
here yet, so sometimes people will need to be creative to work within 
the restrictions of what is available.  The path for this case is probably:


1) Wait until A' is caught up
2) Switch over to B as the right choice to be the new master, with A' as 
its standby and A going off-line at the same time.
3) Switch the master role over from B to A'.  Bring up B as its standby. 


There are other possible transition plans available too.

I appreciate that you would like to do this as an atomic operation, 
rather than handling it as two steps--one of which puts you in a middle 
point where B, a possibly inferior standby, is operating as the master.  
There are a dozen other complicated "my use case says I want X and it 
must be done as Y" requests for Sync Rep floating around here, too.  
They're all getting ignored in favor of something smaller that can get 
built today. 

The first question I'd ask is whether you could settle for this 
more-cumbersome-than-you'd-prefer switchover plan for now.  The second is 
whether implementing what this feature currently does would get in the 
way of coding what you really want eventually. 

I didn't get the Streaming Rep + Hot Standby features I wanted in 9.0 
either.  But committing what was reasonable to include in that version 
let me march forward with very useful new code, doing another year of 
development on my own projects and getting some new things fixed in 
core.  And so far it looks like 9.1 will sort out all of the kinks I was 
unhappy about.  The same sort of thing will need to happen to get Sync 
Rep committed and then appropriate for more use cases.  There isn't any 
margin left for discussions of scope creep here; really it's "is this 
subset useful for some situations and stable enough to commit?" now.



2. The unprivileged user can disable syncrep, in any situation. This
flexibility is *great*, but you don't really want people doing it
while a switchover is in progress.


For the moment you may have to live with a situation where user 
connections must be blocked during the brief moment of switchover to 
eliminate this issue.  That's what I end up doing with 9.0 production 
systems to get a really clean switchover; there's a second of hiccup 
even in the best case.  I'm not sure of the best way yet to build a 
UI to make that more transparent in the sync rep case.  It's sure not a 
problem that's going to get solved in this release though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Cédric Villemain
2011/2/25 Robert Haas robertmh...@gmail.com:
 On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian br...@momjian.us wrote:
 Josh Berkus wrote:
 On 2/23/11 7:10 AM, Robert Haas wrote:
  IME, most bad query plans are caused by either incorrect
  estimates of selectivity, or wrongheaded notions about what's likely
  to be cached.  If we could find a way, automated or manual, of
  providing the planner some better information about the facts of life
  in those areas, I think we'd be way better off.  I'm open to ideas
  about what the best way to do that is.

 As previously discussed, I'm fine with approaches which involve
 modifying database objects.  These are auditable and centrally manageable
 and aren't devastating to upgrades.

 So things like the proposed CREATE SELECTIVITY would be OK in a way
 that decorating queries would not.

 Similarly, I would love to be able to set cache % on a per-relation
 basis, and override the whole dubious calculation involving
 random_page_cost for scans of that table.

 We should just find a way of checking what percentage of a table is
 already in the shared buffers.  That doesn't help us with the kernel
 cache, but it would be a good start and something that doesn't require
 user tuning.

 You're reinventing a wheel that's already been discarded multiple
 times.  There are at least four separate problems:

 1. The percentage of the table which is cached in shared_buffers at
 plan time need not match the percentage that is cached at execution
 time.  A delay of even a few seconds between planning and execution
 could make the numbers totally different, and plans can be cached for
 much longer than that.

 2. Because shared_buffers can turn over quite quickly, planning the
 statement multiple times in relatively quick succession could give
 different results each time.  Previous discussions on this topic have
 concluded that DBAs hate plan instability, and hate GEQO because it
 causes plan instability, and this would inject plan instability into
 the main planner.

 3. The percentage of the table which is cached in shared_buffers is
 not necessarily representative of the percentage which is cached in
 general.  On a large machine, shared_buffers may be less than 10% of
 the total cache.  It would be unwise to make guesses about what is and
 is not cached based on a small percentage of the cache.

 4. Even if we could accurately estimate the percentage of the table
 that is cached, what then?  For example, suppose that a user issues a
 query which retrieves 1% of a table, and we know that 1% of that table
 is cached.  How much of the data that the user asked for is cached?
 Hard to say, right?  It could be none of it or all of it.  The second
 scenario is easy to imagine - just suppose the query's been executed
 twice.  The first scenario isn't hard to imagine either.

 One idea Tom and I kicked around previously is to set an assumed
 caching percentage for each table based on its size relative to
 effective_cache_size - in other words, assume that the smaller a table
 is, the more of it will be cached.  Consider a system with 8GB of RAM,
 and a table which is 64kB.  It is probably unwise to make any plan
 based on the assumption that that table is less than fully cached.  If
 it isn't before the query executes, it soon will be.  Going to any
 amount of work elsewhere in the plan to avoid the work of reading that
 table in from disk is probably a dumb idea.  Of course, one downside
 of this approach is that it doesn't know which tables are hot and
 which tables are cold, but it would probably still be an improvement
 over the status quo.

Yes, good idea.


 All that having been said, I think that while Josh is thinking fuzzily
 about the mathematics of his proposal, the basic idea is pretty
 sensible.  It is not easy - likely not possible - for the system to
 have a good idea which things will be in some kind of cache at the
 time the query is executed; it could even change mid-query.  The
 execution of one part of the query could evict from the cache data
 which some other part of the plan assumed would be cached.  But DBAs
 frequently have a very good idea of which stuff is in cache - they can
 make observations over a period of time and then adjust settings and
 then observe some more and adjust some more.

I believe we can maintain a small map of the areas of a relation which
are in the OS buffer cache (shared buffers move more), or at least a
percentage of the relation in OS cache. Getting the autovacuum daemon
to update those maps/counters might be enough, and easy to do; it is
really close to what auto-analyze does. My observation is that the
percentage in cache is stable on a production workload after the few
tens of minutes needed to warm the server.

What should really help here is to have hooks in the cost functions to
test those ideas without the need to patch postgresql-core a lot. Will
it be ok to have hooks, or will it add too much CPU consumption in a
sensitive part of the code?

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Rod Taylor
 4. Even if we could accurately estimate the percentage of the table
 that is cached, what then?  For example, suppose that a user issues a
 query which retrieves 1% of a table, and we know that 1% of that table
 is cached.  How much of the data that the user asked for is cached?
 Hard to say, right?  It could be none of it or all of it.  The second
 scenario is easy to imagine - just suppose the query's been executed
 twice.  The first scenario isn't hard to imagine either.


I have a set of slow disks whose performance impact is nearly as large as
the difference between cached-in-memory and the fast disks.

How practical would it be for analyze to keep a record of response times for
given sections of a table as it randomly accesses them and generate some
kind of a map for expected response times for the pieces of data it is
analysing?

It may well discover, on its own, that recent data (1 month old or less)
has a random read response time of N, older data (1 year old) in a different
section of the relation tends to have a response time of 1000N, and really
old data (5 years old) tends to have a response time of 3000N.


Re: [HACKERS] wCTE behaviour

2011-02-25 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-25 6:12 PM, Tom Lane wrote:
 The current implementation with everything in one plantree
 really ought to look just like a SELECT so far as the portal code
 is concerned.

 The problem was that the old code was using PORTAL_MULTI_QUERY whenever 
 a wCTE was present.  Are you saying that you are using 
 PORTAL_ONE_SELECT?  Doesn't that have problems with triggers, for example?

Hmmm ... good question.  I notice the lack of any regression test cases
involving triggers.  Will check this.
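Something minimal along these lines would exercise it (a hypothetical
sketch, not an actual test from the patch):

    CREATE TABLE foo (i int);
    CREATE TABLE log (msg text);
    CREATE FUNCTION foo_del() RETURNS trigger AS
    $$ BEGIN INSERT INTO log VALUES ('deleted'); RETURN OLD; END $$
    LANGUAGE plpgsql;
    CREATE TRIGGER foo_del_trig AFTER DELETE ON foo
        FOR EACH ROW EXECUTE PROCEDURE foo_del();
    -- the wCTE should fire the trigger exactly once per deleted row
    WITH d AS (DELETE FROM foo RETURNING i) SELECT 1;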

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Greg Stark
On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, the real reason for doing it isn't any of those, but rather
 to establish the principle that the executions of the modifying
 sub-queries are interleaved not sequential.  We're never going to be
 able to do any significant optimization of such queries if we have to
 preserve the behavior that the sub-queries execute sequentially.
 And I think it's inevitable that users will manage to build such an
 assumption into their queries if the first release with the feature
 behaves that way.

Does the interleaved execution have sane semantics?

With a query like:

WITH
  a as (update x set i = i + 1 returning x.i),
  b as (update x set i = i + 1 returning x.i)
select * from a natural join b;

Is there any way to tell what it will return or what state it will
leave the table in?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Merlin Moncure
On Fri, Feb 25, 2011 at 11:31 AM, Greg Stark gsst...@mit.edu wrote:
 On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, the real reason for doing it isn't any of those, but rather
 to establish the principle that the executions of the modifying
 sub-queries are interleaved not sequential.  We're never going to be
 able to do any significant optimization of such queries if we have to
 preserve the behavior that the sub-queries execute sequentially.
 And I think it's inevitable that users will manage to build such an
 assumption into their queries if the first release with the feature
 behaves that way.

 Does the interleaved execution have sane semantics?

 With a query like:

 WITH
  a as (update x set i = i + 1 returning x.i),
  b as (update x set i = i + 1 returning x.i)
 select * from a natural join b;

 Is there any way to tell what it will return or what state it will
 leave the table in?

WITH
  a as (update x set i = i + 1 returning x.i),
  b as (update x set i = i + 1 where x.i = 1 returning x.i)
select * from a natural join b;

or the above if x.i is 1 for all rows of x at query start?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
2011/2/25 Cédric Villemain cedric.villemain.deb...@gmail.com:
 All that having been said, I think that while Josh is thinking fuzzily
 about the mathematics of his proposal, the basic idea is pretty
 sensible.  It is not easy - likely not possible - for the system to
 have a good idea which things will be in some kind of cache at the
 time the query is executed; it could even change mid-query.  The
 execution of one part of the query could evict from the cache data
 which some other part of the plan assumed would be cached.  But DBAs
 frequently have a very good idea of which stuff is in cache - they can
 make observations over a period of time and then adjust settings and
 then observe some more and adjust some more.

 I believe we can maintain a small map of the areas of a relation which
 are in the OS buffer cache (shared buffers move more), or at least a
 percentage of the relation in OS cache. Getting the autovacuum daemon
 to update those maps/counters might be enough, and easy to do; it is
 really close to what auto-analyze does.  My observation is that the
 percentage in cache is stable on a production workload after the few
 tens of minutes needed to warm the server.

I don't think we can assume that will be true in all workloads.
Imagine a server doing batch processing.  People submit large batches
of work that take, say, an hour to complete.  Not all batches use the
same set of tables - maybe they even run in different databases.
After a big batch process finishes crunching numbers in database A,
very little of database B will be cached.  But it's not necessarily
right to carry that assumption over when we start queries for a new
batch in database B, although it's more likely to be right for large
tables (which will take a long time to get cached meaningfully, if they
ever do) than small ones.  Also, it could lead to strange issues where
batches run much faster or slower depending on which batch immediately
preceded them.  If we're going to do something a lot of times, it'd be
better to bite the bullet and read it all in rather than going to more
work elsewhere, but if we're only going to touch it once, then not so much.

You might also have this issue on systems that run OLTP workloads all
day and then do some batch processing at night to get ready for the
next business day.  Kevin Grittner wrote previously about those jobs
needing some different settings in his environment (I'm not
remembering which settings at the moment).  Suppose that the batch
process is going to issue a query that can be planned in one of two
possible ways.  One way involves reading 10% of a relation, and the
other way involves reading the whole thing.  The first plan takes 200
s to execute if the relation is not cached, and 180 s if the relevant
portion is cached.  The second plan takes 300 s to execute if the
relation is not cached, and 100 s if it is cached.  At the start of
the batch run, the relation won't be cached, because it's used *only*
by the overnight job and not by the daily OLTP traffic.  Which way
should we execute the query?

The answer is that if the batch job only needs to execute that query
*once*, we should do it the first way.  But if it needs to execute it
three or more times, the second way is better, but only if we use the
second plan every time.  If we start out with the first plan, we're
always better off sticking with it *unless* we know that we're going
to repeat the query at least twice more after the iteration we're
currently planning.  To make the right decision, the query planner
needs a crystal ball.  Or, a little help from the DBA.
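Spelling the arithmetic out (cumulative runtimes, assuming the relevant
data stays cached after the first execution):

    plan 1 (reads 10%):  1 run: 200 s   2 runs: 380 s   3 runs: 560 s
    plan 2 (reads all):  1 run: 300 s   2 runs: 400 s   3 runs: 500 s

Plan 1 wins for one or two executions; plan 2 pulls ahead at three.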

 What should really help here is to have hooks in the cost functions to
 test those ideas without the need to patch postgresql-core a lot. Will
 it be ok to have hooks, or will it add too much CPU consumption in a
 sensitive part of the code?

Depends on where you put them, I guess.  Hooks are pretty cheap, but
they're also pretty hard to use.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, the real reason for doing it isn't any of those, but rather
 to establish the principle that the executions of the modifying
 sub-queries are interleaved not sequential.

 Does the interleaved execution have sane semantics?

Depends on what you call sane.  With the decision to not increment the
command counter, it's already the case that people shouldn't have
two subqueries try to modify the same row.

 With a query like:

 WITH
   a as (update x set i = i + 1 returning x.i),
   b as (update x set i = i + 1 returning x.i)
 select * from a natural join b;

 Is there any way to tell what it will return or what state it will
 leave the table in?

My reaction to that is you shouldn't do that, and you definitely
shouldn't complain if it's not predictable whether a or b will
modify a given row.  This is exactly the sort of assumption I don't
want people building into their queries, because we will be locked
into purely sequential execution if we promise that the results will
be consistent.

There is already precedent for that position.  You can easily construct
queries using UPDATE ... FROM wherein the same target row joins to more
than one row in the FROM table, and then it's unpredictable which
joining row will be used to update that target row.  Our position has
always been don't do that, not that we'd lobotomize the planner and
executor to ensure predictability.
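For instance (an illustrative query, not one from this thread):

    UPDATE target t SET val = s.val
    FROM src s
    WHERE s.id = t.id;

If two rows of src carry the same id, which of them ends up supplying
val for the matching target row is not defined.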

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 5:25 AM, marcin mank marcin.m...@gmail.com wrote:
 On Fri, Feb 25, 2011 at 9:14 AM, Daniel Farina dan...@heroku.com wrote:

 Right now, as it stands, the syncrep patch will be happy as soon as
 the data has been fsynced to either B or A-prime; I don't think we can
 guarantee at any point that A-prime can become the leader, and feed B.


 - start A` up, replicating from A
 - shutdown B (now A and A` are synchronous)
 now real quick:
 - shut down A
 - shut down A`
 - change configuration
 - start up A`
 - start up B

 Doesn't this work?

This dance does work, but it would be very nice to not have to take
the standby ('B' in my case) offline.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Alvaro Herrera
Excerpts from Rod Taylor's message of Fri Feb 25 14:03:58 -0300 2011:

 How practical would it be for analyze to keep a record of response times for
 given sections of a table as it randomly accesses them and generate some
 kind of a map for expected response times for the pieces of data it is
 analysing?

I think what you want is random_page_cost that can be tailored per
tablespace.
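Something along these lines, using the per-tablespace settings that are
already there (the tablespace name is made up):

    ALTER TABLESPACE slow_disks SET (random_page_cost = 10.0);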

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 4:43 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Feb 25, 2011 at 3:14 AM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 2. Synchronous replication.  Splitting up this patch has allowed some
 On top of 4 listed reviewers I know Dan Farina is poking at the last update,
 so we may see one more larger report on top of what's already shown up.  And
 Jaime keeps kicking the tires too.  What Simon was hoping is that a week of
 others looking at this would produce enough feedback that it might be
 possible to sweep the remaining issues up soon after he's back.  It looks to
 me like that's about when everything else that's still open will probably
 settle too.

 Besides some of the fixable issues, I am going to have to echo
 Robert's sentiments about a few kinks that go beyond mechanism in the
 syncrep patch: in particular, it will *almost* solve the use case I
 care about: a way to cleanly perform planned switchovers between
 machines with minimal downtime and no lost data. But there are a
 couple of holes I have thought of so far:

 Well, just because the patch doesn't solve every use case isn't a
 reason not to go forward with it - we can always add more options
 later - but I have to admit that I'm kind of alarmed about the number
 of bugs reported so far.

True: the relevance of any use case to acceptance is up to some
debate. I haven't thought about how to remedy this; I'm just thinking
aloud about a problem I would have as-is, and one that is important to
me.  It is true that later accretion of options can occur, but
sometimes the initial choice of semantics can make growing those
easier or harder.  I haven't yet thought ahead as to how the current
scheme would impact that.

I know I got hit by backend synchronization (in the sense of locks,
etc.) bugs; do you think it is possible yours (sending SIGSTOP) could
have the same root cause? I haven't followed all the other bugs cleared
up by inspection.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jaime Casanova
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga yebhavi...@gmail.com wrote:

 I also did some initial testing on this patch and got the queue-related
 errors with > 1 clients. With the code change from Jaime above I still got a
 lot of 'not on queue' warnings.

 I tried to understand how the queue was supposed to work - resulting in the
 changes below, which also incorporate a suggestion from Fujii upthread, to
 exit early when MyProc is found.


yes, looking at the code, the warning and your patch... it seems yours
is the right solution...
I'm compiling right now to test again and see the effects. Robert,
maybe you can test your failure case again? I'm really sure it's
related to this...

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] help: collation support on Windows

2011-02-25 Thread Peter Eisentraut
According to the online documentation, the APIs are there:
http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx

Now we'd need someone brave to try to make it work.  The starting point
would be to define HAVE_LOCALE_T and then make it build.  Microsoft has
all the relevant functions and types with an underscore in front
(_strcoll_l, etc.), so some extra #defining will probably be necessary.

Also, initdb will need to be patched to get a list of OS locales to
populate the pg_collation catalog with.

Finally, a regression test customized for Windows, but I can help with
that later.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help: collation support on Windows

2011-02-25 Thread Andrew Dunstan



On 02/25/2011 02:32 PM, Peter Eisentraut wrote:

According to the online documentation, the APIs are there:
http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx

Now we'd need someone brave to try to make it work.  The starting point
would be to define HAVE_LOCALE_T and then make it build.  Microsoft has
all the relevant functions and types with an underscore in front
(_strcoll_l, etc.), so some extra #defining will probably be necessary.

Also, initdb will need to be patched to get a list of OS locales to
populate the pg_collation catalog with.

Finally, a regression test customized for Windows, but I can help with
that later.






What is the equivalent of locale -a?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help: collation support on Windows

2011-02-25 Thread Dave Page
On Fri, Feb 25, 2011 at 9:54 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 02/25/2011 02:32 PM, Peter Eisentraut wrote:

 According to the online documentation, the APIs are there:
 http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx

 Now we'd need someone brave to try to make it work.  The starting point
 would be to define HAVE_LOCALE_T and then make it build.  Microsoft has
 all the relevant functions and types with an underscore in front
 (_strcoll_l, etc.), so some extra #defining will probably be necessary.

 Also, initdb will need to be patched to get a list of OS locales to
 populate the pg_collation catalog with.

 Finally, a regression test customized for Windows, but I can help with
 that later.





 What is the equivalent of locale -a?

There isn't a command that I know of, but the API function you
probably need is EnumSystemLocales.


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help: collation support on Windows

2011-02-25 Thread Magnus Hagander
On Fri, Feb 25, 2011 at 22:58, Dave Page dp...@pgadmin.org wrote:
 On Fri, Feb 25, 2011 at 9:54 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 02/25/2011 02:32 PM, Peter Eisentraut wrote:

 According to the online documentation, the APIs are there:
 http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx

 Now we'd need someone brave to try to make it work.  The starting point
 would be to define HAVE_LOCALE_T and then make it build.  Microsoft has
 all the relevant functions and types with an underscore in front
 (_strcoll_l, etc.), so some extra #defining will probably be necessary.

 Also, initdb will need to be patched to get a list of OS locales to
 populate the pg_collation catalog with.

 Finally, a regression test customized for Windows, but I can help with
 that later.





 What is the equivalent of locale -a?

 There isn't a command that I know of, but the API function you
 probably need is EnumSystemLocales.

Yeah. If you want example code, you can probably lift something from
the old pginstaller project on pgfoundry - we use that code to
populate the dropdown box for initdb there.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect's infomask and infomask2 as smallint

2011-02-25 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Tue Feb 15 12:42:00 -0300 2011:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

  pageinspect is just a debugging aid, so I think we should change it from 
  smallint to int4 in 9.1, and not bother backporting.
 
 I don't see any reason that the old version of the function couldn't be
 dropped in the upgrade script.  It's not likely anything would be
 depending on it, is it?

Okay, so I changed the C code, and there's working upgrade support that
takes you from the 9.0 version to the 9.1 version.

I tested this by creating a 9.0 database with pageinspect loaded, then
pg_upgrade'd it to 9.1, then ran
create extension pageinspect from unpackaged;

If I run the function before running the CREATE EXTENSION command, it
works but still displays the negative numbers.  After that, it behaves
as expected.

I was a bit surprised that I had to remove the ALTER EXTENSION/ADD
FUNCTION command from the upgrade script, but in hindsight it makes
perfect sense -- those commands are being run inside the
create_extension context and so the function being created already
belongs to the extension.  Still, maybe we should make ALTER
EXTENSION/ADD idempotent.

I considered the idea of calling this version 1.1 and shipping a new
pageinspect--1.0--1.1.sql script (which works perfectly, provided you
run ALTER EXTENSION/DROP FUNCTION before dropping the function, then
ALTER/ADD later), but decided that this was overkill.  We can still
change it if people think that'd be better, of course, but ...
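
For concreteness, a sketch of what such a 1.0-to-1.1 script could look
like, following the ordering described above (the heap_page_items
signature is abbreviated here, and no such script actually shipped):

ALTER EXTENSION pageinspect DROP FUNCTION heap_page_items(bytea);
DROP FUNCTION heap_page_items(bytea);
CREATE FUNCTION heap_page_items(IN page bytea, OUT lp smallint,
    OUT t_infomask2 integer, OUT t_infomask integer)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'heap_page_items'
LANGUAGE C STRICT;
ALTER EXTENSION pageinspect ADD FUNCTION heap_page_items(bytea);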

Nice work on extensions and their upgradability, overall.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 2:33 PM, Daniel Farina dan...@heroku.com wrote:
 I know I got hit by backend synchronization bugs (in the sense of
 locks, etc.); do you think it is possible yours (sending SIGSTOP) could
 have the same root cause? I haven't followed all the other bugs cleared
 up by inspection.

I believe that the queue management logic is just totally busted and
needs to be rewritten.  I doubt there is much point in speculating
about details until that's done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Cédric Villemain
2011/2/25 Robert Haas robertmh...@gmail.com:
 2011/2/25 Cédric Villemain cedric.villemain.deb...@gmail.com:
 All that having been said, I think that while Josh is thinking fuzzily
 about the mathematics of his proposal, the basic idea is pretty
 sensible.  It is not easy - likely not possible - for the system to
 have a good idea which things will be in some kind of cache at the
 time the query is executed; it could even change mid-query.  The
 execution of one part of the query could evict from the cache data
 which some other part of the plan assumed would be cached.  But DBAs
 frequently have a very good idea of which stuff is in cache - they can
 make observations over a period of time and then adjust settings and
 then observe some more and adjust some more.

 I believe we can maintain a small map of the areas of a relation which
 are in the OS buffer cache (shared buffers move around more), or at
 least a percentage of the relation in the OS cache. Having the
 autovacuum daemon update those maps/counters might be enough, and easy
 to do; it is really close to what auto-analyze does.  My observation is
 that the percentage in cache is stable on a production workload after
 the few tens of minutes needed to warm the server.

 I don't think we can assume that will be true in all workloads.
 Imagine a server doing batch processing.  People submit large batches
 of work that take, say, an hour to complete.  Not all batches use the
 same set of tables - maybe they even run in different databases.
 After a big batch process finishes crunching numbers in database A,
 very little of database B will be cached.  But it's not necessarily
 right to assume that when we start queries for a new batch in database
 B, although it's more likely to be right for large tables (which will
 take a long time to get cached meaningfully, if they ever do) than
 small ones.  Also, it could lead to strange issues where batches run
 much faster or slower depending on which batch immediately preceded
 them.  If we're going to do something a lot of times, it'd be better
 to bite the bullet and read it all in rather than going to more work
 elsewhere, but if we're only going to touch it once, then not so much.

 You might also have this issue on systems that run OLTP workloads all
 day and then do some batch processing at night to get ready for the
 next business day.  Kevin Grittner wrote previously about those jobs
 needing some different settings in his environment (I'm not
 remembering which settings at the moment).  Suppose that the batch
 process is going to issue a query that can be planned in one of two
 possible ways.  One way involves reading 10% of a relation, and the
 other way involves reading the whole thing.  The first plan takes 200
 s to execute if the relation is not cached, and 180 s if the relevant
 portion is cached.  The second plan takes 300 s to execute if the
 relation is not cached, and 100 s if it is cached.  At the start of
 the batch run, the relation won't be cached, because it's used *only*
 by the overnight job and not by the daily OLTP traffic.  Which way
 should we execute the query?

 The answer is that if the batch job only needs to execute that query
 *once*, we should do it the first way.  But if it needs to execute it
 three or more times, the second way is better, but only if we use the
 second plan every time.  If we start out with the first plan, we're
 always better off sticking with it *unless* we know that we're going
 to repeat the query at least twice more after the iteration we're
 currently planning.  To make the right decision, the query planner
 needs a crystal ball.  Or, a little help from the DBA.
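
For the record, the break-even point implied by those numbers: over k
executions the first plan costs 200 + 180(k - 1) seconds and the second
costs 300 + 100(k - 1) seconds, which are equal when

    200 + 180(k - 1) = 300 + 100(k - 1)  =>  80(k - 1) = 100  =>  k = 2.25

so the second plan wins from the third execution onward, as stated above.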

Yes, we are talking about improving some part of the model.
Some workloads are extreme and need special customization; this is true.

Still, there is a path for improvement, and it will probably remain a
path for improvement after the current model is updated.

I am not proposing something that solves all the issues, but it is way
more interesting IMHO than just letting the DBA say: 'this table is
XX% in cache'.

Btw, pgfincore already does solve the use case you describe by helping
the DBA prepare for batch processing, so in some sense I am familiar
with this scenario (take the second plan, let pgfincore preload in the
background, and your query will be done in 100 s from the first run).
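
A sketch of that workflow, assuming pgfincore's pgfincore() and
pgfadvise_willneed() entry points as found in its released versions (the
table name is hypothetical):

SELECT * FROM pgfincore('overnight_facts');           -- how much is in OS cache?
SELECT * FROM pgfadvise_willneed('overnight_facts');  -- ask the OS to preload it

The second call advises the kernel to read the relation's segments into
the page cache, which is the background preloading referred to above.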


 What would really help here is to have hooks in the cost functions, to
 test those ideas without the need to patch postgresql-core a lot. Would
 it be OK to have hooks, or would they add too much CPU consumption in a
 sensitive part of the code?

 Depends on where you put them, I guess.  Hooks are pretty cheap, but
 they're also pretty hard to use.

Yes, it will be easier to make an extension, have people test it,
and validate (or not) the 'new' model.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Josh Berkus

 4. Even if we could accurately estimate the percentage of the table
 that is cached, what then?  For example, suppose that a user issues a
 query which retrieves 1% of a table, and we know that 1% of that table
 is cached.  How much of the data that the user asked for is cached?

FWIW, for a manual override setting, I was thinking that the % would
convert to a probability.  In that way, it wouldn't be different from
the existing random_page_cost calculation; we're just estimating how *likely* it is
that the data the user wants is cached.

 One idea Tom and I kicked around previously is to set an assumed
 caching percentage for each table based on its size relative to
 effective_cache_size - in other words, assume that the smaller a table
 is, the more of it will be cached.  Consider a system with 8GB of RAM,
 and a table which is 64kB.  It is probably unwise to make any plan
 based on the assumption that that table is less than fully cached.  If
 it isn't before the query executes, it soon will be.  Going to any
 amount of work elsewhere in the plan to avoid the work of reading that
 table in from disk is probably a dumb idea.  Of course, one downside
 of this approach is that it doesn't know which tables are hot and
 which tables are cold, but it would probably still be an improvement
 over the status quo.

Actually, we *do* have some idea which tables are hot.  Or at least, we
could.   Currently, pg_stats for tables are timeless; they just
accumulate from the last reset, which has always been a problem in
general for monitoring.  If we could make top-level table and index
stats time-based, even in some crude way, we would know which tables
were currently hot.  That would also have the benefit of making server
performance analysis and autotuning easier.
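
The manual workaround implied here is to sample the cumulative counters
twice and diff them; the view and column names below are real, only the
scratch table is ad hoc:

CREATE TEMP TABLE statio_snap AS
    SELECT relname, heap_blks_read, heap_blks_hit
    FROM pg_statio_user_tables;
-- ... wait for a representative interval, then:
SELECT t.relname,
       t.heap_blks_hit  - s.heap_blks_hit  AS hits_in_interval,
       t.heap_blks_read - s.heap_blks_read AS reads_in_interval
FROM pg_statio_user_tables t
JOIN statio_snap s USING (relname)
ORDER BY hits_in_interval DESC;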

 But DBAs
 frequently have a very good idea of which stuff is in cache - they can
 make observations over a period of time and then adjust settings and
 then observe some more and adjust some more.

Agreed.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Josh Berkus

 Right now, as it stands, the syncrep patch will be happy as soon as
 the data has been fsynced to either B or A-prime; I don't think we can
 guarantee at any point that A-prime can become the leader, and feed B.

Yeah, I think that's something we said months ago is going to be a 9.2
feature, no sooner.

 2. The unprivileged user can disable syncrep, in any situation. This
 flexibility is *great*, but you don't really want people to do it when
 one is performing the switchover. Rather, in a magical world we'd hope
 that disabling syncrep would just result in not having to
 synchronously commit to B (but, in this case, still synchronously
 commit to A-prime)
 
 In other words, to my mind, you can use syncrep as-is to provide
 2-safe durability xor a scheduled switchover: as soon as someone wants
 both, I think they'll have some trouble. I do want both, though.

Hmmm, I don't follow this.  The user can only disable syncrep for their
own transactions.   If they don't care about the persistence of their
transaction post-failover, why should the DBA care?
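
As the feature eventually shipped in 9.1, the escape hatch in question
is the standard per-transaction GUC, settable by any user (the table
name here is hypothetical):

BEGIN;
SET LOCAL synchronous_commit TO local;  -- don't wait for the standby
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;

Nothing stops an application from opting out in the middle of a
switchover, which is the hole Daniel is describing.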

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-24 6:37 PM +0200, Tom Lane wrote:
 OK, I will make those adjustments.  Are you going to do more work on the
 documentation part of the patch?  I can stick to working on the code
 part meanwhile, if you are.

 I am planning on working on the documentation this weekend.

I've gone ahead and applied the code portion of the patch, with
modifications as per discussion, and other editorialization.
I'll wait on you to produce documentation updates before dealing
with the docs, but I figured we might as well get some buildfarm
cycles on it meanwhile.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote:

 Right now, as it stands, the syncrep patch will be happy as soon as
 the data has been fsynced to either B or A-prime; I don't think we can
 guarantee at any point that A-prime can become the leader, and feed B.

 Yeah, I think that's something we said months ago is going to be a 9.2
 feature, no sooner.

Ah, okay, I had missed that discussion; I also did not know it got so
specific as to address this case (are you sure?) rather than something
more general, say quorum or N-safe durability.

 2. The unprivileged user can disable syncrep, in any situation. This
 flexibility is *great*, but you don't really want people to do it when
 one is performing the switchover. Rather, in a magical world we'd hope
 that disabling syncrep would just result in not having to
 synchronously commit to B (but, in this case, still synchronously
 commit to A-prime)

 In other words, to my mind, you can use syncrep as-is to provide
 2-safe durability xor a scheduled switchover: as soon as someone wants
 both, I think they'll have some trouble. I do want both, though.

 Hmmm, I don't follow this.  The user can only disable syncrep for their
 own transactions.   If they don't care about the persistence of their
 transaction post-failover, why should the DBA care?

The user may have their own level of durability guarantee they want to
attain (that's why machine B is syncrepped in my example), but when
doing the switchover I think an override to enable a smooth handoff
(meaning: everything syncrepped) would be best.  What I want to avoid
is getting an ack from COMMIT on the primary (machine A) and then, post
switchover, finding the data isn't there on machine A-prime (or B,
provided it was able to follow successfully at all; in the current case
it might get ahead of A-prime in the WAL).

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Josh Berkus
Daniel,

 Ah, okay, I had missed that discussion; I also did not know it got so
 specific as to address this case (are you sure?) rather than something
 more general, say quorum or N-safe durability.

The way we address that case is through n-safe durability.

 The user may have their own level of durability guarantee they want to
 attain (that's why machine B is syncrepped in my example), but when
 doing the switchover I think an override to enable a smooth handoff
 (meaning: everything syncrepped) would be best.  What I want to avoid
 is getting an ack from COMMIT on the primary (machine A) and then, post
 switchover, finding the data isn't there on machine A-prime (or B,
 provided it was able to follow successfully at all; in the current case
 it might get ahead of A-prime in the WAL).

Yeah, when I think about your use case, I can understand why it's an
issue.  It would be nice to have a superuser setting (or similar) which
could override user preferences and make all transactions syncrep
temporarily.  I'm not sure that's going to be reasonable to do for 9.1
though.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 4:36 PM, Josh Berkus j...@agliodbs.com wrote:
 Daniel,

 Ah, okay, I had missed that discussion; I also did not know it got so
 specific as to address this case (are you sure?) rather than something
 more general, say quorum or N-safe durability.

 The way we address that case is through n-safe durability.

How is this exposed? The simple count-the-number-of-fsyncs approach is
not quite good enough (one has no control to make sure one or more
specific nodes are definitely up to date) unless one wants to just make
it go to *all* syncrep standbys for a while. That seems like overkill,
so I imagine something else is in the works. I'll search the
archives...
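
As the interface eventually shipped in 9.1, standby choice is a priority
list rather than a quorum; the current state can be inspected like this
(the application names are hypothetical):

SHOW synchronous_standby_names;
-- e.g. 'a_prime, b': the first connected standby in the list is the
-- synchronous one
SELECT application_name, state, sync_state FROM pg_stat_replication;
-- exactly one connected standby reports sync_state = 'sync'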

 The user may have their own level of durability guarantee they want to
 attain (that's why machine B is syncrepped in my example), but when
 doing the switchover I think an override to enable a smooth handoff
 (meaning: everything syncrepped) would be best.  What I want to avoid
 is getting an ack from COMMIT on the primary (machine A) and then, post
 switchover, finding the data isn't there on machine A-prime (or B,
 provided it was able to follow successfully at all; in the current case
 it might get ahead of A-prime in the WAL).

 Yeah, when I think about your use case, I can understand why it's an
 issue.  It would be nice to have a superuser setting (or similar) which
 could override user preferences and make all transactions syncrep
 temporarily.  I'm not sure that's going to be reasonable to do for 9.1
 though.

Agreed; I'd be happy to take any syncrep functionality. Although it
wouldn't compose well as-is, I wanted to raise this so that we don't
make any configuration decisions that get in the way of making
composition possible later.  Again, I haven't thought far ahead yet,
partially because I thought there might be some existing thoughts in
play to consider.

With that, I will try to give syncrep a more structured review Real
Soon, although the late date of this is leaving me queasy as to the
odds of git-commit.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Jeff Davis
On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
 Hmmm, I don't follow this.  The user can only disable syncrep for their
 own transactions.   If they don't care about the persistence of their
 transaction post-failover, why should the DBA care?

I think that's the difference between failover and switchover, right? At
least Slony makes such a distinction, as well.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jeff Davis
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
 Oh, yes, this reproduces past shutdowns/startups, and there's quite a
 few txids before I catch up. I'm also comfortable poking around with
 gdb (I have already recompiled with debugging symbols and
 optimizations off and was poking around, especially at
 MemoryContextStats(TopMemoryContext), but was not rewarded).

Where is all of that memory going during recovery? Recovery shouldn't
use much memory at all, as far as I can tell.

What's even allocating memory at all?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Josh Berkus
On 2/25/11 4:57 PM, Jeff Davis wrote:
 On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
 Hmmm, I don't follow this.  The user can only disable syncrep for their
 own transactions.   If they don't care about the persistence of their
 transaction post-failover, why should the DBA care?
 
 I think that's the difference between failover and switchover, right? At
 least Slony makes such a distinction, as well.

Yeah.  Actually, what would be even simpler and more to the point would
be a command that says "flush all transactions from Server A to Server
B, then fail over".

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis pg...@j-davis.com wrote:
 On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
 Oh, yes, this reproduces past shutdowns/startups, and there's quite a
 few txids before I catch up. I'm also comfortable poking around with
 gdb (I have already recompiled with debugging symbols and
 optimizations off and was poking around, especially at
 MemoryContextStats(TopMemoryContext), but was not rewarded).

 Where is all of that memory going during recovery? Recovery shouldn't
 use much memory at all, as far as I can tell.

 What's even allocating memory at all?

I noticed this was RSS fooling me: as pages get touched in shared
memory, for some reason RSS kept increasing, along with SHR at the
same time.

Still, the long recovery time was mystifying to me, considering the
lack of unclean shutdowns.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 5:21 PM, Josh Berkus j...@agliodbs.com wrote:
 On 2/25/11 4:57 PM, Jeff Davis wrote:
 On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
 Hmmm, I don't follow this.  The user can only disable syncrep for their
 own transactions.   If they don't care about the persistence of their
 transaction post-failover, why should the DBA care?

 I think that's the difference between failover and switchover, right? At
 least Slony makes such a distinction, as well.

 Yeah.  Actually, what would be even simpler and more to the point would
 be a command that says "flush all transactions from Server A to Server
 B, then fail over".

That would be nice; I'm basically abusing syncrep for this purpose. At
the same time, someone may need to be notified of such a switchover
occurring, and in the event of failure it'd be nice to bounce back to
the primary. Tangentially relevant: virtual IP is not always an option,
such as on Amazon EC2.

But I digress. Such a command is unlikely to make it into 9.1; maybe
we can circle around on that in 9.2.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread Marko Tiikkaja

On 2011-02-26 2:00 AM, Tom Lane wrote:

I've gone ahead and applied the code portion of the patch, with
modifications as per discussion, and other editorialization.


Thanks a lot!

One thing bothers me though: what was the reason for requiring a 
RETURNING clause for data-modifying statements in WITH?



I'll wait on you to produce documentation updates before dealing
with the docs, but I figured we might as well get some buildfarm
cycles on it meanwhile.


Thanks, I'll send an improved version tomorrow.


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-26 2:00 AM, Tom Lane wrote:
 I've gone ahead and applied the code portion of the patch, with
 modifications as per discussion, and other editorialization.

 Thanks a lot!

 One thing bothers me though: what was the reason for requiring a 
 RETURNING clause for data-modifying statements in WITH?

That test was in your patch, no?  I moved the code to another place
but it's still enforcing the same thing, namely that you can't reference
the output of an INSERT/UPDATE/DELETE that hasn't got RETURNING.
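
Illustrated (the table name is hypothetical):

WITH d AS (DELETE FROM foo WHERE id < 10)
SELECT * FROM d;    -- ERROR: the CTE has no RETURNING clause

WITH d AS (DELETE FROM foo WHERE id < 10 RETURNING *)
SELECT * FROM d;    -- OK: the deleted rows are visible to the SELECT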

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread David Fetter
On Fri, Feb 25, 2011 at 09:58:36AM -0500, Tom Lane wrote:
 I had what seems to me a remarkably good idea, though maybe someone else
 can spot a problem with it.  Given that we've decided to run the
 modifying sub-queries all with the same command counter ID, they are
 logically executing in parallel.  The current implementation takes no
 advantage of that fact, though: it's based around the idea of running
 the updates strictly sequentially.  I think we should change it so that
 the updates happen physically, not only logically, concurrently.
 Specifically, I'm imagining getting rid of the patch's additions to
 InitPlan and ExecutePlan that find all the modifying sub-queries and
 force them to be cycled to completion before the main plan runs.
 Just run the main plan and let it pull tuples from the CTEs as needed.
 Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
 completion before shutting down the plan.  (In the event of an error,
 we'd never get to ExecutorEnd, but it doesn't matter since whatever
 updates we did apply are nullified anyhow.)

What's the effect, if any, on CTEs that depend on each other
explicitly?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread David Fetter
On Thu, Feb 24, 2011 at 11:20:48AM -0500, Tom Lane wrote:
 The wCTE patch refers to the feature it's adding as DML WITH.  I'm
 still pretty unhappy with that terminology.  In my view of the world,
 DML includes SELECT as well as INSERT/UPDATE/DELETE.  The wikipedia
 entry about the term
 http://en.wikipedia.org/wiki/Data_Manipulation_Language
 agrees that that's at least the majority usage, and even our own docs
 seem to use it to include SELECT as often as not.  Since the distinction
 is absolutely critical to talking about this feature sensibly, I don't
 think it's a good plan to use an acronym that is guaranteed to produce
 uncertainty in the reader's mind.
 
 The best idea I have at the moment is to spell out data modifying
 command (or statement) rather than relying on the acronym.
 In the code, we could change hasDmlWith to hasModifyingWith, for
 example.  The error messages could read like
   data-modifying statement in WITH is not allowed in a view
 
 Comments?

+1

If we ever decide to add in what I'd originally envisioned, namely DCL
and DDL, the name continues to describe what's going on :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread David Fetter
On Thu, Feb 24, 2011 at 11:35:30AM -0800, David Wheeler wrote:
 On Feb 24, 2011, at 10:43 AM, Robert Haas wrote:
 
  The best idea I have at the moment is to spell out data modifying
  command (or statement) rather than relying on the acronym.
  In the code, we could change hasDmlWith to hasModifyingWith, for
  example.  The error messages could read like
 data-modifying statement in WITH is not allowed in a view
  
  Comments?
  
  Great idea.  I had the same complaint when I looked at this patch
  a year ago, but didn't come up with nearly as good an idea as to
  what to do about it.
 
 I like "statement" better than "command", too, but love the acronym
 DMC. As in, you want to Run [a] DMC. ;-P

Hit it, Run! ;)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
David Fetter da...@fetter.org writes:
 What's the effect, if any, on CTEs that depend on each other
 explicitly?

An error.  That would require mutual recursion, which we don't
support for the SELECT case let alone data-modifying statements.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread David Fetter
On Fri, Feb 25, 2011 at 10:12:02PM -0500, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  What's the effect, if any, on CTEs that depend on each other
  explicitly?
 
 An error.  That would require mutual recursion, which we don't
 support for the SELECT case let alone data-modifying statements.

Sorry that was unclear.  Let's imagine there's a DELETE ... RETURNING
in one WITH, and an UPDATE in another that depends on that one.  Is
that still allowed?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
David Fetter da...@fetter.org writes:
 Sorry that was unclear.  Let's imagine there's a DELETE ... RETURNING
 in one WITH, and an UPDATE in another that depends on that one.  Is
 that still allowed?

Yeah it is, although I just noticed that there's a bug in the new
implementation:

with t1 as (insert into x select ... returning *),
 t2 as (insert into y select * from t1 returning *)
select 1;

This should result in the same rows inserted into both x and y, but in
git HEAD it fails to insert anything into y.  The reason is that the
ExecutorEnd scan first processes the ModifyTable node for x, and cycles
it to completion, discarding the results --- but we needed the CteScan
in t2 to see those rows.  There's a related case in the regression
tests, but it works because the outer query does fetch from both WITH
clauses, so there's no need to do anything at ExecutorEnd time.

The first solution that comes to mind is to pay attention to the
interdependencies of the CTEs, and perform the cleanup in an appropriate
order (here, the ModifyTable for y needs to be cycled first).  I'm not
sure if there's a nicer way.  We'll eventually want some interdependency
tracking for CTEs anyway, if we're ever to support mutual recursion,
so it'd not be completely single-purpose code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
I wrote:
 The first solution that comes to mind is to pay attention to the
 interdependencies of the CTEs, and perform the cleanup in an appropriate
 order (here, the ModifyTable for y needs to be cycled first).

Doh ... actually, we already *are* ordering the CTEs in dependency
order, so it's a one-liner fix to do the shutdowns in reverse order.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL FDW update

2011-02-25 Thread Robert Haas
On Thu, Feb 24, 2011 at 9:06 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 24, 2011 at 8:13 AM, Shigeru HANADA
 han...@metrosystems.co.jp wrote:

 On Wed, 23 Feb 2011 20:30:05 +0900
 Shigeru HANADA han...@metrosystems.co.jp wrote:


 On Tue, 22 Feb 2011 11:33:25 -0500
 Robert Haas robertmh...@gmail.com wrote:
  Is anyone actually working on a new version of this patch sufficiently
  rapidly that we can expect a new version in the next day or two?
 
  If not, I think we mark this one Returned with Feedback and revisit it 
  for 9.2.

 I'm working on it.

 Fixes for the new FDW API have been done, but there are some problems
 in the SQL generation code, such as SELECT clause optimization
 (omitting unused columns from the SELECT clause).  It would take a
 while, but I'll post a revised version of the patch tomorrow.

 Attached is a revised version of the postgresql_fdw patch.  I started from
 Heikki's latest patch, and modified some points:

 1) use new FDW API
 2) use EXTENSION framework
 3) SELECT clause optimization (use NULL for unused columns)
 4) show remote query in EXPLAIN output

 WHERE clause pushdown was implemented in Heikki's version, so I didn't
 touch that area.  Now I'm working on cost estimation and connection
 management, but they need some more work.

 So this is still work-in-progress?  When do you expect a final version?

Since it sounds like this still needs more work, I'm going to mark it
Returned with Feedback.  I hope we integrate this during the 9.2
cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2011-02-25 Thread Robert Haas
On Thu, Feb 24, 2011 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-24 5:21 PM, Tom Lane wrote:
 Oh, did we decide to do it that way?  OK with me, but the submitted docs
 are woefully inadequate on the point.  This behavior is going to have to
 be explained extremely clearly (and even so, I bet we'll get bug reports
 about it :-().

 I'm ready to put more effort into the documentation if the patch is
 going in, but I really don't want to waste my time just to hear that the
 patch is not going to be in 9.1.  Does this sound acceptable?

 I've found some things I don't like about it, but the only part that
 seems far short of being committable is the documentation.

Tom/Alvaro, have the two of you hammered out who is going to finish
this one off?  I *believe* Alvaro told me on IM that he was leaving
this one for Tom.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python tracebacks

2011-02-25 Thread Robert Haas
On Thu, Feb 24, 2011 at 9:03 AM, Jan Urbański wulc...@wulczer.org wrote:
 On 24/02/11 14:10, Peter Eisentraut wrote:
 On tor, 2010-12-23 at 14:56 +0100, Jan Urbański wrote:
 For errors originating from Python exceptions, add the traceback as the
 message detail. The patch tries to mimic Python's traceback.py module
 behaviour as closely as possible, including interleaving stack frames
 with source code lines in the detail message. Any Python developer
 should instantly recognize this kind of error reporting; it looks
 almost the same as an error in the interactive Python shell.

 I think the traceback should go into the CONTEXT part of the error.  The
 context message that's already there is now redundant with the
 traceback.

 You could even call errcontext() multiple times to build up the
 traceback, but maybe that's not necessary.

 Hm, perhaps. I put it in the detail because it sounded like the place
 to put information that is not that important, but still helpful. It's
 kind of natural to think of the traceback as the detail of the error
 message. But if you prefer context, I'm fine with that. Do you want me
 to update the patch to put the traceback in the context?

I don't see a response to this question from Peter, but I read his
email to indicate that he was hoping you'd rework along these lines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python explicit subtransactions

2011-02-25 Thread Robert Haas
On Wed, Feb 9, 2011 at 5:22 PM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2011-02-08 at 00:32 -0500, Steve Singer wrote:
 On 11-02-06 11:40 AM, Jan Urbański wrote:

  PFA an updated patch with documentation.

  Yeah, changed them.

 Those changes look fine.  The tests now pass.

 I've attached a new version of the patch that fixes a few typos/wording
 issues I saw in the documentation.  I also changed the link to the
 python reference manual section on context managers. I think it is
 better to link to that versus the original PEP.

 The documentation could probably still use more word-smithing but that
 can happen later.  I'm marking this as ready for a committer.

 Is it necessarily a good idea that an explicit subtransaction disables
 the implicit sub-subtransactions?  It might be conceivable that you'd
 still want to do some try/catch within explicit subtransactions.

Is this still an open question, or what is the remaining issue that
needs to be addressed with regards to this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2011-02-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Tom/Alvaro, have the two of you hammered out who is going to finish
 this one off?  I *believe* Alvaro told me on IM that he was leaving
 this one for Tom.

Last I heard, the ball was in my court.  I'll try to get it done over
the weekend.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python do not delete function arguments

2011-02-25 Thread Robert Haas
On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański wulc...@wulczer.org wrote:
 On 15/02/11 20:39, Peter Eisentraut wrote:
 On tis, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
 Because the invocation that actually recurses sets up the scene for
 failure.

 That's what we're observing, but I can't figure out why it is.  If you
 can, could you explain it?

 It actually makes sense to me that the arguments should be deleted at
 the end of the call.  The data belongs to that call only, and
 PLy_procedure_delete() that would otherwise clean it up is only called
 rarely.

 Apparently, the recursive call ends up deleting the wrong arguments, but
 it's not clear to me why that would affect the next top-level call,
 because that would set up its own arguments again anyway.  In any case,
 perhaps the right fix is to fix PLy_function_delete_args() to delete the
 args correctly.

 Aaah, ok, I got it (again). Let me write this in full before I forget
 and spend another hour chasing that bug (and boy, bugs that disappear
 because you're doing things in the debugger are so annoying). And
 actually, my patch doesn't fix it fully :| Let me demonstrate:

 CREATE FUNCTION rec(n integer) RETURNS integer AS $$
 if n == 0:
     return
 plpy.notice("before n is %d" % n)
 plpy.execute("select rec(0)")
 plpy.notice("after n is %d" % n)
 $$ LANGUAGE plpythonu;

 Without the patch the second plpy.notice raises a NameError. With the
 patch the output is:

 NOTICE:  before n is 4
 CONTEXT:  PL/Python function rec
 NOTICE:  after n is 0
 CONTEXT:  PL/Python function rec

 What happens? In PLy_function_handler, PLy_function_build_args is
 called, and proc->globals is set. After that PLy_procedure_call is
 called, which starts executing Python code. The Python code calls back
 into C with plpy.execute, and PLy_function_handler gets called again (a
 reentrant call).

 Then PLy_function_build_args is called again. It overwrites the "n"
 entry in proc->globals and then PLy_procedure_call gets called, which
 drops us back into Python (on the stack there's now C, Python, C,
 Python). This second invocation exits quickly because n == 0, and we're
 back in C.

 Now without my patch, the next thing to happen was deleting the
 arguments, which removed "n" from the proc->globals dict. The rest of
 the C code runs and finally plpy.execute returns and we're back in
 Python (the stack is C, Python).

 The second plpy.notice is run, which fetches "n" from the globals and,
 not finding it, raises a NameError. With the patch it simply fetches
 the overwritten value, namely 0.

 The KeyError was a red herring - that's how Python reacted when
 evaluating "n in (0, 1)", and if you look in the server log you'll see
 a RuntimeWarning complaining about something internal that doesn't
 matter. The bottom line is that PLy_procedure_call is not reentrant
 because of proc->globals, and it has to be.

 Now when fixing this bug I tried copying the globals dict and restoring
 it, but ran into issues (I think the problem was that the function
 didn't like running with different globals than the ones it had been
 compiled with). Not sure what to do with this :( Document it as a
 caveat (with or without my patch) and carry on? That sucks quite
 badly...

From this discussion I gather that we have a problem here that we
don't exactly know how to fix, so I'm inclined to suggest that we mark
this Returned with Feedback in the CommitFest and instead add it to
the TODO.  Since this is a pre-existing bug and not a new regression,
it should not be something we hold up beta for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 7:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-24 6:37 PM +0200, Tom Lane wrote:
 OK, I will make those adjustments.  Are you going to do more work on the
 documentation part of the patch?  I can stick to working on the code
 part meanwhile, if you are.

 I am planning on working on the documentation this weekend.

 I've gone ahead and applied the code portion of the patch,

Yay!  I'm excited about this, particularly the possible pipelining
stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ...  and
have it be like cool and fast and stuff.

Or at least I hope you can do that.
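
Spelled out, the pattern in question moves rows between tables in a
single statement (table and column names are hypothetical):

WITH moved AS (
    DELETE FROM queue
    WHERE done
    RETURNING id, payload
)
INSERT INTO queue_archive (id, payload)
SELECT id, payload FROM moved;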

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Rod Taylor's message of Fri Feb 25 14:03:58 -0300 2011:

 How practical would it be for analyze to keep a record of response times for
 given sections of a table as it randomly accesses them and generate some
 kind of a map for expected response times for the pieces of data it is
 analysing?

 I think what you want is random_page_cost that can be tailored per
 tablespace.

We have that.

But it's not the same as tracking *sections of a table*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote:
 One idea Tom and I kicked around previously is to set an assumed
 caching percentage for each table based on its size relative to
 effective_cache_size - in other words, assume that the smaller a table
 is, the more of it will be cached.  Consider a system with 8GB of RAM,
 and a table which is 64kB.  It is probably unwise to make any plan
 based on the assumption that that table is less than fully cached.  If
 it isn't before the query executes, it soon will be.  Going to any
 amount of work elsewhere in the plan to avoid the work of reading that
 table in from disk is probably a dumb idea.  Of course, one downside
 of this approach is that it doesn't know which tables are hot and
 which tables are cold, but it would probably still be an improvement
 over the status quo.

 Actually, we *do* have some idea which tables are hot.  Or at least, we
 could.   Currently, pg_stats for tables are timeless; they just
 accumulate from the last reset, which has always been a problem in
 general for monitoring.  If we could make top-level table and index
 stats time-based, even in some crude way, we would know which tables
 were currently hot.  That would also have the benefit of making server
 performance analysis and autotuning easier.

I think there would be value in giving the DBA an easier way to see
which tables are hot, but I am really leery about the idea of trying
to feed that directly into the query planner.  I think this is one of
those cases where we let people tune it manually for starters, and
then wait for feedback.  Eventually someone will say oh, I never tune
that by hand any more, ever since I wrote this script which does the
following computation... and I just run it out cron.  And then we
will get out the party hats.  But we will never get the experience we
need to say what that auto-tuning algorithm will be unless we first
provide the knob for someone to fiddle with manually.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Yay!  I'm excited about this, particularly the possible pipelining
 stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ...  and
 have it be like cool and fast and stuff.

 Or at least I hope you can do that.

It's gonna need some work yet.  As things stand, the tuples are indeed
pipelined through, but the CteScan nodes *also* stash them aside into
tuplestores, just in case somebody demands a rescan.  Fixing that will
require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
currently distinguish "it's unlikely you'll have to rescan" from
"you're guaranteed not to have to rescan", but a CteScan that's covering
a ModifyTable has to know the latter in order not to keep hold of copies
of the RETURNING tuples.

It might be a small enough change to do after alpha starts, but I don't
have time for it right now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-25 Thread Robert Haas
On Sat, Feb 26, 2011 at 12:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yay!  I'm excited about this, particularly the possible pipelining
 stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ...  and
 have it be like cool and fast and stuff.

 Or at least I hope you can do that.

 It's gonna need some work yet.  As things stand, the tuples are indeed
 pipelined through, but the CteScan nodes *also* stash them aside into
 tuplestores, just in case somebody demands a rescan.  Fixing that will
 require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
 currently distinguish "it's unlikely you'll have to rescan" from
 "you're guaranteed not to have to rescan", but a CteScan that's covering
 a ModifyTable has to know the latter in order not to keep hold of copies
 of the RETURNING tuples.

 It might be a small enough change to do after alpha starts, but I don't
 have time for it right now.

Well, if nothing else, the potential is there for a future release.
I'm probably not quite as excited about this feature as David Fetter
(and my 100-Watt lightbulb is not quite as bright as the sun at high
noon in midsummer) but I do think it's pretty cool, and I appreciate
you getting it in, even in a somewhat basic form.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-25 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 2011-02-25 4:58 PM, Tom Lane wrote:
 Specifically, I'm imagining getting rid of the patch's additions to
 InitPlan and ExecutePlan that find all the modifying sub-queries and
 force them to be cycled to completion before the main plan runs.
 Just run the main plan and let it pull tuples from the CTEs as needed.
 Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
 completion before shutting down the plan.

 This idea has actually been discussed before when we talked about 
 optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a 
 bit ugly.

Further experimentation has reminded me of why I didn't want to put such
processing in ExecutorEnd :-(.  There are some nasty interactions with
EXPLAIN:

1. EXPLAIN ANALYZE fails to include the execution cycles associated with
running the ModifyTable nodes to completion.  In the worst case, such as
WITH t AS (INSERT ...) SELECT 1, it will claim the INSERT subplan is
never executed, even though rows certainly got inserted.  This is
because EXPLAIN extracts all the counts from the execution state tree
before shutting it down with ExecutorEnd.

2. But it gets worse.  Try the same query *without* ANALYZE.  You'll
find the INSERT executes anyway!  That's because EXPLAIN still calls
ExecutorEnd to clean up the execution state tree, and ExecutorEnd
doesn't realize it's not supposed to run any of the plan.
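
Concretely, per that description (the table name is hypothetical, and
this shows the pre-fix behavior):

CREATE TABLE x (i int);
EXPLAIN WITH t AS (INSERT INTO x VALUES (1)) SELECT 1;
SELECT count(*) FROM x;   -- 1: the INSERT ran under plain EXPLAIN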

So we really need some refactoring here.  I dislike adding another
fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
but there may not be a better way.  The only way I see to fix this
without changing that API is to have ExecutorRun do the cleanup
processing just after the top plan node returns a null tuple, and that
seems a bit ugly as well.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO: You can alter it, but you can't view it

2011-02-25 Thread Bruce Momjian
Josh Berkus wrote:
 
  Right now pg_options_to_table() is not documented.  Should it be?
 
 Yes, I think so.

Done, with the attached, applied patch.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 736eb67..c620142 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT pg_type_is_visible('myschema.widg
*** 13244,13249 ****
--- 13244,13253 ----
     </indexterm>
 
     <indexterm>
+     <primary>pg_options_to_table</primary>
+    </indexterm>
+ 
+    <indexterm>
     <primary>pg_tablespace_databases</primary>
    </indexterm>
 
*** SELECT pg_type_is_visible('myschema.widg
*** 13380,13385 ****
--- 13384,13394 ----
    <entry>get underlying <command>SELECT</command> command for view</entry>
   </row>
   <row>
+    <entry><literal><function>pg_options_to_table(<parameter>reloptions</parameter>)</function></literal></entry>
+    <entry><type>name, option</type></entry>
+    <entry>get the set of option name/value pairs from <structname>pg_class</>.<structfield>reloptions</></entry>
+   </row>
+   <row>
    <entry><literal><function>pg_tablespace_databases(<parameter>tablespace_oid</parameter>)</function></literal></entry>
    <entry><type>setof oid</type></entry>
    <entry>get the set of database OIDs that have objects in the tablespace</entry>
*** SELECT pg_type_is_visible('myschema.widg
*** 13475,13480 ****
--- 13484,13495 ----
   </para>
 
   <para>
+   <function>pg_options_to_table</function> returns the set of option
+   name/value pairs when passed
+   <structname>pg_class</>.<structfield>reloptions</>.
+  </para>
+ 
+  <para>
   <function>pg_tablespace_databases</function> allows a tablespace to be
   examined. It returns the set of OIDs of databases that have objects stored
   in the tablespace. If this function returns any rows, the tablespace is not
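
As an example of the function being documented (the output shown is
illustrative):

CREATE TABLE opt_demo (i int) WITH (fillfactor = 70);
SELECT * FROM pg_options_to_table(
    (SELECT reloptions FROM pg_class WHERE relname = 'opt_demo'));
--  option_name | option_value
-- -------------+--------------
--  fillfactor  | 70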

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Bruce Momjian
Robert Haas wrote:
  Actually, we *do* have some idea which tables are hot.  Or at least, we
  could.  Currently, pg_stats for tables are timeless; they just
  accumulate from the last reset, which has always been a problem in
  general for monitoring.  If we could make top-level table and index
  stats time-based, even in some crude way, we would know which tables
  were currently hot.  That would also have the benefit of making server
  performance analysis and autotuning easier.
 
 I think there would be value in giving the DBA an easier way to see
 which tables are hot, but I am really leery about the idea of trying
 to feed that directly into the query planner.  I think this is one of
 those cases where we let people tune it manually for starters, and
 then wait for feedback.  Eventually someone will say "oh, I never tune
 that by hand any more, ever since I wrote this script which does the
 following computation... and I just run it out of cron."  And then we
 will get out the party hats.  But we will never get the experience we
 need to say what that auto-tuning algorithm will be unless we first
 provide the knob for someone to fiddle with manually.

It is also possible we will implement a manual way and never get around
to automating it.   :-(

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keywords in pg_hba.conf should be field-specific

2011-02-25 Thread Bruce Momjian

Any progress on this?

---

Brendan Jurd wrote:
 On 18 October 2010 01:19, Tom Lane t...@sss.pgh.pa.us wrote:
  Brendan Jurd dire...@gmail.com writes:
  On 17 October 2010 09:59, Tom Lane t...@sss.pgh.pa.us wrote:
  Good point.  Maybe the correct fix is to remember whether each token was
  quoted or not, so that keyword detection can be done safely after the
  initial lexing.  I still think that the current method is impossibly
  ugly ...
 
  I'm happy to revise the patch on that basis.  Any suggestions about
  how to communicate the 'quotedness' of each token?  We could make each
  token a struct consisting of the token itself, plus a boolean flag to
  indicate whether it had been quoted.  Does that work for you?
 
  Seems reasonable.  I had the idea of a parallel list of booleans in the
  back of my mind, but a list of structs is probably easier to understand,
  and to extend further if necessary.
 
 
 Okay, I've taken the red pill and I'm finding out how deep the rabbit
 hole goes ...
 
 The logical structure of pg_hba.conf is a set of lines, each line
 containing a set of fields, each field containing a set of tokens.
 The way the existing implementation handles this is to create a list
 of lines containing sublists of fields, containing comma-separated
 strings for the set of tokens, with newlines embedded next to tokens
 which might be keywords.
 
 The tokeniser breaks apart the comma-separated tokens ... and then
 reassembles them into a comma-separated string.  Which the db/role
 matching functions then have to break apart *again*.
 
 In order to keep track of whether each individual token was quoted, I
 first need to impose some sanity here.  Rather than using a magical
 string for each field, I intend to use a List of HbaToken structs
 which explicitly note whether quoting was used.
 
 Introducing an extra List level does mean a bit more work copying and
 freeing, and it makes the patch really quite intrusive.  I have to
 touch a lot of lines in hba.c, but I think the additional clarity is
 worth it.  If nobody dissuades me from this approach I hope to post a
 patch in a couple of days.
 
 Cheers,
 BJ
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers