Re: [HACKERS] Query progress indication - an implementation

2009-07-02 Thread Robert Haas
On Thu, Jul 2, 2009 at 12:48 PM, Euler Taveira de
Oliveira wrote:
> I know that it didn't solve the estimation problem but ... IMHO the
> [under|over]estimation should be treated by an external tool (autoexplain?).
> So when we enable the query progress and some node reports a difference
> between estimated and real more than x%, log the plan. Doing it, we will be
> helping DBAs to investigate the bad plans.

Keep in mind that it is frequently the case that the estimates are
substantially off but the plan still works OK.  I just put a dirty
hack into one of my apps to improve the selectivity estimates by a
factor of 200, but they're still off by a factor of 5.  Even when they
were off by 1000x the bad plan happened only intermittently.  You
notice the cases where the estimates are off and it makes for a bad
plan, but there are lots of other cases where the estimates are off
but the plan is still OK.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-07-02 Thread Greg Stark
On Thu, Jul 2, 2009 at 2:32 AM, Bruce Momjian wrote:
> I think the only resonable solution would be to consider the estimated
> cost of each node and then compute what percentage complete each node
> is.
>

Well you can do better for some nodes. A sequential scan for example
can tell you exactly what percentage of the way through its scan it
is. A sort node that's fnished the sort can produce an value based on
both the estimate of the relative costs of the sort vs reading the
results and the actual percentage progress reading the results.

So I think it has to come down to another ExecProcNode method the way
I had it arranged in my patch that actually implemented this.

I was partly waiting for the other patch which multiplexed signals
onto fewer actual unix signals to go through. And for XML explain
plans to go through. Once we have those then I think my patch is
actually nearly there, it just needs some additional tweaking of the
heuristics for more plan types.

Then comes the fun part of figuring out a useful UI for psql and
pgadmin. Personally I'm happy for psql to just print the plan whenever
the user hits siginfo. I think an apt-style curses progress bar would
be unecessarily heavyweight for the lightweight vision I have for
psql. But I know others have more ambitious visions for psql.



-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-07-01 Thread Bruce Momjian
Tom Lane wrote:
> Joshua Tolley  writes:
> > On Mon, Jun 29, 2009 at 02:07:23PM -0400, Tom Lane wrote:
> >> I think this is pretty much nonsense --- most queries run all their plan
> >> nodes concurrently to some extent.  You can't usefully say that a query
> >> is "on" some node, nor measure progress by whether some node is "done".
> 
> > What about showing the outermost node where work has started?
> 
> That's always the outermost node; what would it tell you?

[ Repost ]

I think the only resonable solution would be to consider the estimated
cost of each node and then compute what percentage complete each node
is.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-30 Thread Simon Riggs

On Mon, 2009-06-29 at 18:49 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
> >> I think this is pretty much nonsense --- most queries run all their plan
> >> nodes concurrently to some extent.  You can't usefully say that a query
> >> is "on" some node, nor measure progress by whether some node is "done".
> 
> > The requirement is not nonsense, even if the detail was slightly off.
> 
> I was applying the word "nonsense" to the proposed implementation,
> not the desire to have query progress indications ...

Understood, just trying to limit the blast radius.

> > We can regard plans as acting in phases with each blocking node
> > separating the plan. We know which nodes those are, so we can report
> > that.
> 
> [ shrug... ] You can regard them that way, but you won't get
> particularly helpful results for a large fraction of real queries.
> The system is generally set up to prefer "streaming" evaluation
> as much as it can.  Even in nominally blocking nodes like Sort and Hash,
> there are operational modes that look more like streaming, or at least
> chunking.

It's not always useful, though many large queries do have multiple
phases. The concept and the name come from ETL tools and it is of real
practical use in those environments. We can put the phase number on the
EXPLAIN easily, and it is very simple to calculate the total number of
phases and the current phase - e.g. 2 of 5 phases complete.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-30 Thread Simon Riggs

On Tue, 2009-06-30 at 07:04 +0200, Dimitri Fontaine wrote:
> Le 30 juin 2009 à 01:34, Greg Stark  a écrit :
> > Basically I disagree that imperfect progress reports annoy users. I
> > think we can do better than reporting 250% done or having a percentage
> > that goes backward though. It would be quite tolerable (though perhaps
> > for no logical reason) to have a progress indicator which slows done
> > as it gets closer to 100% and never seems to make it to 100%.
> 
> I guess bad stats are such an important problem in planning queries  
> that a 250% progress is doing more good than harm in showing users how  
> badly they need to review their analyze related settings.

Yeh, I agree. We can define it as "planned work", rather than actual. So
if the progress bar says 250% and query is still going at least you know
it is doing more work, rather than just being slow at doing the planned
work.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Scara Maccai

> +1.  Especially if I run it a few times and I can see
> which counters
> are still moving.

Per-node percentage is easy to do (given the perfect estimates, of course).
The problem comes when you want to give an "overall" percentage.

I wouldn't know where to put that "explain-like" output though: in a column in 
pg_stat_get_activity??? (and it would be available only if the proper variable 
was "on" before sending the query)

> -1.    A counter that slowly goes from 99% to
> 99.5% done is
> much worse than a counter that takes the same much time
> going from "1000% of estimated rows done" to "2000% of
> estimated rows done".

It's not just about estimates.
Even with 100% correct estimates, IMHO there's no way to get the perfect amount 
of work done so far.
And this is even without considering multiple queries running at the same 
time...

If someone has some time to read those papers let me know what he thinks about 
them... because I think their methods couldn't give them those results...






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Dimitri Fontaine

Le 30 juin 2009 à 01:34, Greg Stark  a écrit :

Basically I disagree that imperfect progress reports annoy users. I
think we can do better than reporting 250% done or having a percentage
that goes backward though. It would be quite tolerable (though perhaps
for no logical reason) to have a progress indicator which slows done
as it gets closer to 100% and never seems to make it to 100%.


I guess bad stats are such an important problem in planning queries  
that a 250% progress is doing more good than harm in showing users how  
badly they need to review their analyze related settings.


Regards,
--
dim
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Robert Haas
On Mon, Jun 29, 2009 at 8:15 PM, Ron Mayer wrote:
> Greg Stark wrote:
>>
>> Right, that was why my proposed interface was to dump out the explain
>> plan with the number of loops, row counts seen so far, and approximate
>> percentage progress.
>>
>> My thinking was that a human could interpret that to understand where
>> the bottleneck is if, say you're still on the first row for the top
>> few nodes but all the nodes below a certain sort have run to
>> completion that the query is busy running the sort...
>
> +1.  Especially if I run it a few times and I can see which counters
> are still moving.

+1 from me, too, as I said upthread.

>> Basically I disagree that imperfect progress reports annoy users. I
>> think we can do better than reporting 250% done or having a percentage
>> that goes backward though. It would be quite tolerable (though perhaps
>> for no logical reason) to have a progress indicator which slows done
>> as it gets closer to 100% and never seems to make it to 100%.
>
> -1.    A counter that slowly goes from 99% to 99.5% done is
> much worse than a counter that takes the same much time
> going from "1000% of estimated rows done" to "2000% of
> estimated rows done".
>
> The former just tells me that it lies about how much is done.
> The latter tells me that it's processing each row quickly but
> that the estimate was way off.

I think both of those options are a little wacky.  Maybe 800% **of
estimated rows done** is not so bad, since the tag line provides some
context, but what does it mean exactly?  Rows for the toplevel plan
node?  That doesn't seem like it would always be too useful.  I keep
coming back to thinking you need to see the whole tree.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Ron Mayer

Greg Stark wrote:

Right, that was why my proposed interface was to dump out the explain
plan with the number of loops, row counts seen so far, and approximate
percentage progress.

My thinking was that a human could interpret that to understand where
the bottleneck is if, say you're still on the first row for the top
few nodes but all the nodes below a certain sort have run to
completion that the query is busy running the sort...


+1.  Especially if I run it a few times and I can see which counters
are still moving.


Basically I disagree that imperfect progress reports annoy users. I
think we can do better than reporting 250% done or having a percentage
that goes backward though. It would be quite tolerable (though perhaps
for no logical reason) to have a progress indicator which slows done
as it gets closer to 100% and never seems to make it to 100%.


-1.A counter that slowly goes from 99% to 99.5% done is
much worse than a counter that takes the same much time
going from "1000% of estimated rows done" to "2000% of
estimated rows done".

The former just tells me that it lies about how much is done.
The latter tells me that it's processing each row quickly but
that the estimate was way off.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Greg Stark
>> On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
>>> I think this is pretty much nonsense --- most queries run all their plan
>>> nodes concurrently to some extent.  You can't usefully say that a query
>>> is "on" some node, nor measure progress by whether some node is "done".

Right, that was why my proposed interface was to dump out the explain
plan with the number of loops, row counts seen so far, and approximate
percentage progress.

My thinking was that a human could interpret that to understand where
the bottleneck is if, say you're still on the first row for the top
few nodes but all the nodes below a certain sort have run to
completion that the query is busy running the sort...

But a tool like psql or pgadmin would receive that and just display
the top-level percent progress. pgadmin might actually be able to
display its graphical explain with some graphical representation of
the percent progress of each node.

We can actually do *very* well for percent progress for a lot of
nodes. Sequential scans or bitmap scans, for example, can display
their actual percent done in terms of disk blocks.

The gotcha I ran into was what to do with a nested loop join. The safe
thing to do would be to report just the outer child's percentage
directly. But that would perform poorly in the not uncommon case where
there's one expected outer tuple. If we could trust the outer estimate
we could report (outer-percentage + (1/outer-estimate *
inner-percentage)) but that will get weird quickly if the
outer-percentage turns out to be underestimated.

Basically I disagree that imperfect progress reports annoy users. I
think we can do better than reporting 250% done or having a percentage
that goes backward though. It would be quite tolerable (though perhaps
for no logical reason) to have a progress indicator which slows done
as it gets closer to 100% and never seems to make it to 100%.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Tom Lane
Simon Riggs  writes:
> On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
>> I think this is pretty much nonsense --- most queries run all their plan
>> nodes concurrently to some extent.  You can't usefully say that a query
>> is "on" some node, nor measure progress by whether some node is "done".

> The requirement is not nonsense, even if the detail was slightly off.

I was applying the word "nonsense" to the proposed implementation,
not the desire to have query progress indications ...

> We can regard plans as acting in phases with each blocking node
> separating the plan. We know which nodes those are, so we can report
> that.

[ shrug... ] You can regard them that way, but you won't get
particularly helpful results for a large fraction of real queries.
The system is generally set up to prefer "streaming" evaluation
as much as it can.  Even in nominally blocking nodes like Sort and Hash,
there are operational modes that look more like streaming, or at least
chunking.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Simon Riggs

On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
> Josh Berkus  writes:
> > So, while an actual % completed indicator would be perfect, a "query 
> > steps completed, current step =" would still be very useful and a large 
> > improvement over what we have now.
> 
> I think this is pretty much nonsense --- most queries run all their plan
> nodes concurrently to some extent.  You can't usefully say that a query
> is "on" some node, nor measure progress by whether some node is "done".

The requirement is not nonsense, even if the detail was slightly off.

We can regard plans as acting in phases with each blocking node
separating the plan. We know which nodes those are, so we can report
that.

For each phase, it may be very hard to say what percentage is truly
complete, but we could at least report how much work has been done and
provide a percentage against planned numbers.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Peter Hunsberger
On Mon, Jun 29, 2009 at 1:07 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> So, while an actual % completed indicator would be perfect, a "query
>> steps completed, current step =" would still be very useful and a large
>> improvement over what we have now.
>
> I think this is pretty much nonsense --- most queries run all their plan
> nodes concurrently to some extent.  You can't usefully say that a query
> is "on" some node, nor measure progress by whether some node is "done".

What you get in Toad for Oracle is the ability to see long running
processes with multiple lines, one per process currently underway. If
I recall correctly, the returned information includes what operation
is underway (eg, physical reads), the % complete, start time, time
remaining and elapsed time.  Time remaining has been mostly useless
every time I've had to drill down to this level, but otherwise this
has been relatively useful information.

-- 
Peter Hunsberger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Tom Lane
Joshua Tolley  writes:
> On Mon, Jun 29, 2009 at 02:07:23PM -0400, Tom Lane wrote:
>> I think this is pretty much nonsense --- most queries run all their plan
>> nodes concurrently to some extent.  You can't usefully say that a query
>> is "on" some node, nor measure progress by whether some node is "done".

> What about showing the outermost node where work has started?

That's always the outermost node; what would it tell you?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Joshua Tolley
On Mon, Jun 29, 2009 at 02:07:23PM -0400, Tom Lane wrote:
> Josh Berkus  writes:
> > So, while an actual % completed indicator would be perfect, a "query 
> > steps completed, current step =" would still be very useful and a large 
> > improvement over what we have now.
> 
> I think this is pretty much nonsense --- most queries run all their plan
> nodes concurrently to some extent.  You can't usefully say that a query
> is "on" some node, nor measure progress by whether some node is "done".

What about showing the outermost node where work has started?

--
Josh / eggyknap
End Point Corp.
www.endpoint.com


signature.asc
Description: Digital signature


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Robert Haas
On Mon, Jun 29, 2009 at 11:15 AM,  wrote:
> So the code that I have right now works "pretty well" for the 10 queries of 
> my project, but I guess won't work for general queries :(

I think that's probably right.

>> So, I'm all in favor of what you're trying to conceptually;
>> I just
>> don't like your proposed implementation.
>
> What kind of implementation would you propose?

I don't really have a proposed implementation in mind; I think it's a
hard problem.  That's why I suggested looking at the
EXPLAIN-ANALYZE-in-progress stuff.  By providing a lot more detail, a
human being can take a look at that output and make a judgment about
what's going on.  That's not as easy-to-use as what you're trying to
do, but I suspect it's more useful in practice.  It might be that
after reading a few hundred of those someone could propose some rules
of thumb to estimate the percentage of completion, which we could then
incorporate back into the system.  If not, we lose nothing by
implementing that feature first, since it is independently useful.

> Thank you very much for your comments.

No problem.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Tom Lane
Josh Berkus  writes:
> So, while an actual % completed indicator would be perfect, a "query 
> steps completed, current step =" would still be very useful and a large 
> improvement over what we have now.

I think this is pretty much nonsense --- most queries run all their plan
nodes concurrently to some extent.  You can't usefully say that a query
is "on" some node, nor measure progress by whether some node is "done".

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Josh Berkus

All,

Actually, an indicator of even just what step of the query was being 
executed would be very useful for checking on stuck queries.  If a DBA 
checks once that the query is on "bitmapscan on table_x(index_y)", and 
it's still on that 15 minutes later, he/she can guess that the query is 
thrashing due to HW or bad plan issues and kill it.


If the query is on "sort rowset by col1" then the DBA knows not to kill 
it because it's almost done.


So, while an actual % completed indicator would be perfect, a "query 
steps completed, current step =" would still be very useful and a large 
improvement over what we have now.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Scara Maccai

> IMO
> any
> diagnostics you can provide for a low cost are
> useful.  The more
> detail, the better.  "Step 1 of 10" is good, "80%
> complete on step 1
> of 10" is better.  "80% complete on step 1, 10%
> complete on 10 steps"
> is even better.

Well, I guess  "Step 1 of 10" would be pretty trivial to implement (given the 
tree plan, how many branches have finished executing).
This doesn't tell you anything about the remaining time though.
That is: "Step 1 of 10" doesn't mean that you are at 10%, nor that you are at 
50%, or 99%. In fact, I'm afraid it doesn't tell you anything...
I don't understand how useful such an implementation would be... "being
assured that you actually are making forward progress"... in which cases could 
you not be making forward progress?

Still, this won't take much time: if it's needed by someone else I guess it 
could be easily done.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread m_lists

> It's
> easy to have
> estimates that are off by a factor of two or three, though,
> so I think
> you'd frequently have situations when the query completed
> when the
> progress estimater was at 40% or 250%. 

I thought about implementing a "given perfect estimates" indicator at first 
then, as a second step, using histograms to leverage the indicator precision at 
run time. Of course, this doesn't mean the user wouldn't see the query 
completed at 40% or "slowing down" in a lot of cases...

I started this patch after reading the papers in 
http://wiki.postgresql.org/wiki/Query_progress_indication
Apparently they were able to predict query execution remaining time (in case of 
a "perfect estimates" query) with a very simple algorithm.

Given that:
1) The algorithm ("driver node hypothesis") is so easy 
2) My project fits in the category of "perfect estimates" queries

I thought "I'll give it a try".

Well: I have no idea how they got their results.

IMHO it's not possible to get max 10% error on query remaing time on most of 
the tpcd queries using that method, since the "driver nodes" have all the same 
"importance". I had to introduce a lot of complexity (not in the patch that I 
posted) to have it "somehow" working, giving the nodes different work per tuple 
according to the node type (example: in a loop join the time it takes to read a 
row of the outer relation can't be compared to, say, the time it takes to read 
a row from a table scan: but the driver node hypothesis says they will take the 
same time...).

So the code that I have right now works "pretty well" for the 10 queries of my 
project, but I guess won't work for general queries :(

> So, I'm all in favor of what you're trying to conceptually;
> I just
> don't like your proposed implementation.

What kind of implementation would you propose?

Thank you very much for your comments.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Robert Haas
On Mon, Jun 29, 2009 at 4:47 AM, Scara Maccai wrote:
>> As to the content of the patch, I think that what you are doing is
>> comparing the actual number of "operations" with the expected number
>> of operations.  If that's correct, I'm not sure it's really all that
>> useful, because it will only give you accurate
>> percentage-of-completion information when the estimates are
>> correct. But when the estimates are correct, you probably have a
>> pretty good idea how long the query will take to run anyway.
>
> That would be a first step. Having an idea of how much a query "progressed" 
> is very important for long-running queries. It's like
> copying files: even if you have a rough idea of how much time a copy will 
> take, having an interface that tells you the percentage
> done is quite useful (IMHO).

I am all in favor of a query progress indicator; the trick is
constructing one that will actually be useful.  It's easy to have
estimates that are off by a factor of two or three, though, so I think
you'd frequently have situations when the query completed when the
progress estimater was at 40% or 250%. Those kinds of progress
indicators tend to annoy users, and for good reason.  File copying is
the sort of thing where it's pretty easy to estimate percentage of
completion by files and bytes; query execution is not.

So, I'm all in favor of what you're trying to conceptually; I just
don't like your proposed implementation.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Peter Hunsberger
On Mon, Jun 29, 2009 at 3:47 AM, Scara Maccai wrote:
>
> Is anyone interested in such a progress indicator???
>

I'm relatively new to Postgres and just starting to look at starting
to look at what we might do with it for handling large  genomic
datasets. I've used Toad for Oracle to have a look at whats going on
inside long running Oracle queries.  Knowing that a particular step is
doing a particular activity is useful for diagnostics as well as being
assured that you actually are making forward progress. IMO any
diagnostics you can provide for a low cost are useful.  The more
detail, the better.  "Step 1 of 10" is good, "80% complete on step 1
of 10" is better.  "80% complete on step 1, 10% complete on 10 steps"
is even better.

-- 
Peter Hunsberger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-29 Thread Scara Maccai

> You might want to take a look at this:
> http://wiki.postgresql.org/wiki/Submitting_a_Patch

I will; I'm sorry it wasn't in the proper format. It was just a proof of 
concept, I guess I should have talked about it before even sending the patch.


> As to the content of the patch, I think that what you are
> doing is
> comparing the actual number of "operations" with the
> expected number
> of operations.  If that's correct, I'm not sure it's
> really all that
> useful, because it will only give you accurate
> percentage-of-completion information when the estimates are
> correct.
> But when the estimates are correct, you probably have a
> pretty good
> idea how long the query will take to run anyway.  

That would be a first step. Having an idea of how much a query "progressed" is 
very important for long-running queries. It's like copying files: even if you 
have a rough idea of how much time a copy will take, having an interface that 
tells you the percentage done is quite useful (IMHO).

> When
> the estimates
> are off, you'll find that the actual number of operations
> is more than
> the expected number of operations, but that won't really
> tell you how
> far you have to go.

The second phase would be using histograms to help refine the statistics at 
runtime.


> The only other use case I can think of for functionality of
> this type
> is some kind of dashboard view on a system with very
> long-running
> queries, where you want to see how far you have yet to go
> on each one
> (maybe to approximate when you can submit the next one)
> without having
> detailed knowledge of how expensive each individual query
> was project
> to be.  But that's a pretty narrow use case

I don't think it's that narrow: it is important, for long running queries, to 
know how far the query processed (the user wants to know how much of a query 
has yet to be executed). That's why you find so many papers on query progress 
indicators.
The real problem is that they don't give you a solution :)


> Greg Stark was (is?) working on a way to get
> EXPLAIN-ANALYZE type
> statistics on running queries; you might want to take a
> look at some
> of that work and see what you think.
> 
> http://wiki.postgresql.org/wiki/Greg_Stark%27s_Development_Projects#EXPLAIN_queries_in_progress
> 


That's interesting. I'll take a look!

Thank you very much for your comments.

Is anyone interested in such a progress indicator???






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-28 Thread Peter Eisentraut
On Sunday 28 June 2009 20:38:59 Robert Haas wrote:
> The only other use case I can think of for functionality of this type
> is some kind of dashboard view on a system with very long-running
> queries, where you want to see how far you have yet to go on each one
> (maybe to approximate when you can submit the next one) without having
> detailed knowledge of how expensive each individual query was project
> to be.  But that's a pretty narrow use case, and I'm not sure it
> really justifies the overhead of instrumenting every query in this
> way.

Well, progress information is probably only going to be useful for long-
running processes anyway, and then only when the admin is too bored and 
wonders what he can usefully cancel.  So it seems import to figure out exactly 
when to turn this on and when not to without causing too much overhead 
elsewhere.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query progress indication - an implementation

2009-06-28 Thread Robert Haas
On Fri, Jun 26, 2009 at 3:37 AM, Scara Maccai wrote:
> Hi all,
>
> following the link in
>
> http://wiki.postgresql.org/wiki/Query_progress_indication
>
> but mostly:
>
> http://www.postech.ac.kr/~swhwang/progress2.pdf [1]
>
> I'm trying to write an implementation of the "dne" method in postgresql.
>
> I added another column to the pg_stat_get_activity function to report the 
> percentage of work done for the query (of course, any other method could be 
> used... the way the percentage is reported to the user can be easily changed).
>
> I attached a first patch (just to see if anyone is interested, the work is by 
> no means finished).
>
> I guess I did a lot of mistakes, since I don't know anything about postgresql 
> code...
>
> 1) the progress indicator can be eliminated at runtime; this could be done 
> with another runtime flag (at the moment is always on)
>
> 2) I added a new structure (Progress) to PlanState to keep all the info about 
> execution progress
>
> 3) I needed a pointer to the root of the PlanStates, to be able to calculate 
> the total progress of the query tree (I bet this pointer was already 
> available somewhere, but I couldn't find where...)
>
> 4) sub-plans are not included yet (well, just to be honest, I don't really 
> know what postgresql means with those... :) )
>
> 5) the percentage is updated at most every second (can be easily changed)
>
> 6) the methods to adjust upper/lower bounds in [1] are not implemented yet 
> (but that shouldn't be a problem)
>
> 7) the "spilled tuples" handling in [1] is not supported yet
>
> 8) only hash join, loop join, aggregate, sequence scans are implemented at 
> the moment
>
> 9) I added another flag (EXEC_FLAG_DRIVER_BRANCH) in executor.h to signal to 
> the sub-nodes if they are part of a branch that will contain a driver node 
> (for example, inner subtree of a Nested Loops join is not a driver branch). I 
> guess this could be done better at Plan level (instead of PlanState), but 
> this way less code has to be changed
>
> 10) at the moment all driver nodes have the same "work_per_tuple=1", but this 
> could be changed (for example, CPU-intensive driver nodes could have a 
> smaller work_per_tuple value)
>
> Well, some (very early) tests on a tpcd db showed it works as expected (well, 
> I only did very few tests...)

You might want to take a look at this:
http://wiki.postgresql.org/wiki/Submitting_a_Patch

The project style is not to use C++-style comments, and you should
eliminate all of the unnecessary diff hunks from your patch (like
files that have only comment or whitespace changes).  Also, it is
requested that patches be submitted in context diff format and added
to the CommitFest wiki here:

http://wiki.postgresql.org/wiki/CommitFest_2009-First

As to the content of the patch, I think that what you are doing is
comparing the actual number of "operations" with the expected number
of operations.  If that's correct, I'm not sure it's really all that
useful, because it will only give you accurate
percentage-of-completion information when the estimates are correct.
But when the estimates are correct, you probably have a pretty good
idea how long the query will take to run anyway.  When the estimates
are off, you'll find that the actual number of operations is more than
the expected number of operations, but that won't really tell you how
far you have to go.

The only other use case I can think of for functionality of this type
is some kind of dashboard view on a system with very long-running
queries, where you want to see how far you have yet to go on each one
(maybe to approximate when you can submit the next one) without having
detailed knowledge of how expensive each individual query was project
to be.  But that's a pretty narrow use case, and I'm not sure it
really justifies the overhead of instrumenting every query in this
way.  For a fraction of the run-time cost, you could include the
estimated total cost of the query in the pg_stat_activity output,
which would let the user do much the same thing presuming that they
have some knowledge of the usual ratio between costs and execution
times.

Greg Stark was (is?) working on a way to get EXPLAIN-ANALYZE type
statistics on running queries; you might want to take a look at some
of that work and see what you think.

http://wiki.postgresql.org/wiki/Greg_Stark%27s_Development_Projects#EXPLAIN_queries_in_progress

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Query progress indication - an implementation

2009-06-26 Thread Scara Maccai
Hi all,

following the link in 

http://wiki.postgresql.org/wiki/Query_progress_indication

but mostly:

http://www.postech.ac.kr/~swhwang/progress2.pdf [1]

I'm trying to write an implementation of the "dne" method in postgresql.

I added another column to the pg_stat_get_activity function to report the 
percentage of work done for the query (of course, any other method could be 
used... the way the percentage is reported to the user can be easily changed).

I attached a first patch (just to see if anyone is interested, the work is by 
no means finished).

I guess I did a lot of mistakes, since I don't know anything about postgresql 
code...

1) the progress indicator can be eliminated at runtime; this could be done with 
another runtime flag (at the moment is always on)

2) I added a new structure (Progress) to PlanState to keep all the info about 
execution progress

3) I needed a pointer to the root of the PlanStates, to be able to calculate 
the total progress of the query tree (I bet this pointer was already available 
somewhere, but I couldn't find where...)

4) sub-plans are not included yet (well, just to be honest, I don't really know 
what postgresql means with those... :) )

5) the percentage is updated at most every second (can be easily changed)

6) the methods to adjust upper/lower bounds in [1] are not implemented yet (but 
that shouldn't be a problem)

7) the "spilled tuples" handling in [1] is not supported yet

8) only hash join, loop join, aggregate, sequence scans are implemented at the 
moment

9) I added another flag (EXEC_FLAG_DRIVER_BRANCH) in executor.h to signal to 
the sub-nodes if they are part of a branch that will contain a driver node (for 
example, inner subtree of a Nested Loops join is not a driver branch). I guess 
this could be done better at Plan level (instead of PlanState), but this way 
less code has to be changed

10) at the moment all driver nodes have the same "work_per_tuple=1", but this 
could be changed (for example, CPU-intensive driver nodes could have a smaller 
work_per_tuple value)

Well, some (very early) tests on a tpcd db showed it works as expected (well, I 
only did very few tests...)

Hope someone is interested


  Index: src/include/pgstat.h
===
RCS file: /projects/cvsroot/pgsql/src/include/pgstat.h,v
retrieving revision 1.83
diff -r1.83 pgstat.h
568a569,571
> 
> /* current percentage of progress */
> float   st_progress_perc;
646a650,651
> extern void pgstat_report_progress_percentage(double perc);
> 
Index: src/backend/executor/Makefile
===
RCS file: /projects/cvsroot/pgsql/src/backend/executor/Makefile,v
retrieving revision 1.29
diff -r1.29 Makefile
25c25
nodeWindowAgg.o tstoreReceiver.o spi.o progress.o
Index: src/backend/executor/execProcnode.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execProcnode.c,v
retrieving revision 1.65
diff -r1.65 execProcnode.c
109a110,111
> #include "executor/progress.h"
> #include "pgstat.h"
111a114
> void ProgressUpdate(PlanState* node, double* tot_operations_expected, double* 
> tot_operations_so_far);
132a136
>   boolis_driver_node_candidate = false;
175a180,181
>   is_driver_node_candidate = true;
> 
261a268
>   is_driver_node_candidate = true;
314a322,325
>   /* Set up progress info for this node if requested */
>   if (result->state->es_progress)
>   ProgressSetInfo(result, node, eflags, is_driver_node_candidate);
> 
328a340,343
>   struct timeval t;
>   double  tot_operations_expected = 0;
>   double  tot_operations_so_far = 0;
> 
462a478,493
>   // progress calcs (only if required)
>   if (node->state->es_progress && node->progress != NULL && 
> node->progress->is_driver_node)
>   {
>   node->progress->operations_so_far++;
>   gettimeofday(&t, NULL);
>   if (t.tv_sec > node->state->es_progress_last_update.tv_sec)
>   {
>   ProgressUpdate(node->state->es_root_planstate, 
> &tot_operations_expected, &tot_operations_so_far);
>   if (tot_operations_expected != 0)
>   {
>   
> pgstat_report_progress_percentage(tot_operations_so_far*100/tot_operations_expected);
>   node->state->es_progress_last_update = t;
>   }
>   }
>   }
> 
466a498,524
> void ProgressUpdate(PlanState* node, double* tot_operations_expected, double* 
> tot_operations_so_far)
> {
>   // TODO here a  switch (nodeTag(node)) is needed in case we want 
> upper/lower limit update
>   if (node->progress->is_driver_node)
>   {
>   *tot_operations_expected += node