Re: AW: AW: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-11-03 Thread Tom Lane

Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
 Right.  So what do you think about a hint that takes the form of a SET
 variable for the fetch percentage to assume for a DECLARE CURSOR?

 Since we don't have other hints that are embedded directly into the SQL
 that sounds perfect. 

 The not so offhand question for me is whether to use this percentage
 for non cursor selects also. Imho both should (at least in default)
 behave the same.

Not at all, since in a non-cursor select you *must* retrieve all the
data.  I can't see any reason to optimize that on any other basis than
total execution time.

regards, tom lane



Re: AW: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-11-03 Thread Tom Lane

Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
 I did understand this, but I still disagree. Whether this is what you want
 strongly depends on what the application does with the resulting rows.

Sure ...

 There is no way for the backend to know this, thus imho the app needs
 to give a hint. 

Right.  So what do you think about a hint that takes the form of a SET
variable for the fetch percentage to assume for a DECLARE CURSOR?

regards, tom lane



AW: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-11-03 Thread Zeugswetter Andreas SB


  I'd say that normally you're not using cursors because you intend to throw
  away 80% or 90% of the result set, but instead you're using it because
  it's convenient in your programming environment (e.g., ecpg).  There are
  other ways of getting only some rows, this is not it.
 
 I didn't say I was assuming that the user would only fetch 10% of the
 rows.  Since what we're really doing is a linear interpolation between
 startup and total cost, what this is essentially doing is favoring low
 startup cost, but not to the complete exclusion of total cost.
 I think that that describes the behavior we want for a cursor pretty well.

I did understand this, but I still disagree. Whether this is what you want
strongly depends on what the application does with the resulting rows.
It is the correct assumption if the application needs a lot of time 
to process each row. If the application processing for each row is fast,
we will still want least total cost. 

There is no way for the backend to know this, thus imho the app needs
to give a hint. 

Andreas



Re: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-11-02 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane writes:
 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
 basis of 10%-or-so fetch

 I'd say that normally you're not using cursors because you intend to throw
 away 80% or 90% of the result set, but instead you're using it because
 it's convenient in your programming environment (e.g., ecpg).  There are
 other ways of getting only some rows, this is not it.

I didn't say I was assuming that the user would only fetch 10% of the
rows.  Since what we're really doing is a linear interpolation between
startup and total cost, what this is essentially doing is favoring low
startup cost, but not to the complete exclusion of total cost.  I think
that that describes the behavior we want for a cursor pretty well.

It remains to argue about what the relative weighting ought to be
... which might be best answered by making it user-settable.

regards, tom lane



Re: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-31 Thread Peter Eisentraut

Tom Lane writes:

 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
 basis of 10%-or-so fetch

I'd say that normally you're not using cursors because you intend to throw
away 80% or 90% of the result set, but instead you're using it because
it's convenient in your programming environment (e.g., ecpg).  There are
other ways of getting only some rows, this is not it.

So I think if you want to make optimization decisions based on cursors
being used versus a "normal" select, then the only thing you can safely
take into account is the network roundtrip and client processing per
fetch, but that might be as random as anything.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/




Re: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-31 Thread Philip Warner

At 10:51 31/10/00 +0100, Peter Eisentraut wrote:
Tom Lane writes:

 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
 basis of 10%-or-so fetch

I'd say that normally you're not using cursors because you intend to throw
away 80% or 90% of the result set, but instead you're using it because
it's convenient in your programming environment (e.g., ecpg).  There are
other ways of getting only some rows, this is not it.

Yes!


So I think if you want to make optimization decisions based on cursors
being used versus a "normal" select, then the only thing you can safely
take into account is the network roundtrip and client processing per
fetch, but that might be as random as anything.

Which is why I like the client being able to ask the optimizer for certain
kinds of solutions *explicitly*.



Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: AW: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-31 Thread Philip Warner

At 14:14 31/10/00 +0100, Zeugswetter Andreas SB wrote:
 
 Which is why I like the client being able to ask the 
 optimizer for certain kinds of solutions *explicitly*.

Yes, something like:
   set optimization to [first_rows|all_rows]


That's one way that is usefull for affecting all subsequent statements, but
it would be nice to also allow such things in each statement, eg. in comments:

/*++optimizer: fast_start, no_seq_scan */
select...

ie. make all settable values dynamically settable in a statement. The scope
of the settings would probably have to depend on the abilities of the
optimizer - eg. how would subselects and views be handled?



Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



AW: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-30 Thread Zeugswetter Andreas SB


 After thinking some more about yesterday's discussions, I propose that
 we adopt the following planning behavior for cursors:
 
 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
 basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be
 just as reasonable, if people want to argue about the exact number;
 perhaps a SET variable is in order?).  10% seems to be a reasonable
 compromise between delivering tuples promptly and not choosing a plan
 that will take forever if the user fetches the whole result.

Imho that was a wrong assumption in the first place. The default assumption 
imho needs to be 100 %. Especially if you fixed the limit clause enabling people
to optimize the few rows fetched case.

 3. If DECLARE CURSOR contains "LIMIT ALL", plan on the assumption that
 all tuples will be fetched, ie, select lowest-total-cost plan.
 
 (Note: LIMIT ALL has been in the grammar right along, but up to now
 it has been entirely equivalent to leaving out the LIMIT clause.  This
 proposal essentially suggests allowing it to act as a planner 
 hint that
 the user really does intend to fetch all the tuples.)
 
 Comments?

Imho an explicit statement to switch optimizer mode from all rows to first rows
would be a lot easier to understand and is what other DB vendors do.

Andreas



Re: AW: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-30 Thread Stephan Szabo


On Mon, 30 Oct 2000, Zeugswetter Andreas SB wrote:

 
  After thinking some more about yesterday's discussions, I propose that
  we adopt the following planning behavior for cursors:
  
  1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
  basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be
  just as reasonable, if people want to argue about the exact number;
  perhaps a SET variable is in order?).  10% seems to be a reasonable
  compromise between delivering tuples promptly and not choosing a plan
  that will take forever if the user fetches the whole result.
 
 Imho that was a wrong assumption in the first place. The default assumption 
 imho needs to be 100 %. Especially if you fixed the limit clause enabling people
 to optimize the few rows fetched case.

But what if you're doing fetch 10 rows, fetch 10 rows, ...
You're not limiting, because you want all of them, but you are only
pulling a small number at a time to say do expensive front end processing.
It might make sense to actually pull a plan which is lower startup and
higher per row.  Although the full cost is higher, you get a better
turnaround time on the first set and the cost difference per set may
be unnoticeable (it would depend on the particulars).




Re: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-28 Thread Don Baccus

At 12:18 PM 10/27/00 -0400, Tom Lane wrote:

Hiroshi was a little concerned about this change in behavior, and
so the first order of business is whether anyone wants to defend the
old way?  IMHO it was incontrovertibly a bug, but ...

Sure feels like a bug to me.  Having it ignored isn't what I'd expect.



- Don Baccus, Portland OR [EMAIL PROTECTED]
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.



[HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-27 Thread Tom Lane

Hiroshi and I had a discussion last night that needs to reach a wider
audience than just the bystanders on pgsql-committers.  Let me see if
I can reconstruct the main points.

In 7.0, a LIMIT clause can appear in a DECLARE CURSOR, but it's ignored:

play= select * from vv1;
 f1
-
   0
  123456
 -123456
  2147483647
 -2147483647
   0
(6 rows)

play= begin;
BEGIN
play= declare c cursor for select * from vv1 limit 2;
SELECT
play= fetch 10 from c;
 f1
-
   0
  123456
 -123456
  2147483647
 -2147483647
   0
(6 rows)

The reason for this behavior is that LIMIT and the FETCH count are
implemented by the same mechanism (ExecutorRun's count parameter)
and so FETCH has no choice but to override the LIMIT with its own
argument.

Yesterday I reimplemented LIMIT as a separate plan node type, in order
to make it work in views.  A side effect of this is that ExecutorRun's
count parameter is now *only* used for FETCH, and therefore a LIMIT
appearing in a DECLARE CURSOR does what IMHO it should do: you get
that many rows and no more from the cursor.

regression=# begin;
BEGIN
regression=# declare c cursor for select * from vv1 limit 2;
SELECT
regression=# fetch 10 from c;
   f1

  0
 123456
(2 rows)

Hiroshi was a little concerned about this change in behavior, and
so the first order of business is whether anyone wants to defend the
old way?  IMHO it was incontrovertibly a bug, but ...

The second question is how the presence of a LIMIT clause ought to
affect the planner's behavior.  In 7.0, we taught the planner to
pay attention to LIMIT as an indicator whether it ought to prefer
fast-start plans over lowest-total-cost plans.  For example, consider

SELECT * FROM tab ORDER BY col;

and assume there's a b-tree index on col.  Then the planner has two
possible choices of plan: an indexscan on col, or a sequential scan
followed by sort.  The indexscan will begin delivering tuples right
away, whereas the sort has to finish the sequential scan and perform
the sort before it can deliver the first tuple.  OTOH the total cost
to deliver the entire result is likely to be less for the sort plan
(let's assume for this discussion that it is).  So for the above
query the planner should and will choose the sort plan.  But for

SELECT * FROM tab ORDER BY col LIMIT 1;

it will choose the indexscan plan because of the low startup cost.
This is implemented by pricing a query that uses LIMIT on the basis
of linear interpolation between the startup and total costs, with the
interpolation point determined by the fraction of tuples we expect to
retrieve.

This is all pretty clear and seems to work OK for stand-alone SELECT.
But what about a DECLARE CURSOR?  The planner has no way to know how
much of the cursor's result will actually be FETCHed by the user, so
it's not clear how to use all this shiny new LIMIT planning mechanism
for a DECLARE CURSOR.

What happens in 7.0 and current code is that for a DECLARE CURSOR,
the planner ignores any LIMIT clause and arbitrarily assumes that the
user will FETCH about 10% of the available data.  Hence, the planning
is done on the basis of least "startup + 0.10*(total - startup)" cost.

Ignoring the limit clause was correct in 7.0, given the fact that the
limit wouldn't actually be used at runtime, but it's wrong now (unless
I'm beaten down on the semantics change).  Also, the 10% estimate is
the sort of compromise that's likely to satisfy nobody --- if you intend
to fetch all the data, quite likely you want the least total cost,
whereas if you only want the first few rows, you probably want a plan
biased even more heavily towards startup cost at the expense of total
cost.

After thinking some more about yesterday's discussions, I propose that
we adopt the following planning behavior for cursors:

1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be
just as reasonable, if people want to argue about the exact number;
perhaps a SET variable is in order?).  10% seems to be a reasonable
compromise between delivering tuples promptly and not choosing a plan
that will take forever if the user fetches the whole result.

2. If DECLARE CURSOR contains a specific "LIMIT n" clause, plan on
the assumption that n tuples will be fetched.  For small n this allows
the user to heavily bias the plan towards fast start.  Since the LIMIT
will actually be enforced by the executor, the user cannot bias the
plan more heavily than is justified by the number of tuples he's
intending to fetch, however.

3. If DECLARE CURSOR contains "LIMIT ALL", plan on the assumption that
all tuples will be fetched, ie, select lowest-total-cost plan.

(Note: LIMIT ALL has been in the grammar right along, but up to now
it has been entirely equivalent to leaving out the LIMIT clause.  This
proposal essentially suggests allowing it to act as a planner hint 

Re: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-27 Thread Tom Lane

Philip Warner [EMAIL PROTECTED] writes:
 At 12:18 27/10/00 -0400, Tom Lane wrote:
 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
 basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be
 just as reasonable, if people want to argue about the exact number;
 perhaps a SET variable is in order?).

 SET sounds good; will this work on a per-connection basis?

A SET variable would be connection-local, same as any other ...

 I don't suppose you'd consider 'OPTIMIZE FOR TOTAL COST' and 'OPTIMIZE FOR
 FAST START' optimizer hints?

I don't much care for adding such syntax to DECLARE CURSOR, if that's
what you're suggesting.  LIMIT ALL would have the same effect as
'OPTIMIZE FOR TOTAL COST' anyway.  LIMIT 1 (or a small number) would
have the effect of 'OPTIMIZE FOR FAST START', but would constrain you
to not fetch any more rows than that.  If we had a SET variable then
you could twiddle that value to favor fast-start or total-cost concerns
over a continuous range, without constraining how many rows you actually
fetch from a LIMIT-less cursor.

 Also, does the change you have made to the executor etc mean that
 subselect-with-limit is now possible?

The executor will do it, but unless Kevin figures out how to fix the
grammar, you'll have to put the LIMIT into a view definition, not inline
in a subquery.  View-with-LIMIT does work as of today.

regards, tom lane



Re: [HACKERS] LIMIT in DECLARE CURSOR: request for comments

2000-10-27 Thread Philip Warner

At 12:18 27/10/00 -0400, Tom Lane wrote:

1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the
basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be
just as reasonable, if people want to argue about the exact number;
perhaps a SET variable is in order?).  10% seems to be a reasonable
compromise between delivering tuples promptly and not choosing a plan
that will take forever if the user fetches the whole result.

SET sounds good; will this work on a per-connection basis?


2. If DECLARE CURSOR contains a specific "LIMIT n" clause, plan on
the assumption that n tuples will be fetched.  For small n this allows
the user to heavily bias the plan towards fast start.  Since the LIMIT
will actually be enforced by the executor, the user cannot bias the
plan more heavily than is justified by the number of tuples he's
intending to fetch, however.

Fine.


3. If DECLARE CURSOR contains "LIMIT ALL", plan on the assumption that
all tuples will be fetched, ie, select lowest-total-cost plan.

Good.



Comments?


I don't suppose you'd consider 'OPTIMIZE FOR TOTAL COST' and 'OPTIMIZE FOR
FAST START' optimizer hints?

Also, does the change you have made to the executor etc mean that
subselect-with-limit is now possible?



Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/