Re: Hints (Was: [HACKERS] Index Tuning Features)

2006-10-13 Thread Josh Berkus
Tom,

> We have also talked about solving the multi-column statistics problem
> (which, at its core, is "which combinations of columns are worth
> accumulating stats for?" --- you can't possibly store stats for every
> combination!) by having what would amount to hints from the DBA saying
> "keep stats for these combinations".

We could start just by keeping stats for multiple columns which are indexed 
together.  That doesnt' cover everything (thanks to bitmapping) but would 
cover many cases.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: Hints (Was: [HACKERS] Index Tuning Features)

2006-10-12 Thread Andrew Dunstan
Tom Lane wrote:

> We have also talked about solving the multi-column statistics problem
> (which, at its core, is "which combinations of columns are worth
> accumulating stats for?" --- you can't possibly store stats for every
> combination!) by having what would amount to hints from the DBA saying
> "keep stats for these combinations".
>

This strikes me intuitively as the most likely candidate so far for
improvement. I'm much more interested in schemes that will improve the
stats system, rather than providing a way around it.

cheers

andrew


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: Hints (Was: [HACKERS] Index Tuning Features)

2006-10-12 Thread Tom Lane
Casey Duncan <[EMAIL PROTECTED]> writes:
> Yes, but it may be much more efficient for the human to tell the  
> computer than for the computer to introspect things. Take, for  
> example, ndisinct as data grows large.

Yeah, an override estimate for a column's ndistinct seems a perfect
example of the sort of statistical hint that I'd be in favor of having.
We have also talked about solving the multi-column statistics problem
(which, at its core, is "which combinations of columns are worth
accumulating stats for?" --- you can't possibly store stats for every
combination!) by having what would amount to hints from the DBA saying
"keep stats for these combinations".

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: Hints (Was: [HACKERS] Index Tuning Features)

2006-10-12 Thread Casey Duncan

On Oct 12, 2006, at 4:26 AM, Andrew Sullivan wrote:


On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:


Some statistics are very hard to gather from a sample, e.g. the  
number

of distinct values in a column.


Then how can the DBA know it, either?  The problem with this sort of
argument is always that people are claiming some special knowledge is
available to the DBA.  If it's true that the DBA really _can_ know
this stuff, then there must be some way to learn it.  Which means
that you can, in principle, figure out ways to communicate that to
the optimizer.


Yes, but it may be much more efficient for the human to tell the  
computer than for the computer to introspect things. Take, for  
example, ndisinct as data grows large. I, the database designer, may  
know (or simply see) that a certain foreign key column will have  
roughly a certain cardinality regardless of how big the table gets.  
It's a lot more efficient for me to tell the system that up front  
then have it need to do a full table scan or tens of millions of rows  
periodically to figure it out, or worse--as it is currently--to come  
up with an estimate that is multiple orders of magnitude off, even  
with the stats target turned all the way up.


I realize that this is a case that is possible to do manually now,  
sort of. I can tweak the stats table myself. But it would be nice if  
you could do it in such a way that it would override what analyze  
comes up with on a case-by-case basis.


We could have a perfect query planner, but feed it bad stats and it  
will still make poor decisions.


I'm of the strong opinion that hinting the data is much better than  
hinting the queries. There tends to be many fewer places you need to  
do that, and new queries can automatically take advantage.



I like the suggestion, though, that there be ways to codify known
relationships in the system in such a way that the optimizer can
learn to use that information.  _That_ seems to me to be a big
improvement, because it can be taken into consideration along with
relationships that emerge from the statistics, that the DBA may not
know about.


I'm all for things the computer can do for me automagically. It's  
just good to have the ability to tell the computer about things you  
know about the data that it either can't efficiently figure out or  
can't figure out at all.


-Casey


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Hints (Was: [HACKERS] Index Tuning Features)

2006-10-12 Thread Andrew Sullivan
On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
> 
> Some statistics are very hard to gather from a sample, e.g. the number
> of distinct values in a column.

Then how can the DBA know it, either?  The problem with this sort of
argument is always that people are claiming some special knowledge is
available to the DBA.  If it's true that the DBA really _can_ know
this stuff, then there must be some way to learn it.  Which means
that you can, in principle, figure out ways to communicate that to
the optimizer.

I like the suggestion, though, that there be ways to codify known
relationships in the system in such a way that the optimizer can
learn to use that information.  _That_ seems to me to be a big
improvement, because it can be taken into consideration along with
relationships that emerge from the statistics, that the DBA may not
know about.


A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
--Brad Holland

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Hints (was: [HACKERS] Index Tuning Features)

2006-10-12 Thread Andrew Sullivan
On Wed, Oct 11, 2006 at 03:08:42PM -0700, Ron Mayer wrote:
> Is one example is the table of addresses clustered by zip-code
> and indexes on State, City, County, etc?

No.

> Now I'm not saying that a more advanced statistics system
> couldn't one-day be written that sees these patterns in the
> data -- but it doesn't seem likely in the near term.  DBA-based
> hints could be a useful interim work-around.

Some others in the hints thread seem to be suggesting additional ways
of teaching the optimiser what to do.  _That_ seems to me to be a
good idea (but I don't think that qualifies as what people usually
think of as hints).  A sufficiently general system of hints sprinkled
on the SQL is a lot of work, and doesn't seem to me to be a whole lot
easier than working out how to make second-order relationship
discovery (of the sort you're talking about) cheaper and automatic. 
Certainly, there's plenty of statistics math kicking around that
allows one to discover such relationships, and they have the benefit
of not being by definition a way to work around the optimiser.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
--Damien Katz

---(end of broadcast)---
TIP 6: explain analyze is your friend