Re: [SQL] How Postgresql Compares For Some Query Types

2001-07-18 Thread markir


> 
>   I'm a little unclear on what a "fact table" is.  Can you explain?

Good question... The idea comes from data warehousing where a typical data 
construction involves two "types" of table :

dimensions : time, location, ethicity( essentially denormalized lookups )
facts  : observations, accidents, sales 

facts have a composite primary key , each componant of which is a foreign key 
for one of the dimensions.

If drawn with the fact table(s) in the center and dimensions around about, then 
the ERD looks like a "star". Hence the names "star schema" and "star query"

A quick search located a vaguely helpful page : ( apologies about the source )

http://www.oradoc.com/ora816/server.816/a76994/schemas.htm

A much better source is Kimball's book "The Data Warehousing Toolkit".

regards

Mark







---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

2001-08-26 Thread markir


> Mark kirkwood <[EMAIL PROTECTED]> writes:
> > Note that  'f1' = '2000-01-26'  corrosponds to 'd0key' = 270 in the table 
> > 'dim0';
> 
> What do you mean by "corresponds to"?  Is there a one-to-one mapping
> between distinct values of fact0.d0key and distinct values of dim0.f1?
> Or do you just mean that the values play corresponding roles in these
> two queries?
> 

Sorry Tom ... clearly I didnt explain this very well...
But if you look at the rows in dim0 (see bottom of the previous mail) the above 
d0key and f1 are in the same row of 'dim0' - I neglected to mention that 'f1' 
is unique and 'd0key' is the primary key for 'dim0', so yes there is a 1-1 
mapping between 'd0key' and 'f1' in 'dim0. Therefore there is also a 1-1 
mapping between distinct 'd0key' in 'fact0' and 'f1' in 'dim0'.

so to get the rows in 'fact0', 'dim0' where dim0.f1 = '2000-01-26' is the same 
as getting the rows in fact0, dim0 where dim0.d0key = 270.
Given that the join is dim0.d0key = fact0.d0key then this is equivalent to 
fact0.d0key = 270.

Of course you are correct about a date column on dim0 not having the same 
selectivity as an int on fact0... but it seems to me ( incorrectly ? ) that in 
order in access fact0 from the resulting dim0 rows for f1, the optimizer must 
use the set of d0key(s) extracted from dim0 and go to fact0 with then. This was 
exactly what the unjoined query was doing - which gets us back to my original 
question again ( I think ).

On the other points : cpu_tuple_cost and distribution - These are completely 
correct, I will use another similar table that has uniformly distributed data - 
this should mean no fiddling about with cpu_tuple_cost is required.

In addition, to clarify the issue furthur I am considering removing f1 from the 
example, and using d0key in both queries, to see what happens then.

Thanks for your patience on this.

regards

Mark



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl