[PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Tobias Brox
Does anyone have experience with using postgres for data warehousing?
Right, I saw one post suggestion to use mysql for a mostly read-only
database ... but anyway, I think it's not a question to change the
database platform for this project, at least not today ;-)

Ralph Kimball seems to be some kind of guru on data warehousing, and
in his books he's strongly recommending to have a date dimension -
simply a table describing all dates in the system, and having
attributes for what day of the week it is, month, day of the month,
week number, bank holiday, anything special, etc.  Well, it does make
sense if adding lots of information there that cannot easily be pulled
out from elsewhere - but as for now, I'm mostly only interessted in
grouping turnover/profit by weeks/months/quarters/years/weekdays.  It
seems so much bloated to store this information, my gut feeling tells it
should be better to generate them on the fly.  Postgres even allows to
create an index on an expression.

The question is ... I'm curious about what would yield the highest
performance - when choosing between:

  select extract(week from created), ...
  from some_table
  where ...
  group by extract(week from created), ...
  sort by extract(week from created), ...

and:

  select date_dim.week_num, ...
  from some_table join date_dim ...
  where ...
  group by date_dim.week_num, ...
  sort by date_dim, week_num, ...

The date_dim table would eventually cover ~3 years of operation, that
is less than 1000 rows.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Chad Wagner

On 1/23/07, Tobias Brox [EMAIL PROTECTED] wrote:


Ralph Kimball seems to be some kind of guru on data warehousing, and
in his books he's strongly recommending to have a date dimension -
simply a table describing all dates in the system, and having



I would tend to agree with this line of thought.


out from elsewhere - but as for now, I'm mostly only interessted in

grouping turnover/profit by weeks/months/quarters/years/weekdays.  It
seems so much bloated to store this information, my gut feeling tells it
should be better to generate them on the fly.  Postgres even allows to
create an index on an expression.



I guess go with your gut, but at some point the expressions are going to be
too complicated to maintain, and inefficient.

Calendar tables are very very common, because traditional date functions
simply can't define business logic (especially things like month end close,
quarter end close, and year end close) that doesn't have any repeating
patterns (every 4th friday, 1st monday in the quarter, etc).  Sure you can
stuff it into a function, but it just isn't as maintainable as a table.




--
Chad
http://www.postgresqlforums.com/


Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Tobias Brox
[Chad Wagner - Tue at 08:24:34AM -0500]
 I guess go with your gut, but at some point the expressions are going to be
 too complicated to maintain, and inefficient.

The layout of my system is quite flexible, so it should eventually be
fairly trivial to throw in a date dimension at a later stage.

 Calendar tables are very very common, because traditional date functions
 simply can't define business logic (especially things like month end close,
 quarter end close, and year end close) that doesn't have any repeating
 patterns (every 4th friday, 1st monday in the quarter, etc).  Sure you can
 stuff it into a function, but it just isn't as maintainable as a table.

So far I haven't been bothered with anything more complex than clean
weeks, months, quarters, etc.

I suppose the strongest argument for introducing date dimensions already
now is that I probably will benefit from having conform and
well-designed dimensions when I will be introducing more data marts.  As
for now I have only one fact table and some few dimensions in the
system.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Merlin Moncure

On 1/23/07, Tobias Brox [EMAIL PROTECTED] wrote:

Does anyone have experience with using postgres for data warehousing?
Right, I saw one post suggestion to use mysql for a mostly read-only
database ... but anyway, I think it's not a question to change the
database platform for this project, at least not today ;-)

Ralph Kimball seems to be some kind of guru on data warehousing, and
in his books he's strongly recommending to have a date dimension -
simply a table describing all dates in the system, and having
attributes for what day of the week it is, month, day of the month,
week number, bank holiday, anything special, etc.  Well, it does make
sense if adding lots of information there that cannot easily be pulled
out from elsewhere - but as for now, I'm mostly only interessted in
grouping turnover/profit by weeks/months/quarters/years/weekdays.  It
seems so much bloated to store this information, my gut feeling tells it
should be better to generate them on the fly.  Postgres even allows to
create an index on an expression.

The question is ... I'm curious about what would yield the highest
performance - when choosing between:

  select extract(week from created), ...
  from some_table
  where ...
  group by extract(week from created), ...
  sort by extract(week from created), ...

and:

  select date_dim.week_num, ...
  from some_table join date_dim ...
  where ...
  group by date_dim.week_num, ...
  sort by date_dim, week_num, ...

The date_dim table would eventually cover ~3 years of operation, that
is less than 1000 rows.



In my opinion, I would make a date_dim table for this case.  I would
strongly advice against making a date_id field, just use the date
itself as the p-key (i wouldn't bother with RI links to the table
though).

I would also however make a function and use this to make the record:
create or replace function make_date_dim(in_date date) returns
date_dim as $$ [...]

And make date_dim records this way:
insert into date_dim select * from make_dim('01/01/2001'::date);

(or pre-insert with generate_series).
now you get the best of both worlds: you can join to the table for the
general case or index via function for special case indexes.  for
example suppose you had to frequently count an account's sales by
fiscal year quarter irrespective of year:

create index q_sales_idx on account_sale(account_no,
(make_dim(sale_date)).fiscal_quarter);

also you can use the function in place of a join if you want.  In some
cases the join may be better, though.

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Mark Kirkwood

Tobias Brox wrote:



I suppose the strongest argument for introducing date dimensions already
now is that I probably will benefit from having conform and
well-designed dimensions when I will be introducing more data marts.  As
for now I have only one fact table and some few dimensions in the
system.



Another factors to consider is that end user tools (and end users) may 
find a date/time dimension helpful.


Best wishes

Mark

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate