Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-02-04 Thread Merlin Moncure
 Alexandre Leclerc wrote:
 Sorry for jumping in on this thread so late -- I haven't been able to
 select * from crosstab(
'select product_id, department_id, req_time
 from product_department_time order by 1',
'select ''A'' union all select ''C'' union all select ''D'''
 ) as (product_id int, a int, c int, d int);

I forgot you could do this... This would certainly be easier than parsing
array values returned from array_accum.  It will probably be faster as
well... but with the array approach the query would not have to be
modified each time a new department was added.  That said, a crosstab-based
query could be built easily enough from a department query on the
client, and then you have the best of both worlds.
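For what it's worth, building the crosstab statement from a department query on the client can be sketched like this (a rough Python sketch of the idea only; the function name is mine, and the lower-cased output columns are an assumption -- table and column names are taken from the examples in this thread):

```python
def build_crosstab_sql(departments):
    """Build the two-argument crosstab() query discussed above from a
    list of department ids fetched from the departments table."""
    # Category query, e.g.  select 'A' union all select 'C' ...
    cats = " union all ".join("select '%s'" % d for d in departments)
    # Embed it as a SQL string literal: double every single quote.
    cats_literal = "'" + cats.replace("'", "''") + "'"
    # One int output column per department, plus the row key.
    cols = ", ".join(["product_id int"] +
                     ["%s int" % d.lower() for d in departments])
    return ("select * from crosstab(\n"
            " 'select product_id, department_id, req_time\n"
            "  from product_department_time order by 1',\n"
            " " + cats_literal + "\n"
            ") as (" + cols + ");")
```

Run against ['A', 'C', 'D'] this reproduces the statement quoted above; when a department is added or removed, the client just rebuilds the string.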

Merlin



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-02-04 Thread Alexandre Leclerc
On Fri, 4 Feb 2005 12:48:43 -0500, Merlin Moncure
[EMAIL PROTECTED] wrote:
  Alexandre Leclerc wrote:
  Sorry for jumping in on this thread so late -- I haven't been able to
  select * from crosstab(
 'select product_id, department_id, req_time
  from product_department_time order by 1',
 'select ''A'' union all select ''C'' union all select ''D'''
  ) as (product_id int, a int, c int, d int);
 
 I forgot you could do this...This would certainly be easier than parsing
 array values returned from array_accum.  It will probably be faster as
 well...but with the array approach the query would not have to be
 modified each time a new department was added.  That said, a crosstab
 based query could be built easily enough from a department query on the
 client and then you have the best of both worlds.

Hello Merlin,

Well, I'm glad, because with all this I've learned a lot of new things.

Finally, the crosstab solution is very fast and simple for me to
use. I get my super-bug-jumbo-dbkiller query to run in about 210ms
(hitting many tables and so on). It took 2480ms before. (This is
a much more complex query; the cross-table part had to be included in
this one.) This is much better! :)

In all, thanks for your help. Regards.

-- 
Alexandre Leclerc



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-28 Thread Dawid Kuroczko
On Thu, 27 Jan 2005 12:43:56 -0500, Alexandre Leclerc
[EMAIL PROTECTED] wrote:
 On Thu, 27 Jan 2005 17:27:40 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
  On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc
  [EMAIL PROTECTED] wrote:
   Here a normal listing of design.product_department_time:
product_id | department_id | req_time
   +---+--
   906 | A | 3000
   906 | C | 3000
   906 | D | 1935
   907 | A | 1500
   907 | C | 1500
   907 | D | 4575
   924 | A | 6000
   924 | C | 1575
 
  Well, I did something like this recently; it can be done though
  maybe not very efficiently...
 
  Unfortunately we will need a rowtype with all the departaments:
  CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);
 Thank you for this help Dawid, I'll have to take some time to look at
 this suggestion. If I must create a domain with all the departments
 I'll have a problem because the user is creating and deleting
 departments as it pleases him.
 
 Any counter-ideas?

I have exactly the same problem with my proposal [1].
I just wish there were some native rows-to-columns
aggregate.

The other approach I used was something like this:
SELECT product_id, a, b, c FROM
  (SELECT product_id, a FROM pdt) AS a
  FULL OUTER JOIN (SELECT product_id, b FROM pdt) AS b USING (product_id)
  FULL OUTER JOIN (SELECT product_id, c FROM pdt) AS c USING (product_id);
...or similar (I'm typing from memory ;)).  Anyway, it was good for getting
the whole table, but the performance, well, wasn't the greatest. ;))

   Regards,
  Dawid

[1]: I was thinking about a trigger on a departaments table,
and then recreating the aggregate and view as needed, but
it isn't the kind of dynamic I had in mind. ;)



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-28 Thread Alexandre Leclerc
On Fri, 28 Jan 2005 09:07:59 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
 On Thu, 27 Jan 2005 12:43:56 -0500, Alexandre Leclerc
 [EMAIL PROTECTED] wrote:
  On Thu, 27 Jan 2005 17:27:40 +0100, Dawid Kuroczko [EMAIL PROTECTED] 
  wrote:
   On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc
   [EMAIL PROTECTED] wrote:
Here a normal listing of design.product_department_time:
 product_id | department_id | req_time
+---+--
906 | A | 3000
906 | C | 3000
906 | D | 1935
907 | A | 1500
907 | C | 1500
907 | D | 4575
924 | A | 6000
924 | C | 1575
  
   Well, I did something like this recently; it can be done though
   maybe not very efficiently...
  
   Unfortunately we will need a rowtype with all the departaments:
   CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);
  Thank you for this help Dawid, I'll have to take some time to look at
  this suggestion. If I must create a domain with all the departments
  I'll have a problem because the user is creating and deleting
  departments as it pleases him.
 
  Any counter-ideas?
 
 I have exactly the same problem with my proposal [1]
 I just wish there would be some native rows-to-columns
 aggregate.
 
 [1]: I was thinking about a trigger on a departaments table,
 and then recreating the aggregate and view as needed, but
 it isn't the kind of dynamic I had in mind. ;)

Yep, this is the only thing I also thought of: a trigger to add / remove
columns when the user adds or removes a department... but this is not
exactly what I wanted (this is not a very nice db design, from my
perspective).

Thank you for your help.

-- 
Alexandre Leclerc



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-28 Thread Dawid Kuroczko
On Fri, 28 Jan 2005 10:24:37 -0800, Joe Conway [EMAIL PROTECTED] wrote:
 Alexandre Leclerc wrote:
  I'm a little bit confused on how to install this contrib. I know my
  contrib package is installed, but I don't know how to make it work in
  postgresql. (Using 7.4.5-1mdk on Mandrake Linux.)
 
 
 Find the file tablefunc.sql and redirect it into your database, e.g.
 
 psql mydatabase < /path/to/contrib/scripts/tablefunc.sql
 
 I have no idea where that would be on Mandrake, but you could probably do:
 
 locate tablefunc.sql
 
 On Fedora Core 1 I find it here:
 /usr/share/pgsql/contrib/tablefunc.sql
 
 Also find and read README.tablefunc.
 
 HTH,

WHOA!  Yess!  Exactly the thing!  Amazing! :)))

   Regards,
  Dawid



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Merlin Moncure
Alexandre wrote:
 Here a normal listing of design.product_department_time:
  product_id | department_id | req_time
 +---+--
 906 | A | 3000
 906 | C | 3000
 906 | D | 1935
 907 | A | 1500
 907 | C | 1500
 907 | D | 4575
 924 | A | 6000
 924 | C | 1575
  product_id |  a   |  c   |  d
 +--+--+--
 924 | 6000 | 1575 |
 907 | 1500 | 1500 | 4575
 906 | 3000 | 3000 | 1935

ok, you have a couple of different options here.  The first thing that
jumps out at me is to cheat using arrays.
  Let's start with the normalized result set.

select product_id, department_id, sum(req_time)
from product_department_time
group by product_id, department_id

product_id |  department_id|  sum   
924   a   6000
924   c   1575
907   a   1500
[...]

This should be no slower (in fact, faster) than your original query and
does not have to be re-coded when you add new departments (you have a
department table, right?).

If you absolutely must have 1 record/product, you can cheat using
arrays:

select q.product_id, 
   array_accum(q.department_id) as depts,
   array_accum(q.req_time)  as times
 from 
   (
select product_id, department_id, sum(req_time) as req_time
from product_department_time
group by product_id, department_id
   ) q
group by q.product_id;


select product_id, array_accum(department_id), sum(req_time)
from product_department_time
group by product_id

product_id |  department_id|  sum   
924   {a, c}  {1500, 1575}
 [...]

disclaimer 1: I never checked syntax
disclaimer 2: you may have to add array_accum to pg (check docs)
Merlin



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Dawid Kuroczko
On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc
[EMAIL PROTECTED] wrote:
 Here a normal listing of design.product_department_time:
  product_id | department_id | req_time
 +---+--
 906 | A | 3000
 906 | C | 3000
 906 | D | 1935
 907 | A | 1500
 907 | C | 1500
 907 | D | 4575
 924 | A | 6000
 924 | C | 1575

Well, I did something like this recently; it can be done though
maybe not very efficiently...

Unfortunately we will need a rowtype with all the departaments:
CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);

A function aggregate for this type:
CREATE FUNCTION dep_agg(ds departaments, args text[]) RETURNS departaments AS $$
   BEGIN
  IF args[1] = 'A' THEN ds.a = args[2]; -- I think it is not
possible to do ds.$args[1] = args[2] equivalent.
  ELSIF args[1] = 'B' THEN ds.b = args[2];
  ELSIF args[1] = 'C' THEN ds.c = args[2];
  ELSIF args[1] = 'D' THEN ds.d = args[2];
  END IF;
  RETURN ds;
   END;
$$ LANGUAGE plpgsql;

THEN an aggregate:
CREATE AGGREGATE dep_aggregate (basetype = text[], stype = departaments,
sfunc = dep_agg);

AND then a view for sugar:

CREATE VIEW prod_dep_time AS
   SELECT product_id, (dep_aggregate(ARRAY[departament_id, req_time]::text[])).*
   FROM product_department_time GROUP BY product_id;

And voila. :)
Couple of comments:
 -- aggregate takes array[] since making multicolumn aggregates is
not possible, as far as I know.
 -- I did not check the code, yet I did manage to make it work some time before.
 You may need to use ROWS or something in the function definition; I
don't remember and can't check it right now.
 -- comments welcome. :)

   Regards,
  Dawid



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Christopher Kings-Lynne
Unfortunately we will need a rowtype with all the departaments:
CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);
I think you mean CREATE TYPE departments...
Chris


Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Alexandre Leclerc
On Thu, 27 Jan 2005 17:27:40 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
 On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc
 [EMAIL PROTECTED] wrote:
  Here a normal listing of design.product_department_time:
   product_id | department_id | req_time
  +---+--
  906 | A | 3000
  906 | C | 3000
  906 | D | 1935
  907 | A | 1500
  907 | C | 1500
  907 | D | 4575
  924 | A | 6000
  924 | C | 1575
 
 Well, I did something like this recently; it can be done though
 maybe not very efficiently...
 
 Unfortunately we will need a rowtype with all the departaments:
 CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);


Thank you for this help Dawid, I'll have to take some time to look at
this suggestion. If I must create a domain with all the departments
I'll have a problem because the user is creating and deleting
departments as it pleases him.

Any counter-ideas?

Regards.

-- 
Alexandre Leclerc



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Alexandre Leclerc
On Thu, 27 Jan 2005 10:44:45 -0500, Merlin Moncure
[EMAIL PROTECTED] wrote:
 Alexandre wrote:
  Here a normal listing of design.product_department_time:
   product_id | department_id | req_time
  +---+--
  906 | A | 3000
  906 | C | 3000
  906 | D | 1935
  907 | A | 1500
  907 | C | 1500
  907 | D | 4575
  924 | A | 6000
  924 | C | 1575
   product_id |  a   |  c   |  d
  +--+--+--
  924 | 6000 | 1575 |
  907 | 1500 | 1500 | 4575
  906 | 3000 | 3000 | 1935
 
 ok, you have a couple of different options here.  The first thing that
 jumps out at me is to cheat using arrays.
   Let's start with the normalized result set.
 
 select product_id, department_id, sum(req_time)
 from product_department_time
 group by product_id, department_id
 
 product_id |  department_id|  sum
 924   a   6000
 924   c   1575
 907   a   1500
 [...]

Hello Merlin,

First of all, thanks for your time. Yes, this is exactly what I'm doing
right now (if I understand your point correctly). All records in
design.product_department_time are unique for each (product_id,
department_id) combo, and 0/null values are not possible. This is the first
listing you have.

In my query I added the sum() and GROUP BY stuff to avoid having such a listing:

 product_id |  a   |  c   |  d
+--+--+--
906 | 3000 |  |
906 |  | 3000 |
906 |  |  | 1935
907 | 1500 |  |
907 |  | 1500 |
907 |  |  | 4575
924 | 6000 |  |
924 |  | 1575 |

So that for a given product_id I have all the times by departments in
a single row (second listing I posted).

 If you absolutely must have 1 record/product, you can cheat using
 arrays:
 
 select q.product_id,
array_accum(q.department_id) as depts,
array_accum(q.req_time)  as times
  from
(
 select product_id, department_id, sum(req_time) as req_time
 from product_department_time
 group by product_id, department_id
) q
 group by q.product_id;
 
 select product_id, array_accum(department_id), sum(req_time)
 from product_department_time
 group by product_id
 
 product_id |  department_id|  sum
 924   {a, c}  {1500, 1575}
  [...]

I did not used arrays because I didn't think about it, but I don't
know if this is still the most efficient way. My software will have to
work out the data, unless the array expands in good columns. But I'm
not an expert at all. I try to do good DB design, but sometimes this
is more complicated to work with the data.

Here is the table definition if it can help:

design.products (product_id serial PRIMARY KEY, ...);
prod.departments (department_id varchar(3) PRIMARY KEY, ...);

design.product_department_time (
    product_id     integer     REFERENCES design.products
        ON DELETE CASCADE ON UPDATE CASCADE,
    department_id  varchar(3)  REFERENCES prod.departments
        ON DELETE CASCADE ON UPDATE CASCADE,
    req_time       integer     NOT NULL DEFAULT 0 CHECK (req_time >= 0),
    CONSTRAINT product_department_time_pkey PRIMARY KEY (product_id, department_id)
);

And I also have a jobs table which has one product_id attached to each
job, with the required quantity to produce. So I must show the user
how much time this will require per department for each job. :) This
is a nice report, but I don't want to kill the database each time the
user wants to see it.

Thanks for your help so far; this will help me in looking for other
ways of doing it. I'm always ready to hear more!

Regards.

-- 
Alexandre Leclerc



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Merlin Moncure
Alexandre wrote:
 On Thu, 27 Jan 2005 10:44:45 -0500, Merlin Moncure
 [EMAIL PROTECTED] wrote:
  Alexandre wrote:
  ok, you have a couple of different options here.  The first thing that
  jumps out at me is to cheat using arrays.
    Let's start with the normalized result set.
 
  select product_id, department_id, sum(req_time)
  from product_department_time
  group by product_id, department_id
 
  product_id |  department_id|  sum
  924   a   6000
  924   c   1575
  907   a   1500
  [...]
 
 Hello Merlin,
 
 First of all, thanks for your time. Yes, this is exactly what I'm doing
 right now (if I understand your point correctly). All records in
 design.product_department_time are unique for each (product_id,
 department_id) combo, and 0/null values are not possible. This is the first
 listing you have.

Right.  Expanding departments into columns is basically a dead end.
First of all, SQL is not really designed to do this, and second of all
(comments continued below)

  product_id |  a   |  c   |  d
 +--+--+--
 906 | 3000 |  |
 906 |  | 3000 |
 906 |  |  | 1935
 907 | 1500 |  |
 907 |  | 1500 |
 907 |  |  | 4575
 924 | 6000 |  |
 924 |  | 1575 |

the above table is more expensive to group than the normalized version
above because it is much, much longer.  This will get worse and worse as
you add more departments.  So, whatever you end up doing, I'd advise
against expanding rows from a table into columns of a result except for
very, very special cases.  This is not one of those cases.

 I did not used arrays because I didn't think about it, but I don't
 know if this is still the most efficient way. My software will have to
 work out the data, unless the array expands in good columns. But I'm
 not an expert at all. I try to do good DB design, but sometimes this
 is more complicated to work with the data.

Arrays are a quick'n'dirty way to de-normalize a result set.  In my
opinion, de-normalization is o.k. for result sets *only*.  Generally, it
is inappropriate to de-normalize any persistent object in the database,
such as a view or (especially) a table.  De-normalizing sets can
sometimes simplify client-side coding issues or provide a performance
benefit at the query stage (or slow it down, so be careful!).

 And I also have a jobs table which has one product_id attached to each
 job, with the required quantity to produce. So I must show the user
 how much time this will require per department for each job. :) This
 is a nice report, but I don't want to kill the database each time the
 user wants to see it.

You always have the option to do this in code.  This basically means
ordering the result set and writing a nested loop to pass over the data.
If you happen to be using a report engine (and it sounds like you are),
some engines can simplify this via grouping criteria, some can't.
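The ordered-result-set nested loop described above might look something like this in client code (a hedged Python sketch; the function name and the dict layout are my own choices, not from the thread):

```python
def pivot_rows(rows):
    """Collapse (product_id, department_id, req_time) rows into one
    mapping per product: the client-side nested loop over a result
    set that is already ordered by product_id."""
    out = []
    current_id, current = object(), None   # sentinel: matches no real id
    for product_id, dept, req_time in rows:
        if product_id != current_id:       # a new group starts here
            current_id = product_id
            current = {"product_id": product_id}
            out.append(current)
        current[dept] = req_time           # one key per department
    return out
```

Fed the eight sample rows from the listings above, this yields three records, one per product, with only the departments each product actually uses.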

If parsing an array string is a pain I happen to have a C++ class handy
that can compose/decompose a postgresql array string if:
a: no more than 1 dimension and 
b: array bounds are known

Let me know if you need it and I'll send it over.
Merlin



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Alexandre Leclerc
On Thu, 27 Jan 2005 13:02:48 -0500, Merlin Moncure
[EMAIL PROTECTED] wrote:
 Alexandre wrote:
  On Thu, 27 Jan 2005 10:44:45 -0500, Merlin Moncure
  [EMAIL PROTECTED] wrote:
   Alexandre wrote:
 Let's start with the normalized result set.
  
   product_id |  department_id|  sum
   924   a   6000
   924   c   1575
   907   a   1500
   [...]
 
 Right.  Expanding departments into columns is basically a dead end.
 First of all, SQL is not really designed to do this, and second of all
 (comments continued below)

Ok, I got it. The basic message is to avoid making columns out of rows
like I'm doing right now, and that de-normalizing into an array is the
way to go. So I should query and get the results in an array, and
afterwards my application will parse the array into the proper columns.
(I'm developing a piece of software.)

If I still got it wrong, it's because the 'geek' section of my
brain is on vacation: leave a message and when it comes back, it'll
explain all this to me! :)

So I found the array_accum function in the docs, and I created it.

CREATE AGGREGATE array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);

Then I created this new select:
SELECT 
  product_id, 
  array_accum(department_id) as a_department_id,
  array_accum(req_time) as a_req_time
FROM (SELECT * FROM design.product_department_time) AS tmp 
GROUP BY product_id;

It gives:
 product_id | a_department_id |a_req_time
+-+--
924 | {A,C}   | {6000,1575}
907 | {A,C,D} | {1500,1500,4575}
906 | {A,C,D} | {3000,3000,1935}
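The two array columns above still have to be split apart client-side; here is a minimal sketch in Python (the helper names are mine, and the parser only handles the simple unquoted 1-D array literals shown here, not quoting, escaping, or NULLs):

```python
def parse_pg_array(s):
    """Parse a simple 1-D PostgreSQL array literal such as '{A,C,D}'
    or '{1500,1500,4575}' into a list of element strings."""
    body = s.strip()[1:-1]          # drop the surrounding braces
    return body.split(",") if body else []

def row_to_columns(product_id, depts, times):
    """Turn one aggregated row (two parallel array literals) into a
    per-department mapping for the report."""
    d = {"product_id": product_id}
    for dept, t in zip(parse_pg_array(depts), parse_pg_array(times)):
        d[dept] = int(t)
    return d
```

Applied to the 907 row above, this pairs each department id with its req_time, which is exactly the merge step the application has to perform.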

So, the performance should be much better using this aggregate approach?

Now I think I'll merge the results in my software, unless you think
that at this point doing a LEFT JOIN with my jobs table is the way to
go, because the performance will be good. (Personally I don't know the
answer to this one.)

 If parsing an array string is a pain I happen to have a C++ class handy
 that can compose/decompose a postgresql array string if:
 a: no more than 1 dimension and
 b: array bounds are known
 
 Let me know if you need it and I'll send it over.

Thank you for your offer. I think parsing an array is the easiest
thing to do for me in all this. :) If I encounter any problem, I'll
drop you a mail.

Regards.

-- 
Alexandre Leclerc



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Merlin Moncure
Alexandre wrote:
  
  Right.  Expanding departments into columns is basically a dead end.
  First of all, SQL is not really designed to do this, and second of all
  (comments continued below)
 
 Ok, I got it. The basic message is to avoid making columns out of rows

Yes.  Making columns out of rows is the wrong approach.

 like I'm doing right now, that de-normalizing in an array is the way
 to go. 

Only sometimes.  Looping application code is another tactic.  There may
be other things to do as well that don't involve arrays or application
code.  Consider arrays a (very postgresql specific) tool in your
expanding toolchest.

De-normalization is a loaded term because we are only presenting queried
data in an alternate format (normalization normally applying to data
structured within the database).  There are many people on this list who
will tell you not to de-normalize anything, ever (and most of the time,
you shouldn't!).  

Merlin



Re: [PERFORM] Flattening a kind of 'dynamic' table

2005-01-27 Thread Alexandre Leclerc
On Thu, 27 Jan 2005 16:05:09 -0500, Merlin Moncure
[EMAIL PROTECTED] wrote:
 Alexandre wrote:
  like I'm doing right now, that de-normalizing in an array is the way
  to go.
 
 Only sometimes.  Looping application code is another tactic.  There may
 be other things to do as well that don't involve arrays or application
 code.  Consider arrays a (very postgresql specific) tool in your
 expanding toolchest.

I take good note of that. All this opened up other avenues for
solutions, so I'm glad of that. I'll take more time to think about
it all.

 De-normalization is a loaded term because we are only presenting queried
 data in an alternate format (normalization normally applying to data
 structured within the database).  There are many people on this list who
 will tell you not to de-normalize anything, ever (and most of the time,
 you shouldn't!).

Thank you for all you help and time for this.

Best regards.

-- 
Alexandre Leclerc
