Re: [PERFORM] Flattening a kind of 'dynamic' table
Alexandre Leclerc wrote:
> Sorry for jumping in on this thread so late -- I haven't been able to
>
> select * from crosstab(
>     'select product_id, department_id, req_time
>      from product_department_time order by 1',
>     'select ''A'' union all select ''C'' union all select ''D'''
> ) as (product_id int, a int, c int, d int);

I forgot you could do this... This would certainly be easier than parsing array values returned from array_accum. It will probably be faster as well... but with the array approach the query would not have to be modified each time a new department was added. That said, a crosstab-based query could be built easily enough from a department query on the client, and then you have the best of both worlds.

Merlin

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Fri, 4 Feb 2005 12:48:43 -0500, Merlin Moncure [EMAIL PROTECTED] wrote:
> I forgot you could do this... This would certainly be easier than
> parsing array values returned from array_accum. It will probably be
> faster as well... but with the array approach the query would not have
> to be modified each time a new department was added. That said, a
> crosstab-based query could be built easily enough from a department
> query on the client, and then you have the best of both worlds.

Hello Merlin,

Well, I'm glad, because with all this I've learned a lot of new things. Finally, the crosstab solution is very fast and simple for me to use. I get my super-bug-jumbo-dbkiller-query to run in about 210ms (it hits many tables and so on); I had a score of 2480ms before. (This is a much more complex query; the cross-table thing had to be included in this one.) This is much better! :)

In all, thanks for your help. Regards.

-- Alexandre Leclerc

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 12:43:56 -0500, Alexandre Leclerc [EMAIL PROTECTED] wrote:
> On Thu, 27 Jan 2005 17:27:40 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
> > On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc [EMAIL PROTECTED] wrote:
> > > Here a normal listing of design.product_department_time:
> > >
> > > product_id | department_id | req_time
> > > -----------+---------------+----------
> > >        906 | A             |     3000
> > >        906 | C             |     3000
> > >        906 | D             |     1935
> > >        907 | A             |     1500
> > >        907 | C             |     1500
> > >        907 | D             |     4575
> > >        924 | A             |     6000
> > >        924 | C             |     1575
> >
> > Well, I did something like this recently; it can be done though maybe
> > not very efficiently... Unfortunately we will need a rowtype with all
> > the departaments:
> >
> > CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);
>
> Thank you for this help Dawid, I'll have to take some time to look at
> this suggestion. If I must create a domain with all the departments I'll
> have a problem because the user is creating and deleting departments as
> he pleases. Any counter-ideas?

I have exactly the same problem with my proposal [1]. I just wish there were some native rows-to-columns aggregate. The other approach I used was something like this:

SELECT product_id, a, b, c FROM
    (SELECT product_id, a FROM pdt) AS a
    FULL OUTER JOIN (SELECT product_id, b FROM pdt) AS b USING (product_id)
    FULL OUTER JOIN (SELECT product_id, c FROM pdt) AS c USING (product_id);

...or similar (I'm typing from memory ;)). Anyway, it was good for getting the whole table, but the performance, well, wasn't the greatest. ;)

Regards, Dawid

[1]: I was thinking about a trigger on the departaments table, and then recreating the aggregate and view as needed, but it isn't the kind of dynamic I had in mind. ;)

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Fri, 28 Jan 2005 09:07:59 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
> I have exactly the same problem with my proposal [1]. I just wish there
> were some native rows-to-columns aggregate.
>
> [1]: I was thinking about a trigger on the departaments table, and then
> recreating the aggregate and view as needed, but it isn't the kind of
> dynamic I had in mind. ;)

Yep, this is the only thing I also thought of: a trigger to add / remove columns when the user adds or removes a department... but this is not exactly what I wanted (this is not a very nice db design, from my perspective).

Thank you for your help.

-- Alexandre Leclerc

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Fri, 28 Jan 2005 10:24:37 -0800, Joe Conway [EMAIL PROTECTED] wrote:
> Alexandre Leclerc wrote:
> > I'm a little bit confused on how to install this contrib. I know my
> > contrib package is installed, but I don't know how to make it work in
> > postgresql. (Using 7.4.5-1mdk on Mandrake Linux.)
>
> Find the file tablefunc.sql and redirect it into your database, e.g.
>
>     psql mydatabase < /path/to/contrib/scripts/tablefunc.sql
>
> I have no idea where that would be on Mandrake, but you could probably do:
>
>     locate tablefunc.sql
>
> On Fedora Core 1 I find it here:
>
>     /usr/share/pgsql/contrib/tablefunc.sql
>
> Also find and read README.tablefunc. HTH,

WHOA! Yess! Exactly the thing! Amazing! :)))

Regards, Dawid

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Flattening a kind of 'dynamic' table
Alexandre wrote:
> Here a normal listing of design.product_department_time:
>
> product_id | department_id | req_time
> -----------+---------------+----------
>        906 | A             |     3000
>        906 | C             |     3000
>        906 | D             |     1935
>        907 | A             |     1500
>        907 | C             |     1500
>        907 | D             |     4575
>        924 | A             |     6000
>        924 | C             |     1575
>
> product_id |  a   |  c   |  d
> -----------+------+------+------
>        924 | 6000 | 1575 |
>        907 | 1500 | 1500 | 4575
>        906 | 3000 | 3000 | 1935

ok, you have a couple of different options here. The first thing that jumps out at me is to cheat using arrays. Let's start with the normalized result set:

select product_id, department_id, sum(req_time)
from product_department_time
group by product_id, department_id;

product_id | department_id | sum
       924 | a             | 6000
       924 | c             | 1575
       907 | a             | 1500
[...]

This should be no slower (in fact faster) than your original query and does not have to be re-coded when you add new departments (you have a department table, right?). If you absolutely must have 1 record/product, you can cheat using arrays:

select q.product_id,
       array_accum(q.department_id) as depts,
       array_accum(q.req_time) as times
from
(
    select product_id, department_id, sum(req_time) as req_time
    from product_department_time
    group by product_id, department_id
) q
group by q.product_id;

product_id | depts  | times
       924 | {a, c} | {6000, 1575}
[...]

disclaimer 1: I never checked syntax
disclaimer 2: you may have to add array_accum to pg (check docs)

Merlin

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 10:23:34 -0500, Alexandre Leclerc [EMAIL PROTECTED] wrote:
> Here a normal listing of design.product_department_time:
>
> product_id | department_id | req_time
> -----------+---------------+----------
>        906 | A             |     3000
>        906 | C             |     3000
>        906 | D             |     1935
>        907 | A             |     1500
>        907 | C             |     1500
>        907 | D             |     4575
>        924 | A             |     6000
>        924 | C             |     1575

Well, I did something like this recently; it can be done though maybe not very efficiently... Unfortunately we will need a rowtype with all the departaments:

CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);

A function aggregate for this type:

CREATE FUNCTION dep_agg(ds departaments, args text[]) RETURNS departaments AS $$
BEGIN
    IF args[1] = 'A' THEN
        ds.a = args[2];
        -- I think it is not possible to do the ds.$args[1] = args[2] equivalent.
    ELSIF args[1] = 'B' THEN
        ds.b = args[2];
    ELSIF args[1] = 'C' THEN
        ds.c = args[2];
    ELSIF args[1] = 'D' THEN
        ds.d = args[2];
    END IF;
    RETURN ds;
END;
$$ LANGUAGE plpgsql;

THEN an aggregate:

CREATE AGGREGATE dep_aggregate (basetype = text[], stype = departaments, sfunc = dep_agg);

AND then a view for sugar:

CREATE VIEW prod_dep_time AS
    SELECT product_id,
           (dep_aggregate(ARRAY[department_id, req_time]::text[])).*
    FROM product_department_time
    GROUP BY product_id;

And voila. :) Couple of comments:
-- the aggregate takes an array[] since making multicolumn aggregates is not possible, as far as I know.
-- I did not check the code, yet I did manage to make it work some time before. You may need to use ROWS or something in the function definition; I don't remember and can't check it right now.
-- comments welcome. :)

Regards, Dawid

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] Flattening a kind of 'dynamic' table
> Unfortunately we will need a rowtype with all the departaments:
>
> CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);

I think you mean CREATE TYPE departments...

Chris

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 17:27:40 +0100, Dawid Kuroczko [EMAIL PROTECTED] wrote:
> Well, I did something like this recently; it can be done though maybe
> not very efficiently... Unfortunately we will need a rowtype with all
> the departaments:
>
> CREATE DOMAIN departaments AS (a int, b int, c int, d int, ...);

Thank you for this help Dawid, I'll have to take some time to look at this suggestion. If I must create a domain with all the departments I'll have a problem because the user is creating and deleting departments as he pleases. Any counter-ideas?

Regards.

-- Alexandre Leclerc

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 10:44:45 -0500, Merlin Moncure [EMAIL PROTECTED] wrote:
> ok, you have a couple of different options here. The first thing that
> jumps out at me is to cheat using arrays. Let's start with the
> normalized result set:
>
> select product_id, department_id, sum(req_time)
> from product_department_time
> group by product_id, department_id;
>
> product_id | department_id | sum
>        924 | a             | 6000
>        924 | c             | 1575
>        907 | a             | 1500
> [...]

Hello Merlin,

First of all, thanks for your time. Yes, this is exactly what I'm doing right now (if I understand your point well). All records in design.product_department_time are unique for each (product_id, department_id) combo and 0/null values are not possible. This is the first listing you have. In my query I added the sum() and GROUP BY stuff to avoid having such a listing:

product_id |  a   |  c   |  d
-----------+------+------+------
       906 | 3000 |      |
       906 |      | 3000 |
       906 |      |      | 1935
       907 | 1500 |      |
       907 |      | 1500 |
       907 |      |      | 4575
       924 | 6000 |      |
       924 |      | 1575 |

So that for a given product_id I have all the times by department in a single row (the second listing I posted).

> If you absolutely must have 1 record/product, you can cheat using
> arrays:
>
> select q.product_id,
>        array_accum(q.department_id) as depts,
>        array_accum(q.req_time) as times
> from
> (
>     select product_id, department_id, sum(req_time) as req_time
>     from product_department_time
>     group by product_id, department_id
> ) q
> group by q.product_id;

I did not use arrays because I didn't think about it, but I don't know if this is still the most efficient way. My software will have to work out the data, unless the array expands into proper columns. But I'm not an expert at all. I try to do good DB design, but sometimes this makes it more complicated to work with the data. Here is the table definition if it can help:

design.products (product_id serial PRIMARY KEY, ...);

prod.departments (department_id varchar(3) PRIMARY KEY, ...);

design.product_department_time (
    product_id    integer REFERENCES design.products
                      ON DELETE CASCADE ON UPDATE CASCADE,
    department_id varchar(3) REFERENCES prod.departments
                      ON DELETE CASCADE ON UPDATE CASCADE,
    req_time      integer NOT NULL DEFAULT 0 CHECK (req_time >= 0),
    CONSTRAINT product_department_time_pkey
        PRIMARY KEY (product_id, department_id)
);

And I also have a jobs table which has one product_id attached to one job with the required quantity to produce. So I must show the user how much time this will require by department for each job. :) This is a nice report, but I don't want to kill the database each time the user wants to see it.

Thanks for your contribution so far; this will help me look for other ways of doing it. I'm always ready to hear more! Regards.

-- Alexandre Leclerc

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Flattening a kind of 'dynamic' table
Alexandre wrote:
> Hello Merlin,
>
> First of all, thanks for your time. Yes, this is exactly what I'm doing
> right now (if I understand your point well). All records in
> design.product_department_time are unique for each
> (product_id, department_id) combo and 0/null values are not possible.
> This is the first listing you have.

Right. Expanding departments into columns is basically a dead end. First of all, SQL is not really designed to do this, and second of all (comments continued below)

> product_id |  a   |  c   |  d
> -----------+------+------+------
>        906 | 3000 |      |
>        906 |      | 3000 |
>        906 |      |      | 1935
>        907 | 1500 |      |
>        907 |      | 1500 |
>        907 |      |      | 4575
>        924 | 6000 |      |
>        924 |      | 1575 |

the above table is more expensive to group than the normalized version because it is much, much longer. This will get worse and worse as you add more departments. So, whatever you end up doing, I'd advise against expanding rows from a table into columns of a result except for very, very special cases. This is not one of those cases.

> I did not use arrays because I didn't think about it, but I don't know
> if this is still the most efficient way. My software will have to work
> out the data, unless the array expands into proper columns. But I'm not
> an expert at all. I try to do good DB design, but sometimes this makes
> it more complicated to work with the data.

Arrays are a quick'n'dirty way to de-normalize a result set. In my opinion, de-normalization is OK for result sets *only*. Generally, it is inappropriate to de-normalize any persistent object in the database, such as a view or (especially) a table. De-normalizing sets can sometimes simplify client-side coding issues or provide a performance benefit at the query stage (or slow it down, so be careful!).

> And I also have a jobs table which has one product_id attached to one
> job with the required quantity to produce. So I must show the user how
> much time this will require by department for each job. :) This is a
> nice report, but I don't want to kill the database each time the user
> wants to see it.

You always have the option to do this in code. This basically means ordering the result set and writing a nested loop to pass over the data. If you happen to be using a report engine (and it sounds like you are), some engines can simplify this via a grouping criterion, some can't.

If parsing an array string is a pain, I happen to have a C++ class handy that can compose/decompose a postgresql array string if:
a: no more than 1 dimension and
b: array bounds are known
Let me know if you need it and I'll send it over.

Merlin

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
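[Editor's note: the "order the result set and nest-loop over it" tactic Merlin describes above can be sketched in client code. This Python sketch is illustrative only, not part of the original thread; the `flatten_rows` name and the hard-coded row tuples (standing in for a database cursor's output) are hypothetical.]

```python
def flatten_rows(rows):
    """Collapse an ordered (product_id, department_id, req_time) result
    set into one record per product, department times keyed by department."""
    result = {}
    for product_id, dept, req_time in rows:
        # (product_id, department_id) pairs are unique in the source table,
        # so a plain assignment per department is enough.
        result.setdefault(product_id, {})[dept] = req_time
    return result

# Stand-in for the ordered result set from the normalized query.
rows = [
    (906, 'A', 3000), (906, 'C', 3000), (906, 'D', 1935),
    (907, 'A', 1500), (907, 'C', 1500), (907, 'D', 4575),
    (924, 'A', 6000), (924, 'C', 1575),
]
flat = flatten_rows(rows)
# flat[907] now holds all of product 907's department times in one record.
```

The database then only runs the cheap normalized GROUP BY query; the pivoting happens in a single pass on the client.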
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 13:02:48 -0500, Merlin Moncure [EMAIL PROTECTED] wrote:
> Right. Expanding departments into columns is basically a dead end.
> First of all, SQL is not really designed to do this, and second of all
> (comments continued below)

Ok, I got it. The basic message is to avoid making columns out of rows like I'm doing right now, and that de-normalizing into an array is the way to go. So I should query and get the results in an array, and then my application will parse the array into the proper columns. (I'm developing software.) If I still got it wrong, it's because the 'geek' section of my brain is on vacation: leave a message, and when it comes back it'll explain all this to me! :)

So I found the array_accum function in the docs and created it:

CREATE AGGREGATE array_accum (
    sfunc = array_append,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);

Then I created this new select:

SELECT product_id,
       array_accum(department_id) as a_department_id,
       array_accum(req_time) as a_req_time
FROM (SELECT * FROM design.product_department_time) AS tmp
GROUP BY product_id;

It gives:

product_id | a_department_id |    a_req_time
-----------+-----------------+------------------
       924 | {A,C}           | {6000,1575}
       907 | {A,C,D}         | {1500,1500,4575}
       906 | {A,C,D}         | {3000,3000,1935}

So, the performance should be much better using this aggregate approach? Now I think I'll merge the results in my software, unless you think that at this point doing a LEFT JOIN with my jobs table is the way to go, because the performance will be good. (Personally, I don't know the answer to this one.)

> If parsing an array string is a pain, I happen to have a C++ class handy
> that can compose/decompose a postgresql array string if:
> a: no more than 1 dimension and
> b: array bounds are known
> Let me know if you need it and I'll send it over.

Thank you for your offer. I think parsing an array is the easiest thing for me to do in all this. :) If I encounter any problem, I'll drop you a mail.

Regards.

-- Alexandre Leclerc

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
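[Editor's note: the array parsing Alexandre mentions is indeed straightforward for the one-dimensional arrays that array_accum returns, e.g. '{A,C,D}' and '{1500,1500,4575}'. This Python sketch is illustrative only and assumes unquoted elements with no embedded commas or braces, which holds for these department codes and integers but not for arbitrary text; real array literals can quote and escape elements.]

```python
def parse_pg_array(s):
    """Parse a simple 1-D PostgreSQL array literal like '{A,C,D}'.

    Assumption: elements are unquoted and contain no commas or braces.
    """
    body = s.strip()[1:-1]  # drop the surrounding braces
    return body.split(',') if body else []

# The query returns two parallel arrays per product; pair them up.
depts = parse_pg_array('{A,C,D}')
times = [int(t) for t in parse_pg_array('{1500,1500,4575}')]
by_dept = dict(zip(depts, times))
# by_dept == {'A': 1500, 'C': 1500, 'D': 4575}
```

Zipping the two parallel arrays reconstructs the per-department columns on the client without any per-department SQL.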
Re: [PERFORM] Flattening a kind of 'dynamic' table
Alexandre wrote:
> Ok, I got it. The basic message is to avoid making columns out of rows

Yes. This is wrong.

> like I'm doing right now, and that de-normalizing into an array is the
> way to go.

Only sometimes. Looping application code is another tactic. There may be other things to do as well that don't involve arrays or application code. Consider arrays a (very postgresql-specific) tool in your expanding toolchest.

De-normalization is a loaded term because we are only presenting queried data in an alternate format (normalization normally applying to data structured within the database). There are many people on this list who will tell you not to de-normalize anything, ever (and most of the time, you shouldn't!).

Merlin

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Flattening a kind of 'dynamic' table
On Thu, 27 Jan 2005 16:05:09 -0500, Merlin Moncure [EMAIL PROTECTED] wrote:
> Only sometimes. Looping application code is another tactic. There may
> be other things to do as well that don't involve arrays or application
> code. Consider arrays a (very postgresql-specific) tool in your
> expanding toolchest.

I take good note of that. All this opened up other avenues for solutions, so I'm glad of that. I'll take more time to think about all this.

> De-normalization is a loaded term because we are only presenting queried
> data in an alternate format (normalization normally applying to data
> structured within the database). There are many people on this list who
> will tell you not to de-normalize anything, ever (and most of the time,
> you shouldn't!).

Thank you for all your help and time on this. Best regards.

-- Alexandre Leclerc

---(end of broadcast)---
TIP 8: explain analyze is your friend