[SQL] determining how many products are in how many categories
-- product table (simplified):
create table p (
  id char(22) not null primary key,
  name text,
  "desc" text  -- quoted: DESC is a reserved word
);

-- product category table (simplified):
create table pc (
  id char(22) not null primary key,
  name text,
  "desc" text
);

-- table that maps products into categories:
create table p_pc (
  id char(22) not null primary key,
  pid char(22) not null references p(id),
  pcid char(22) not null references pc(id)
);
create index i_ppc_pid on p_pc(pid);
create index i_ppc_pcid on p_pc(pcid);
create unique index i_ppc_pid_pcid on p_pc(pid, pcid);

There are about 50k rows in p, 40k in pc, and 140k in p_pc. Most products are placed in only 1 category, some in 2, fewer in 3, and fewer still in 4, 5, or 6 categories. I want to know how many products are placed in 1 category, how many in 2, and so on. Here's my query:

select count(pid), num_cat
from (select pid, count(pcid) as num_cat
      from p_pc
      group by pid) as f
group by num_cat;

A sample output (4000 products are categorized in 5 different places, 4998 in 4, and so on):

 count | num_cat
-------+---------
  4000 |       5
  4998 |       4
  7502 |       3
 10001 |       2
 17499 |       1
(5 rows)

However, this query is taking about 2.5 minutes. Any idea on how to make it faster?

-- dave

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
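One thing worth checking before tuning the SQL: on older PostgreSQL releases (before 7.4, which introduced hash aggregation) each GROUP BY is done by sorting, so running ANALYZE and raising sort_mem for the session may matter more than rewriting the query. A hedged sketch of an equivalent query against the p_pc table above, using count(*) (pcid is NOT NULL, so counting the column adds nothing) so the only expensive sort is on the indexed pid column:

```sql
-- Sketch only: same result as the original query, assuming the p_pc
-- table defined above.
SELECT num_cat, count(*) AS num_products
FROM (SELECT pid, count(*) AS num_cat
      FROM p_pc
      GROUP BY pid) AS per_product
GROUP BY num_cat
ORDER BY num_cat;
```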
Re: [SQL] surrogate key or not?
Josh Berkus wrote:
> Given: Surrogate keys, by definition, represent no real data;
> Given: Only items which represent real data have any place in
> a data model
> Conclusion: Surrogate keys have no place in the data model
But once a surrogate key is assigned to a row, doesn't it become "real"
data? For example, I have a bunch of invoices/receipts and I write down
a unique number on each of them. Doesn't the unique number become part
of the information contained in the invoice/receipt itself (at least as
far as I'm concerned)? Another example is when Adam (+Eve?) named each
animal in Genesis. At the time, the name he gave each animal was
arbitrary ("surrogate"), but once assigned to each species, it became
part of the real data.
> 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and
Change management is, IMO, perhaps the main reason for a
surrogate/artificial key. We often need a PK that _never_ needs to
change (because it can be a royal PITA or downright impossible to make
this change; the PK might already be printed on a form/card/document,
recorded in some permanent database, tattooed on/embedded in someone's
forehead, etc.). Meanwhile, every other aspect of the data can change
(e.g. a person can change his name, sex, age, email, address, even date
& place of birth). Not to mention data entry mistakes. So it's
impossible to use any "real"/natural key in this case.
> Now, you're probably wondering "why does this guy regard surrogate keys as a
> problem?" I'll tell you: I absolutely cannot count the number of "bad
> databases" I've encountered which contained tables with a surrogate key, and
> NO REAL KEY of any kind. This makes data normalization impossible, and
> cleanup of the database becomes a labor-intensive process requiring
> hand-examination of each row.
Okay, so surrogate keys make it easy for careless people to design a
database that is prone to data duplication (because they don't add
enough unique constraints to prevent it). But I don't see how a
relation with a surrogate key is harder to "normalize" (that is, to
have its duplicates removed) than a relation with no key at all. Compare:
 street
--------------
 Green Street
 Green Street
 Green Street

versus:

 id   | street
------+--------------
 2934 | Green Street
 4555 | Green Street
 5708 | Green Street
They both contain duplicates and/or ambiguous data. They're both equally
hard to normalize/clean.
--
dave
[SQL] Displaying two tables side by side
How can you display two tables side by side? Example:

> select * from t1;
 a | b
---+---
 2 | 2
 3 | 5
 4 | 7
 9 | 0

> select * from t2;
 c | d
---+---
 4 | 5
 7 | 3
 3 | 2
 1 | 1
 2 | 0

Intended output:

 a | b | c | d
---+---+---+---
 2 | 2 | 4 | 5
 3 | 5 | 7 | 3
 4 | 7 | 3 | 2
 9 | 0 | 1 | 1
   |   | 2 | 0

Each table has no keys (and no OIDs). Order is not important, but each row from each table needs to be displayed exactly once.

-- dave
Re: [SQL] Displaying two tables side by side
Andreas Haumer wrote:
> You could try to use PostgreSQL's ctid system column to join on, like this:
>
> test=# select *, ctid from t1;
>  a | b | ctid
> ---+---+-------
>  2 | 2 | (0,1)
>  3 | 5 | (0,2)
>  4 | 7 | (0,3)
>  9 | 0 | (0,4)
>
> test=# select *, ctid from t2;
>  c | d | ctid
> ---+---+-------
>  4 | 5 | (0,1)
>  7 | 3 | (0,2)
>  3 | 2 | (0,3)
>  1 | 1 | (0,4)
>  2 | 0 | (0,5)
>
> test=# select * from t1 right outer join t2 on (t1.ctid = t2.ctid);
>  a | b | c | d
> ---+---+---+---
>  2 | 2 | 4 | 5
>  3 | 5 | 7 | 3
>  4 | 7 | 3 | 2
>  9 | 0 | 1 | 1
>    |   | 2 | 0
>
> Note that this is of course very platform specific. On Oracle you could
> use rownum, for example. I don't have a more portable solution on hand
> right now.
>
> HTH

Thanks, I just found out about ctid. I was thinking of a rownum equivalent too, actually. I guess a more portable solution would be creating a temporary table for each table to add the ctid/"row counter" equivalent, and then join on that.

-- dave
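The temporary-table idea at the end can be sketched like this (hedged: the sequence and table names are made up, and a FULL OUTER JOIN is used so neither table's surplus rows are lost):

```sql
-- Sketch of the portable "row counter" approach suggested above.
CREATE TEMP SEQUENCE rn1;
CREATE TEMP SEQUENCE rn2;
CREATE TEMP TABLE t1n AS SELECT nextval('rn1') AS rn, * FROM t1;
CREATE TEMP TABLE t2n AS SELECT nextval('rn2') AS rn, * FROM t2;

SELECT a, b, c, d
FROM t1n FULL OUTER JOIN t2n USING (rn)
ORDER BY rn;
```

Since the tables have no inherent order, which row of t1 pairs with which row of t2 is arbitrary, but each row appears exactly once, which is all the question requires.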
[SQL] olympics ranking query
See http://www.athens2004.com/en/OlympicMedals/medals?noc=MGL .
create table countrymedal (
countryid CHAR(3) PRIMARY KEY,
gold INT NOT NULL,
silver INT NOT NULL,
bronze INT NOT NULL);
COPY countrymedal (countryid, gold, silver, bronze) FROM stdin;
ITA 5 6 3
FRA 5 3 5
UKR 5 1 1
RUS 4 8 10
GER 4 4 7
TUR 3 0 1
KOR 2 7 3
NED 2 5 4
HUN 2 3 1
SVK 2 2 1
ROM 2 0 2
GRE 2 0 1
POL 1 2 1
BLR 1 1 2
SUI 1 0 1
UAE 1 0 0
GBR 0 4 4
AUT 0 3 0
PRK 0 2 1
ESP 0 2 0
CUB 0 1 5
CZE 0 1 2
ZIM 0 1 1
USA 15 11 10
CHN 15 9 8
JPN 9 4 2
AUS 7 5 8
GEO 1 1 0
RSA 1 1 0
BUL 1 0 2
THA 1 0 2
IND 0 1 0
INA 0 1 0
KAZ 0 1 0
POR 0 1 0
SCG 0 1 0
AZE 0 0 2
BEL 0 0 2
BRA 0 0 2
DEN 0 0 2
ARG 0 0 1
CAN 0 0 1
COL 0 0 1
CRO 0 0 1
ISR 0 0 1
MGL 0 0 1
SLO 0 0 1
TRI 0 0 1
\.
create sequence seq1;
create sequence seq2;
-- query #1: list of ranks by gold
select setval('seq1', 1);
select setval('seq2', 1);
select
setval('seq1', currval('seq1')+setval('seq2',count(*)))-count(*) as rank,
count(*) as numranker,
gold, silver, bronze
from countrymedal
group by gold, silver, bronze
order by gold desc, silver desc, bronze desc;
-- result of query #1
 rank | numranker | gold | silver | bronze
------+-----------+------+--------+--------
    1 |         1 |   15 |     11 |     10
    2 |         1 |   15 |      9 |      8
    3 |         1 |    9 |      4 |      2
    4 |         1 |    7 |      5 |      8
    5 |         1 |    5 |      6 |      3
    6 |         1 |    5 |      3 |      5
    7 |         1 |    5 |      1 |      1
    8 |         1 |    4 |      8 |     10
    9 |         1 |    4 |      4 |      7
   10 |         1 |    3 |      0 |      1
   11 |         1 |    2 |      7 |      3
   12 |         1 |    2 |      5 |      4
   13 |         1 |    2 |      3 |      1
   14 |         1 |    2 |      2 |      1
   15 |         1 |    2 |      0 |      2
   16 |         1 |    2 |      0 |      1
   17 |         1 |    1 |      2 |      1
   18 |         1 |    1 |      1 |      2
   19 |         2 |    1 |      1 |      0
   21 |         2 |    1 |      0 |      2
   23 |         1 |    1 |      0 |      1
   24 |         1 |    1 |      0 |      0
   25 |         1 |    0 |      4 |      4
   26 |         1 |    0 |      3 |      0
   27 |         1 |    0 |      2 |      1
   28 |         1 |    0 |      2 |      0
   29 |         1 |    0 |      1 |      5
   30 |         1 |    0 |      1 |      2
   31 |         1 |    0 |      1 |      1
   32 |         5 |    0 |      1 |      0
   37 |         4 |    0 |      0 |      2
   41 |         8 |    0 |      0 |      1
(32 rows)
-- query #2: list of countries ordered by their ranks
select setval('seq1', 1);
select setval('seq2', 1);
select
(case numranker when 1 then '' else '=' end) || rank as rank,
countryid,
cm.gold, cm.silver, cm.bronze
from countrymedal cm
left join
(select
setval('seq1',
currval('seq1')+setval('seq2',count(*))
)-count(*) as rank,
count(*) as numranker,
gold, silver, bronze
from countrymedal
group by gold, silver, bronze
order by gold desc, silver desc, bronze desc
) t1 on cm.gold=t1.gold and cm.silver=t1.silver and cm.bronze=t1.bronze
order by t1.rank;
-- result of query #2
 rank | countryid | gold | silver | bronze
------+-----------+------+--------+--------
 1    | USA       |   15 |     11 |     10
 2    | CHN       |   15 |      9 |      8
 3    | JPN       |    9 |      4 |      2
 4    | AUS       |    7 |      5 |      8
 5    | ITA       |    5 |      6 |      3
 6    | FRA       |    5 |      3 |      5
 7    | UKR       |    5 |      1 |      1
 8    | RUS       |    4 |      8 |     10
 9    | GER       |    4 |      4 |      7
 10   | TUR       |    3 |      0 |      1
 11   | KOR       |    2 |      7 |      3
 12   | NED       |    2 |      5 |      4
 13   | HUN       |    2 |      3 |      1
 14   | SVK       |    2 |      2 |      1
 15   | ROM       |    2 |      0 |      2
 16   | GRE       |    2 |      0 |      1
 17   | POL       |    1 |      2 |      1
 18   | BLR       |    1 |      1 |      2
 =19  | GEO       |    1 |      1 |      0
 =19  | RSA       |    1 |      1 |      0
 =21  | BUL       |    1 |      0 |      2
 =21  | THA       |    1 |
Re: [SQL] olympics ranking query
Tom Lane wrote:
>> Challenge question: is there a simpler way to do query #1 (without any
>> PL, and if possible without sequences too)?
>
> Can't without sequences AFAIK, but you certainly can do it simpler:
>
> select setval('seq1', 0);
> select nextval('seq1'), * from
>   (select count(*) as numranker,
>           gold, silver, bronze
>    from countrymedal
>    group by gold, silver, bronze
>    order by gold desc, silver desc, bronze desc) ss;

This is not quite the same: here the ranks are simply sequential, whereas
in query #1 they skip, so as to match the number of participating countries.
--
dave
Re: [SQL] olympics ranking query
Bruno Wolff III wrote:
> On Fri, Aug 20, 2004 at 23:40:08 +0700, David Garamond <[EMAIL PROTECTED]> wrote:
>> Challenge question: is there a simpler way to do query #1 (without any
>> PL, and if possible without sequences too)?
>
> You could use a subselect to count how many countries had a lower medal
> ranking and add 1 to get the rank. This should be a lot more standard
> than using sequences. It will probably be a little slower, but for
> tables of that size it shouldn't be a big deal.

Thanks for the tip. This is what I came up with:

select
  (select count(*) from countrymedal c1
   where c1.gold > c2.gold
      or (c1.gold = c2.gold
          and (c1.silver > c2.silver
               or (c1.silver = c2.silver
                   and c1.bronze > c2.bronze)))) + 1 as rank,
  count(*) as numranker,
  gold, silver, bronze
from countrymedal c2
group by gold, silver, bronze
order by rank;

-- dave
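On PostgreSQL 8.4 and later (hedged: window functions postdate this thread), the skipping, Olympic-style rank can be written without sequences or correlated subqueries. The rank of a group is the number of countries ranked strictly better, plus one, which a running sum over the grouped counts gives directly:

```sql
-- Sketch: rank = (running total of countries up to and including this
-- medal combination) - (countries tied at this combination) + 1.
SELECT sum(count(*)) OVER (ORDER BY gold DESC, silver DESC, bronze DESC)
         - count(*) + 1 AS rank,
       count(*) AS numranker,
       gold, silver, bronze
FROM countrymedal
GROUP BY gold, silver, bronze
ORDER BY gold DESC, silver DESC, bronze DESC;
```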
[SQL] Interpolation and extrapolation in SQL
On my first try, interpolation and extrapolation turn out to be pretty easy to do. In psql:

-- the "lookup" table
CREATE TABLE p (
  x DOUBLE PRECISION NOT NULL UNIQUE,
  y DOUBLE PRECISION NOT NULL
);
INSERT INTO p VALUES (1,1);
INSERT INTO p VALUES (2,5);
INSERT INTO p VALUES (5,14);
INSERT INTO p VALUES (10,21);

-- the table that contains our x values, the y values of which will be
-- looked up in the lookup table.
CREATE TABLE q (
  x DOUBLE PRECISION NOT NULL
);
INSERT INTO q VALUES (0);
INSERT INTO q VALUES (1);
INSERT INTO q VALUES (2);
INSERT INTO q VALUES (3.5);
INSERT INTO q VALUES (5.5);
INSERT INTO q VALUES (10);
INSERT INTO q VALUES (11);

-- query A. only handles interpolation
\set x1 '(SELECT p.x FROM p WHERE p.x <= q.x ORDER BY p.x DESC LIMIT 1)'
\set x2 '(SELECT p.x FROM p WHERE p.x >= q.x ORDER BY p.x ASC LIMIT 1)'
\set y1 '(SELECT p.y FROM p WHERE p.x <= q.x ORDER BY p.x DESC LIMIT 1)'
\set y2 '(SELECT p.y FROM p WHERE p.x >= q.x ORDER BY p.x ASC LIMIT 1)'

SELECT q.x,
  CASE WHEN :x1 = :x2 THEN :y1
       ELSE (:y1 + (q.x-:x1)/(:x2-:x1)*(:y2-:y1))
  END AS y
FROM q;

-- query B. also handles extrapolation, and the note column tells us
-- whether a certain y value is directly taken from p, or interpolated,
-- or extrapolated.
\set x0 '(SELECT p.x FROM p WHERE p.x <= q.x ORDER BY p.x DESC LIMIT 1 OFFSET 1)'
\set x3 '(SELECT p.x FROM p WHERE p.x >= q.x ORDER BY p.x ASC LIMIT 1 OFFSET 1)'
\set y0 '(SELECT p.y FROM p WHERE p.x <= q.x ORDER BY p.x DESC LIMIT 1 OFFSET 1)'
\set y3 '(SELECT p.y FROM p WHERE p.x >= q.x ORDER BY p.x ASC LIMIT 1 OFFSET 1)'

SELECT q.x,
  CASE WHEN :x1 = :x2 THEN :y1
       WHEN :x1 IS NULL THEN (:y2 + (q.x-:x2)/(:x3-:x2)*(:y3-:y2))
       WHEN :x2 IS NULL THEN (:y0 + (q.x-:x0)/(:x1-:x0)*(:y1-:y0))
       ELSE (:y1 + (q.x-:x1)/(:x2-:x1)*(:y2-:y1))
  END AS y,
  CASE WHEN :x1 = :x2 THEN 'direct'
       WHEN :x1 IS NULL THEN 'extrapolated to the left'
       WHEN :x2 IS NULL THEN 'extrapolated to the right'
       ELSE 'interpolated'
  END AS note
FROM q;

-- C. to handle the case where x in p is not unique, replace the
-- x0..x3 and y0..y3 templates with:
\set x1 '(SELECT p2.x FROM (SELECT DISTINCT x FROM p) p2 WHERE p2.x <= q.x ORDER BY p2.x DESC LIMIT 1)'
\set x2 '(SELECT p2.x FROM (SELECT DISTINCT x FROM p) p2 WHERE p2.x >= q.x ORDER BY p2.x ASC LIMIT 1)'
\set y1 '(SELECT p.y FROM (SELECT DISTINCT x FROM p) p2 LEFT JOIN p ON p.x=p2.x WHERE p2.x <= q.x ORDER BY p2.x DESC LIMIT 1)'
\set y2 '(SELECT p.y FROM (SELECT DISTINCT x FROM p) p2 LEFT JOIN p ON p.x=p2.x WHERE p2.x >= q.x ORDER BY p2.x ASC LIMIT 1)'
\set x0 '(SELECT p2.x FROM (SELECT DISTINCT x FROM p) p2 WHERE p2.x <= q.x ORDER BY p2.x DESC LIMIT 1 OFFSET 1)'
\set x3 '(SELECT p2.x FROM (SELECT DISTINCT x FROM p) p2 WHERE p2.x >= q.x ORDER BY p2.x ASC LIMIT 1 OFFSET 1)'
\set y0 '(SELECT p.y FROM (SELECT DISTINCT x FROM p) p2 LEFT JOIN p ON p.x=p2.x WHERE p2.x <= q.x ORDER BY p2.x DESC LIMIT 1 OFFSET 1)'
\set y3 '(SELECT p.y FROM (SELECT DISTINCT x FROM p) p2 LEFT JOIN p ON p.x=p2.x WHERE p2.x >= q.x ORDER BY p2.x ASC LIMIT 1 OFFSET 1)'

Questions:

1) Is the above correct? (It gives me correct results, but I'm not 100% sure.)

2) Does the optimizer cache the result of identical subqueries (e.g. :x1 or :x2, which are mentioned several times in the query)? If yes, how do I know this?

3) Is there another (simpler, more elegant, more efficient) way to do interpolation/extrapolation in SQL?

-- dave
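As a sanity check on question 1, these are the values query B should produce for the sample data, worked out by hand from the linear formula (hedged: hand-computed from the CASE arms above, not program output):

```sql
-- Expected output of query B, from y1 + (x-x1)/(x2-x1)*(y2-y1):
--   x = 0    -> y = -3     extrapolated to the left  (line through (1,1),(2,5))
--   x = 1    -> y = 1      direct
--   x = 2    -> y = 5      direct
--   x = 3.5  -> y = 9.5    interpolated between (2,5) and (5,14)
--   x = 5.5  -> y = 14.7   interpolated between (5,14) and (10,21)
--   x = 10   -> y = 21     direct
--   x = 11   -> y = 22.4   extrapolated to the right (line through (5,14),(10,21))
```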
[SQL] order and order line item
Suppose we have the usual order & order line item entities:

CREATE TABLE "order" (
  id INT PRIMARY KEY,
  date DATE NOT NULL
);

CREATE TABLE orderlineitem (
  id INT PRIMARY KEY,
  orderid INT REFERENCES "order"(id),
  seq INT NOT NULL,
  CONSTRAINT con1 UNIQUE (orderid, seq),
  productid INT NOT NULL,
  quantity INT NOT NULL,
  price NUMERIC(18,4) NOT NULL
);

How do we enforce things like, for example:

a) each order must contain at least 1 line item; or
b) each order must have a non-zero total value;

without making INSERT painful? I'm new to triggers, so I'm thinking of something like a "before COMMIT" trigger, but there's no such thing, apparently.

Regards,
dave
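There is no "before COMMIT" trigger, but a deferred constraint trigger comes close: it fires at commit time rather than at statement time, so the line items can be inserted after the order row within the same transaction. A hedged sketch for rule (a) only (the function and trigger names are made up; rule (b) and later deletes from orderlineitem would need analogous triggers):

```sql
CREATE OR REPLACE FUNCTION check_order_has_items() RETURNS trigger AS $$
BEGIN
  -- At COMMIT time, every newly inserted order must have a line item.
  IF NOT EXISTS (SELECT 1 FROM orderlineitem WHERE orderid = NEW.id) THEN
    RAISE EXCEPTION 'order % has no line items', NEW.id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER order_must_have_items
  AFTER INSERT ON "order"
  DEFERRABLE INITIALLY DEFERRED
  FOR EACH ROW EXECUTE PROCEDURE check_order_has_items();
```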
[SQL] constraint and ordered value
Is it possible to use only a CHECK constraint (and not triggers) to completely enforce ordered values of a column (colx) in a table? By that I mean:

1. Rows must be inserted in the order of colx=1, then colx=2, 3, and so on;

2. When deleting (or updating), "holes" must not be formed, e.g. if there are three rows, then the row with colx=3 must be the first one deleted, then colx=2 the second, and so on.

I can see #1 being accomplished using a NOT NULL + UNIQUE constraint and a CHECK constraint that calls some PL function, where the function does a simple check (new.colx=1 if COUNT(colx)==0, or new.colx=MAX(colx)+1). But is it possible to do #2 using only constraints?

Regards,
dave
Re: [SQL] constraint and ordered value
Daryl Richter wrote:
>> No. A constraint only applies to one row at a time. If you try to work
>> around this by calling a function that does queries, it isn't
>> guaranteed to work. And if you are thinking of calling a function that
>> does a query, you aren't looking at saving time over using triggers.
>>
>> Also, if you are going to have concurrent updates, you are going to
>> need to do table locking to make this work.
>
> And, finally, you should ask yourself *why* you are doing this, given
> that one of the fundamental properties of a table (relation) is that
> the rows (tuples) are *unordered*. So much of what makes a relational
> db a wonderful thing for storing data depends on this notion.
>
> If you provide an explanation of what you are trying to model, perhaps
> we can help you find a better schema design.

Thanks for the insightful answers. Actually, I'm just learning about and trying out CHECK constraints in Postgres (and Firebird) :-)

Regards,
dave
[SQL] autoupdating mtime column
Dear all,

Please see the SQL below. I'm already satisfied with everything except that I wish, in #4, mtime got automatically updated to NOW() if not explicitly SET in the UPDATE statement. Is there a way to make the mtime column behave more like I want? Thanks in advance.

create table t1 (
  id int primary key,
  t text,
  ctime timestamp with time zone,
  mtime timestamp with time zone);

create or replace function update_times() returns trigger as $$
begin
  if tg_op='INSERT' then
    if NEW.ctime is null then NEW.ctime = NOW(); end if;
    if NEW.mtime is null then NEW.mtime = NOW(); end if;
  elsif tg_op='UPDATE' then
    if NEW.ctime is null then NEW.ctime = OLD.ctime; end if;
    if NEW.mtime is null then NEW.mtime = NOW(); end if;
  end if;
  return NEW;
end;
$$ language plpgsql;

create trigger update_times before update or insert on t1
  for each row execute procedure update_times();

-- #1. mtime & ctime set to NOW()
insert into t1 values (1,'text1',null,null);
-- #2. mtime & ctime set to '2000-01-01'
insert into t1 values (2,'text2','2000-01-01','2000-01-01');
-- #3. mtime and ctime set to '2006-06-06'
update t1 set t='new text1',ctime='2006-6-6',mtime='2006-6-6' where id=1;
-- #4. mtime and ctime unchanged
update t1 set t='new text1' where id=1;
-- #5. mtime automatically updated to NOW()
update t1 set t='new text1',mtime=null where id=1;
Re: [SQL] autoupdating mtime column
On 8/4/06, Richard Huxton <[email protected]> wrote:
> David Garamond wrote:
>> Please see the SQL below. I'm already satisfied with everything except
>> I wish, in #4, mtime got automatically updated to NOW() if not
>> explicitly SET in the UPDATE statement. Is there a way to make the
>> mtime column behave more like I wanted? Thanks in advance.
>
> Just check for OLD.mtime = NEW.mtime, or am I missing something here?

How do I differentiate between:

  UPDATE t SET mtime=mtime ...;

in which mtime is specifically set and should not change, and

  UPDATE t SET foo=bar ...;

in which mtime is not mentioned and should be updated automagically to NOW()?

--
dave
Re: [SQL] autoupdating mtime column
On 8/4/06, Rodrigo De León <[EMAIL PROTECTED]> wrote:
> How about:
>
> create or replace function update_times()
> returns trigger as $$
> begin
>   if TG_OP='INSERT' then
>     new.ctime = coalesce(new.ctime,now());
>     new.mtime = coalesce(new.mtime,now());
>   elsif TG_OP='UPDATE' then
>     new.ctime = old.ctime;
>     new.mtime = now();
>   end if;
>   return new;
> end;
> $$ language plpgsql;

But that would disallow setting mtime and ctime to arbitrary values, which I want to permit: ctime and mtime should be set to the current time if not specified in INSERT, and mtime to the current time if not specified in UPDATE. I guess what I want is MySQL's TIMESTAMP autoupdate behaviour, which I can't seem to be able to emulate in Postgres, because there's no information given about which columns are specified in the SET clause, only the NEW and OLD records.

--
dave
Re: [SQL] autoupdating mtime column
On 8/4/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> If you are really intent on having a way to suppress the mtime update,
> you could dedicate an additional field to the purpose, eg
>
> UPDATE t SET foo=..., bar=..., keepmtime = true ...
>
> and in the trigger something like
>
> if new.keepmtime then
>   new.keepmtime = false;
> else
>   new.mtime = now();
>
> As long as nothing else ever touches keepmtime this would work.
> Personally I'm dubious that it's worth the trouble

Yeah, it's too expensive an overhead just for the sake of a slightly shorter UPDATE statement.

> --- do you have a real use-case for suppressing mtime updates?

Syncing tables between databases (a la "rsync --times"). Btw, I'm considering temporarily disabling the update_times() trigger when syncing.

Thanks,
--
dave
Re: [SQL] autoupdating mtime column
On 8/5/06, Richard Huxton wrote:
> >> do you have a real use-case for suppressing mtime updates?
> > Syncing tables between databases (a la "rsync --times"). Btw, I'm
> > considering temporarily disabling the update_times() trigger when
> > syncing.
>
> I'd consider running the sync as a different (privileged) user and
> checking the current user in the trigger. Ordinary users always get
> now(), privileged users always get what they provide (and they are
> forced to provide some value). Does what you want and adds a safety
> catch too.
>
> Alternatively, you could do something similar with views.

That's a nice idea indeed. Thanks!

--
dave
[SQL] query: last N price for each product?
Dear SQL masters,

The query for the "latest price for each product" goes like this (which I can grasp quite easily):

SELECT * FROM price p1
WHERE ctime=(SELECT MAX(ctime) FROM price p2
             WHERE p1.product_id=p2.product_id)

or:

SELECT * FROM price p1
WHERE NOT EXISTS (SELECT * FROM price p2
                  WHERE p1.product_id=p2.product_id
                    AND p1.ctime < p2.ctime)

but how do you query for the "latest 3 prices for each product"? The "price" table contains the current price as well as price histories.

Regards,
dave

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
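Two common shapes for "latest N per group", sketched against the price table described above (hedged: the product_id and ctime column names are taken from the queries in the post):

```sql
-- Version-independent: keep a row if fewer than 3 rows are newer for
-- the same product; a generalization of the NOT EXISTS query above.
SELECT * FROM price p1
WHERE (SELECT count(*) FROM price p2
       WHERE p2.product_id = p1.product_id
         AND p2.ctime > p1.ctime) < 3;

-- PostgreSQL 8.4+: the window-function version.
SELECT * FROM (
  SELECT p.*,
         row_number() OVER (PARTITION BY product_id
                            ORDER BY ctime DESC) AS rn
  FROM price p
) s
WHERE rn <= 3;
```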
[SQL] grouping/clustering query
Dear all,

I have an invoices (inv) table and a bank transactions (tx) table. There's also the payment table, which is a many-to-many relation between the former two, because each invoice can be paid by one or more bank transactions, and each bank transaction can pay for one or more invoices. Example:

# (invoiceid, txid)
(A, 1)
(A, 3)
(B, 1)
(B, 2)
(C, 5)
(D, 6)
(D, 7)
(E, 8)
(F, 8)

For journalling, I need to group/cluster these together. Is there a SQL query that can generate this output:

# (journal: invoiceids, txids)
[A,B], [1,2,3]
[C], [5]
[D], [6,7]
[E,F], [8]

Regards,
dave
Re: [SQL] grouping/clustering query
Tony, Joe, Steve,
Thanks for the follow-ups. Yes, the problem is related to double-entry
accounting, where one needs to balance total debit and credit
(payments and invoices) in each journal/transaction.
Due to time constraints, I ended up doing this in the client-side
programming language, since I am nowhere near fluent in PLs. The
algorithm should be simple (at least the "brute force" version): it
basically checks whether either element of the pair (txid, invoiceid)
is already mentioned in some journal; if it is, add the pair to that
journal, otherwise create a new journal with that pair as the first
entry. I believe this can easily be implemented in a PL. But I still
wonder if there is some SQL incantation that can do the same without
any PL.
Regards,
dave
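For the record, the transitive grouping can be expressed without a PL on PostgreSQL 8.4+ using a recursive CTE (hedged: this postdates the thread, and the payment table is assumed to have (invoiceid, txid) columns as in the example):

```sql
-- Connected components over the invoice/tx graph: start every invoice
-- as its own "root", propagate roots across shared transactions, then
-- pick min(root) as the representative journal for each invoice.
WITH RECURSIVE comp(invoiceid, root) AS (
  SELECT invoiceid, invoiceid FROM payment
  UNION                             -- UNION (not ALL) dedupes, so the
  SELECT p2.invoiceid, comp.root    -- recursion terminates
  FROM comp
  JOIN payment p1 ON p1.invoiceid = comp.invoiceid
  JOIN payment p2 ON p2.txid = p1.txid
)
SELECT min(root) AS journal, invoiceid
FROM comp
GROUP BY invoiceid
ORDER BY journal;
```

The invoice and transaction lists per journal can then be gathered by joining this result back to payment and aggregating with array_agg.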
On Sat, Oct 25, 2008 at 3:48 AM, Tony Wasson <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <[EMAIL PROTECTED]> wrote:
>> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <[EMAIL PROTECTED]> wrote:
>>> At 11:28 AM 10/23/2008, Joe wrote:
Steve Midgley wrote:
>>
>> # (invoiceid, txid)
>> (A, 1)
>> (A, 3)
>> (B, 1)
>> (B, 2)
>> (C, 5)
>> (D, 6)
>> (D, 7)
>> (E, 8)
>> (F, 8)
>>
>> For journalling, I need to group/cluster this together. Is there a SQL
>> query that can generate this output:
>>
>> # (journal: invoiceids, txids)
>> [A,B] , [1,2,3]
>> [C], [5]
>> [D], [6,7]
>> [E,F], [8]
>
> Hi Dave,
>
> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
> the first line print:
>
>> [A,B] , [1,2,3]
>
> What's the rule that tells the query to output this way? Is it that all
> of B's values are between A's values?
From a purely accounting standpoint, since transaction 1 was applied to
both invoices A and B, you need to group the invoices so that you can
compare total invoiced against total paid.
>>>
>>> I tinkered around briefly but didn't come up with a good idea, but I bet
>>> someone on this list can. However, I did create a CREATE script for your
>>> table design which, in my experience, makes it more likely that a real
>>> expert will take on your problem..
>>>
>>> Hope this helps,
>>>
>>> Steve
>>>
>>> DROP TABLE IF EXISTS trans;
>>>
>>> CREATE TABLE trans
>>> (
>>> id serial NOT NULL,
>>> inv_id character varying,
>>> tx_id character varying,
>>> CONSTRAINT pk_id PRIMARY KEY (id)
>>> )
>>> WITH (OIDS=FALSE);
>>>
>>> insert into trans (inv_id, tx_id) values('A','1');
>>> insert into trans (inv_id, tx_id) values('A','3');
>>> insert into trans (inv_id, tx_id) values('B','1');
>>> insert into trans (inv_id, tx_id) values('B','2');
>>> insert into trans (inv_id, tx_id) values('C','5');
>>> insert into trans (inv_id, tx_id) values('D','6');
>>> insert into trans (inv_id, tx_id) values('D','7');
>>> insert into trans (inv_id, tx_id) values('E','8');
>>> insert into trans (inv_id, tx_id) values('F','8');
>
>
> Here's a stab at a custom aggregate attempting to explain what's going
> on inside.
>
> CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
> VARCHAR) RETURNS VARCHAR[] AS $$
> DECLARE
> res VARCHAR[];
> BEGIN
> RAISE NOTICE 'input state is %',x;
> RAISE NOTICE 'input variable is %',y;
> IF x = '{}' THEN
>RAISE NOTICE 'x is empty, returning input variable %',y;
>res[1] := y;
> ELSE
>RAISE NOTICE 'input array is not empty, checking if input
> variable is a member %',y;
>res := x;
>IF y = ANY(res) THEN
>RAISE NOTICE 'y is already in array %, skipping',res;
>ELSE
>res := array_append(res, y);
>RAISE NOTICE 'appending input variable %',y;
>END IF;
> END IF;
> RETURN res;
> END
> $$ LANGUAGE plpgsql STRICT;
>
> DROP AGGREGATE array_accum_unique(VARCHAR);
> CREATE AGGREGATE array_accum_unique
> (
>basetype = VARCHAR
>, sfunc = varchar_array_accum_unique
>, stype = VARCHAR[]
>, initcond = '{}'
> );
>
> SELECT array_accum_unique(inv_id) AS invoiceids
> , array_accum_unique(tx_id) AS transactionids
> FROM (
>SELECT tx_id, inv_id
>FROM trans
>WHERE inv_id IN (
>SELECT inv_id
>FROM trans
>WHERE id IN (
>SELECT id FROM trans
> WHERE tx_id=1
>)
>)
>ORDER BY tx_id, inv_id
> ) AS ss
> ;
>
>
> Returns this result for transaction id 1.
>
> invoiceids | transactionids
> ------------+----------------
> {A,B} | {1,2,3}
> (1 row)
>
> Hope this helps!
> Tony
[SQL] Array: comparing first N elements?
I have a "materialized path" tree table like this (simplified):

CREATE TABLE product (
  id SERIAL PRIMARY KEY,
  parents INT[] NOT NULL,
  name TEXT NOT NULL,
  UNIQUE (parents, name)
);
CREATE INDEX name ON product(name);

Previously I used a TEXT column for parents, but arrays look interesting and convenient, so I'm considering migrating to arrays. However, how do I rewrite this using arrays?

SELECT * FROM product WHERE parents LIKE '0001/0010/%';

In other words: testing against the first N elements of an array.

Regards,
Dave
Re: [SQL] Array: comparing first N elements?
On Tue, May 12, 2009 at 3:28 PM, Pavel Stehule wrote:
> Hello
>
> create or replace function eqn(anyarray, anyarray, int)
> returns boolean as $$
> select not exists(select $1[i] from generate_series(1,$3) g(i)
>                   except
>                   select $2[i] from generate_series(1,$3) g(i))
> $$ language sql immutable strict;
>
> postgres=# select eqn(array[1,2,3,4,5], array[1,2,3,5,6], 3);
>  eqn
> -----
>  t
> (1 row)
>
> Time: 1,590 ms
> postgres=# select eqn(array[1,2,3,4,5], array[1,2,3,5,6], 4);
>  eqn
> -----
>  f
> (1 row)

Hi Pavel,

Thanks for the solution, but that's too slow. I'd rather just do this instead:

select * from product
where parents[1:(select array_length(parents,1) from product where name='wanted')+1] =
      (select parents from product where name='wanted') ||
      (select id from product where name='wanted');

but the above query is also unable to use any indices (unlike LIKE 'foo%').

Regards,
Dave
Re: [SQL] Array: comparing first N elements?
2009/5/12 Achilleas Mantzios
> you would want to look at the intarray contrib package for index support
> and many other goodies. also you might want to write functions
> first(parents), last(parents) and then have an index on those as well.
> This way searching for the direct children of a node is very fast.

Thanks for the suggestions! Index support is exactly what I'm looking for. Will look into intarray.

Regards,
dave
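A hedged sketch of what the intarray route might look like (assuming the contrib module is installed; the operator and opclass come from intarray, but the index name and the two-step filter are illustrative choices, not the package's prescribed recipe):

```sql
-- GiST index on the int[] column via intarray's opclass.
CREATE INDEX product_parents_idx ON product USING gist (parents gist__int_ops);

-- All descendants of the node whose path is {1,10}: the @> "contains"
-- operator can use the index as a coarse filter, and the slice
-- comparison enforces the exact "first N elements" (prefix) test.
SELECT *
FROM product
WHERE parents @> ARRAY[1,10]
  AND parents[1:2] = ARRAY[1,10];
```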
