O Markus Schaber έγραψε στις Mar 29, 2006 : > Hello, > > I have a table filled from third party that basically has the following > structure: > > link_id | feat_id | other | columns... > --------+---------+-------+----------- > 1 | 2 | ... > 2 | 5 | ... > 2 | 23 | ... > 3 | 5 | ... > 3 | 23 | some | data > 3 | 23 | other | data > 5 | 23 | ... > 9 | 23 | ... > > This structure is fixed, and we can't change it, but we can create > > We have about 37 million different link_ids, and 35 million feat_ids. > There are feat_ids that appear at several thousand link_ids, but a > link_id does not have more than a douzen feat_ids. > > Now I need to group together all link_ids that have the same set of > feat_ids. In the example above, the sets would be (1), (2,3) and (5,9), > and the other columns would be run through some aggregate functions. > > Currently, this is done via an external JAVA application, but I'm > looking for a way to express this via sql / plpgsql to ease deployment. > > I could imagine some ugly code using ARRAY (not tried yet), but how > would you pack this problem? It seems that I'm just stuck in my thoughts > and miss the beauty way to solve it.
Well i did it with arrays, i dont know the performance implications tho: foodb=# SELECT * from markustest ; link_id | feat_id | other ---------+---------+------- 1 | 2 | 1 2 | 5 | 2 2 | 23 | 2 3 | 5 | 3 3 | 23 | 3 3 | 23 | 3 5 | 23 | 5 9 | 23 | 9 (8 rows) foodb=# SELECT qoo.foo2,sum(qoo.foo3) from (SELECT mt.link_id as foo,_int_union(array(select mt2.feat_id from markustest mt2 where mt2.link_id=mt.link_id order by mt2.feat_id),'{}') as foo2,other::int4 as foo3 from markustest mt) as qoo GROUP BY qoo.foo2; foo2 | sum --------+----- {2} | 1 {5,23} | 13 {23} | 14 (3 rows) foodb=# The _int_union trick is to force the arrays to have unique values. The order by has the meaning that '{5,23}' and '{23,5}' should be treated the same way. > > Thanks, > Markus > -- -Achilleus ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly