Greetings, I have 3 tables. Two of which (ta and tb) are different "attributes" the third table (tc) can have.
tc is allowed up to three of each kind of "attributes". e.g., 3 ta values and 2 tb values. By assigning ta and tb attributes to each entry in tc you are also "matching" tb attributes with corresponding ta ones. e.g., c_id = 8 ta1 = 1 <-- match --> tb1 = 3 ta2 = 3 <-- match --> tb2 = 2 ta3 = 13 <-- match --> tb3 = 20 c_id = 9 ta1 = 2 <-- match --> tb1 = 3 ta2 = 5 <-- match --> tb2 = 3 ta3 = 9 <-- match --> tb3 = 2 c_id = 10 ta1 = 3 <-- match --> tb1 = 2 ta2 = 1 <-- match --> tb2 = 3 ta3 = 9 <-- match --> tb3 = 2 The task at hand is to figure out the number of ta attributes each tb attribute corresponds with. For above example I want a result set showing me: tb.b_id ta.a_id count -------------------------- 2 2 -- matched w/ta.a_id = 3, 9 3 3 -- matched w/ta.a_id = 1, 2, 5 20 1 -- only matched w/ta.a_id = 13 -- tables create table ta ( a_id int unique ); create table tb ( b_id int unique ); create table tc ( c_id int unique, ta1 int references ta( a_id ), ta2 int references ta( a_id ), ta3 int references ta( a_id ), tb1 int references tb( b_id ), tb2 int references tb( b_id ), tb3 int references tb( b_id ) ); -- sequences create sequence ta_seq; create sequence tb_seq; create sequence tc_seq; -- populate tables with some data insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into ta values ( NEXTVAL( 'ta_seq' ) ); insert into tb values ( NEXTVAL( 'tb_seq' ) ); insert into tb values ( NEXTVAL( 'tb_seq' ) ); insert into tb values ( NEXTVAL( 'tb_seq' ) ); insert into tb values ( NEXTVAL( 'tb_seq' ) ); -- ... insert into tb values ( NEXTVAL( 'tb_seq' ) ); insert into tb values ( NEXTVAL( 'tb_seq' ) ); Don't read too much into the data as i don't think the values chosen have anything to do with the problem. I am just using a set of data sequences from my live database. Though, i'm willing to be surprised if someone's willing to show correlation between the problem and the values used. -- ta tb -- 1 2 3 1 2 3 insert into tc values ( NEXTVAL( 'tc_seq' ), 1, 1, 1, 1, 3, 2 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 3, 3, 3, 2, 1, 3 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 4, 4, 4, 2, 1, 3 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 5, 5, 5, 1, 2, 3 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 5, 5, 5, 1, 2, 3 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 1, 1, 1, 1, 2, 3 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 4, 4, 4, 2, 1, 3 ); insert into tc values ( NEXTVAL( 'tc_seq' ), 3, 3, 3, 2, 1, 3 ); /* insert into tc values ( NEXTVAL( 'tc_seq' ), 1, null, null, 1, null, null ); */ This last insert is to show it is possible to have null values for either of the ta or tb attributes in table tc. But it is omitted to simplify the problem for now. I am no SQL expert, as in I don't do it for a living. Therefore, I would not doubt that there exist more efficient ways to do this but this is one way I came up with and it is quite possible that a SQL mistake is causing the strange outputs I am observing. But let me describe in English what i am attempting to do in the SQL. As I said earlier the task is to determine the unique count of ta entries each tb.b_id is associated with in the tc table. I'm selecting on the tb table and doing sub selects on the tc table where I attempt to count unique counts of tb to ta matches. First sub-select (aa) does a distinct count of tc.ta1 where tc.tb1 equals tb.b_id and the corresponding tc.ta1 field is not null. Second sub-select (bb) I'm again doing a count of distinct tc.ta2 where tc.tb2 is equal to tb.b_id, tc.ta2 is not null and tc.ta2 was not counted in the aa sub-select. Next (cc), once again do a count of distinct tc.ta3 where tc.tb3 is equal to tb.b_id, tc.ta3 is not null and not in either of the two previous sub-selects (aa, bb). The sum of sub-selects aa, bb and cc should be the value I'm interested in (not shown in SQL statement). The problem: The result of the select is not the expected counts of ta to tb. The twist: If the select statement is provided a where-clause where each tb.b_id is restricted to individual tb.b_id values then the correct/expected count of ta to tb is obtained (See examples below). -- selects select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null ) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null ) ) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null ) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null ) ) as cc from tb ; -- results from selects =# select * from tc order by c_id; c_id | ta1 | ta2 | ta3 | tb1 | tb2 | tb3 ------+-----+-----+-----+-----+-----+----- 1 | 1 | 1 | 1 | 1 | 3 | 2 2 | 3 | 3 | 3 | 2 | 1 | 3 3 | 4 | 4 | 4 | 2 | 1 | 3 4 | 5 | 5 | 5 | 1 | 2 | 3 5 | 5 | 5 | 5 | 1 | 2 | 3 6 | 1 | 1 | 1 | 1 | 2 | 3 7 | 4 | 4 | 4 | 2 | 1 | 3 8 | 3 | 3 | 3 | 2 | 1 | 3 (8 rows) =# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null)) as cc from tb where tb.b_id < 4; b_id | aa | bb | cc ------+----+----+---- 1 | 2 | 2 | 0 2 | 2 | 0 | 0 3 | 0 | 0 | 0 (3 rows) Obviously not the correct result for rows 2 and 3. =# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null)) as cc from tb where tb.b_id = 1; b_id | aa | bb | cc ------+----+----+---- 1 | 2 | 2 | 0 (1 row) =# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null)) as cc from tb where tb.b_id = 2; b_id | aa | bb | cc ------+----+----+---- 2 | 2 | 2 | 0 (1 row) This is what we should've gotten in the first query for the second row. =# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null)) as cc from tb where tb.b_id = 3; b_id | aa | bb | cc ------+----+----+---- 3 | 0 | 1 | 3 (1 row) This is what we should've gotten in the first query for the third row. Further strangeness: I would think that the following two select statements should yeild the same results. =# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null)) as cc from tb where tb.b_id = 1 or tb.b_id = 2 or tb.b_id = 3; b_id | aa | bb | cc ------+----+----+---- 1 | 2 | 2 | 0 2 | 2 | 0 | 0 3 | 0 | 0 | 0 (3 rows) Incorrect output. Same as the very first query. =# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as aa, ( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 = tb.b_id and tc.ta2 is not null)) as cc from tb where tb.b_id = 3 or tb.b_id = 2 or tb.b_id = 1; b_id | aa | bb | cc ------+----+----+---- 3 | 0 | 1 | 3 2 | 2 | 2 | 0 1 | 2 | 2 | 0 (3 rows) Strangely, this is the result we were expecting from our original query! Is it possible that the sub-selects are somehow affecting the result sets? Seen on: % postmaster --version postmaster (PostgreSQL) 7.1.3 and % postmaster --version postmaster (PostgreSQL) 7.1.2 -- drops drop sequence tc_seq; drop sequence tb_seq; drop sequence ta_seq; drop table tc; drop table tb; drop table ta; I'd appreciate any comments you may have regarding this post :-) sidster -- They who would sacrifice freedom for security will have neither. -Ben Franklin ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org