On Apr 15, 2009, at 1:15 PM, Steve Midgley wrote:
Hi,
I'm trying to figure out how to do something which I'd guess is easy
for a sql whiz but has me stumped. I would greatly appreciate any
help on this - it's a form of SQL query that I've never figured out,
but have wanted to use many times over the years..
I want to generate an analysis report that counts the values in two
separate tables. I've been able to accomplish what I want with two
separate queries that I then merge together in Excel. Essentially
what I need is a "horizontal UNION" statement (or something like
that).
I've included some DDL and sample SQL queries that explain what I
want better than I can in English, but the general idea is:
get a FK id and count of a certain column in one table, based on
some criteria
-> for each FK id, get the count of a different column in a
different table
Display the counts from both queries side-by-side along with the FK
id's in a single result set
Thanks for any assistance on this!
Steve
/*SQL STARTS*/
drop table if exists contact_log;
drop table if exists contact_property;
create table contact_log(id serial NOT null, src_contact_id integer,
log_type character varying(63), CONSTRAINT contact_log_pkey PRIMARY
KEY (id));
create table contact_property(id serial NOT null, contact_id
integer, property_id integer,
CONSTRAINT contact_property_pkey PRIMARY KEY (id), CONSTRAINT
contact_property_cid_pid UNIQUE (contact_id, property_id));
insert into contact_log (src_contact_id, log_type) values(1, 'web');
insert into contact_log (src_contact_id, log_type) values(1, 'web');
insert into contact_log (src_contact_id, log_type) values(1, 'web');
insert into contact_log (src_contact_id, log_type) values(1,
'detail');
insert into contact_log (src_contact_id, log_type) values(1,
'detail');
insert into contact_log (src_contact_id, log_type) values(2,
'detail');
insert into contact_log (src_contact_id, log_type) values(2,
'detail');
insert into contact_log (src_contact_id, log_type) values(2, 'web');
insert into contact_log (src_contact_id, log_type) values(2,
'foobar');
insert into contact_log (src_contact_id, log_type) values(3,
'foobar');
insert into contact_log (src_contact_id, log_type) values(4, 'web');
insert into contact_property (contact_id, property_id) values(1, 20);
insert into contact_property (contact_id, property_id) values(1, 21);
insert into contact_property (contact_id, property_id) values(1, 22);
insert into contact_property (contact_id, property_id) values(2, 23);
insert into contact_property (contact_id, property_id) values(2, 24);
insert into contact_property (contact_id, property_id) values(1, 50);
insert into contact_property (contact_id, property_id) values(3, 51);
insert into contact_property (contact_id, property_id) values(5, 52);
-- This gets what I want from contact_log
select src_contact_id, count(log_type)
from contact_log
where
contact_log.src_contact_id in (select contact_id from
contact_property)
and log_type in ('web', 'detail')
and src_contact_id in (select contact_id from contact_property)
group by src_contact_id
order by src_contact_id;
-- correct output is : 1|5, 2|3
-- This gets what I want from contact_property
select contact_id, count(property_id)
from contact_property
where
contact_id in (select src_contact_id from contact_log where log_type
in ('web', 'detail'))
group by contact_id
order by contact_id;
-- correct output is: 1|4, 2|2
-- THIS DOESN'T WORK (of course - but what would?)
select src_contact_id, count(log_type), count(property_id)
from contact_log
join contact_property cp on cp.contact_id = contact_log.src_contact_id
where
contact_log.src_contact_id in (select contact_id from
contact_property)
and log_type in ('web', 'detail')
group by src_contact_id
order by src_contact_id
-- correct output *should be* : 1|5|4, 2|3|2
/*SQL ENDS*/
First, in that last query, working or not, you don't need the
"contact_log.src_contact_id in (select contact_id from
contact_property)" clause as you've already covered that with the
join condtion "cp.contact_id = contact_log.src_contact_id".
Anyways, on to your actual question, you can't do that in one level
from what I can see as the query first does the join and the executes
the aggregates on the results of the join. Let's check out the
results of that join without the aggregates (I'm ignoring the id
values here since they don't come into play and it will help
demonstrate what's happening later):
select cl.src_contact_id, cl.log_type, cp.contact_id, cp.property_id
from contact_log cl, contact_property cp
where cl.src_contact_id = cp.contact_id
and cl.log_type in ('web', 'detail');
src_contact_id | log_type | contact_id | property_id
----------------+----------+------------+-------------
1 | detail | 1 | 20
1 | detail | 1 | 20
1 | web | 1 | 20
1 | web | 1 | 20
1 | web | 1 | 20
1 | detail | 1 | 21
1 | detail | 1 | 21
1 | web | 1 | 21
1 | web | 1 | 21
1 | web | 1 | 21
1 | detail | 1 | 22
1 | detail | 1 | 22
1 | web | 1 | 22
1 | web | 1 | 22
1 | web | 1 | 22
2 | web | 2 | 23
2 | detail | 2 | 23
2 | detail | 2 | 23
2 | web | 2 | 24
2 | detail | 2 | 24
2 | detail | 2 | 24
1 | detail | 1 | 50
1 | detail | 1 | 50
1 | web | 1 | 50
1 | web | 1 | 50
1 | web | 1 | 50
That is exactly what gets processed by the aggregates. Aggregates
process *a* relation, here the relation produced by the join. Now,
that query with the counts makes more sense:
select cl.src_contact_id, count(log_type), count(property_id)
from contact_log cl, contact_property cp
where cl.src_contact_id = cp.contact_id
and cl.log_type in ('web', 'detail')
group by cl.src_contact_id
src_contact_id | count | count
----------------+-------+-------
1 | 20 | 20
2 | 6 | 6
Joining against a subquery for the second count does the trick:
select src_contact_id, count(log_type), cp.count
from contact_log ,
(select contact_id, count(property_id)
from contact_property
group by contact_id) as cp
where src_contact_id = cp.contact_id
and log_type in ('web', 'detail')
group by src_contact_id, cp.count
order by src_contact_id
src_contact_id | count | count
----------------+-------+-------
1 | 5 | 4
2 | 3 | 2
Note that you have to add the count pulled from subquery to the group
by since at that point it's a constant and not an aggregate function
anymore, else you get an SQL error in the outer group by clause.
Here's the query without the aggregate in the outer query to help make
that clear:
select src_contact_id, log_type, cp.count
from contact_log ,
(select contact_id, count(property_id)
from contact_property
group by contact_id) as cp
where src_contact_id = cp.contact_id
and log_type in ('web', 'detail')
order by src_contact_id;
src_contact_id | log_type | count
----------------+----------+-------
1 | web | 4
1 | detail | 4
1 | web | 4
1 | web | 4
1 | detail | 4
2 | detail | 2
2 | detail | 2
2 | web | 2
That says, "Give me each src_contact_id and log_type pair from
contact_log along with the count from contact_property where
contact_id = src_contact_id".
Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql