Peter Jin jinyup...@gmail.com writes:
We have an audit system which uses separate tables with similar structure to
store different kinds of activities for a user. We want to report the number
of activities per activity type per user.
Still sounds pretty standard in terms of database schema. Can I
assume that these separate activity tables have some field or unique
key that correlates back to a user table or table containing per-user
information? If so, then unless the tables are stored in separate
database servers, it sounds like you just want to join those tables as
part of the primary query without much hassle.
Dumb example - you have a set of users, who can get rows added to one of
three activity tables, related to three activities that have different
metrics. Each row in an activity table is a single instance of that
user performing that activity:
users
id integer primary key
name text
activity1
user_id integer -- foreign key references users(id)
distance integer
activity2
user_id integer -- foreign key references users(id)
hits integer
activity3
user_id integer -- foreign key references users(id)
weight integer
(The join will work without the database having explicit foreign key
references but they should be there for referential integrity if the
activity tables do have entries for users. For performance you should
also ensure that users.id is indexed at a minimum).
Then, a query like the following:
SELECTid, name,
count(a1.distance) as a1_cnt, sum(a1.distance) as a1_total,
count(a2.hits) as a2_cnt, sum(a2.hits) as a2_total,
count(a3.weight) as a3_cnt, sum(a3.weight) as a3_total
FROM users
LEFT JOIN activity1 a1 on users.id = a1.user_id
LEFT JOIN activity2 a2 on users.id = a2.user_id
LEFT JOIN activity3 a3 on users.id = a3.user_id
GROUP BY id, name;
would produce a result set looking like:
id name a1_cnt a1_total a2_cnt a2_total a3_cnt a3_total
1User 1 ## ## ##
2User 2 ## ## ##
...
Feed that into a JasperReports report designed for grouping by id/user
and then you have access to all of the total information in a single
detail line for that user, to be presented in the per-user section of
that report.
Now if you're looking for individual detail lines for each activity
for each user, then I think your original thought of a UNION query was
on the right track, although you're still joining within the
individual components of the UNION to link the user to the activity.
For example:
SELECT * from (
SELECTid, name, 'activity1' as activity,
count(a.distance) as count, sum(a.distance) as total
FROM users LEFT JOIN a1 a on users.id = a.user_id
GROUP BY id, name, activity
UNION
SELECTid, name, 'activity2' as activity,
count(a.hits) as count, sum(a.hits) as total
FROM users LEFT JOIN a2 a on users.id = a.user_id
GROUP BY id, name, activity
UNION
SELECTid, name, 'activity3' as activity,
count(a.weight) as count, sum(a.weight) as total
FROM users LEFT JOIN a3 a on users.id = a.user_id
GROUP BY id, name, activity
) as data
ORDER by id, name, activity;
which would result in a result set like:
id nameactivitycounttotal
1 User 1 activity1## ###
1 User 1 activity2## ###
1 User 1 activity3## ###
2 User 2 activity1## ###
2 User 2 activity2## ###
2 User 2 activity3## ###
which I think would work fine for Jasper to perform nested groupings
on it, first by id/name and then by activity.
I know you mentioned HQL not supporting a UNION, but JasperReport can
make a direct SQL query to the underlying data, so I'm not sure that HQL
needs to be involved. Also, you mentioned being concerned with the size
of the data set, but as you can see here, you'll only get one summary
row per user, per activity, so I'm not sure how the data set can be any
smaller (even via a mechanism other than UNION) and still provide you
with per-activity row data for display in the report.
Note that if the activity tables have columns for a name for the activity
you could select that in lieu of the static string, though if there's any
chance for overlap you'll then want a UNION ALL instead.
Nesting the query and applying an overall ordering helps ensure that
you get the sequence in an appropriate order for reporting, since
otherwise an engine might produce an arbitrary ordering.
Or, if the per-user section of the report is just supposed to show
a row per activity with total values (e.g., just what this query gives),
then there's no need to have a