I've implemented the scheme I briefly outlined in my last post (locking
for data integrity), and now I'm writing queries to get totals from my
logs. I think I might be over-complicating things, so I'd like some
alternative views on the subject.

My "grandiose scheme" works like this:

Problem: log every unique user who authenticates for access to my
company's website into a MySQL database table; the only figure we need
to report is the total number of unique authentications.

Solution: because of the high volume of unique authentications being
logged (over 15k per day), I periodically "condense" the data into 3
additional tables beyond the main table that logs users as they come in.

My tables look like this:

access_log:
+-----------+---------------+------+-----+---------+-------+
| Field     | Type          | Null | Key | Default | Extra |
+-----------+---------------+------+-----+---------+-------+
| timestamp | timestamp(14) | YES  | MUL | NULL    |       |
| code      | text          |      |     | NULL    |       |
| refer     | text          |      |     | NULL    |       |
| ip        | varchar(64)   |      |     |         |       |
| access    | varchar(128)  |      | MUL | 0       |       |
+-----------+---------------+------+-----+---------+-------+

NOTE: The access field in the access_log table corresponds to the
client field in the tables below. Logging was an afterthought initially,
but it later became very important, so I can't rename the field now
without a lot of debugging. This database cleanup wasn't a priority
either, until a couple of months later when the access_log table grew
out of control (over 1GB on disk, 800k records).

access_log_daily / access_log_weekly / access_log_monthly:
+--------+---------------+------+-----+---------+-------+
| Field  | Type          | Null | Key | Default | Extra |
+--------+---------------+------+-----+---------+-------+
| client | varchar(32)   |      |     |         |       |
| total  | bigint(20)    |      |     | 0       |       |
| stamp  | timestamp(14) | YES  |     | NULL    |       |
+--------+---------------+------+-----+---------+-------+

I've set up cron jobs that run these queries periodically:
LOCK TABLES access_log WRITE, access_log_daily WRITE;
INSERT INTO access_log_daily (client, total)
       SELECT access, COUNT(timestamp) AS total
       FROM access_log GROUP BY access;
DELETE FROM access_log;
UNLOCK TABLES;

LOCK TABLES access_log_daily WRITE, access_log_weekly WRITE;
INSERT INTO access_log_weekly (client, total)
       SELECT client, SUM(total) AS total
       FROM access_log_daily GROUP BY client;
DELETE FROM access_log_daily;
UNLOCK TABLES;

LOCK TABLES access_log_weekly WRITE, access_log_monthly WRITE;
INSERT INTO access_log_monthly (client, total)
       SELECT client, SUM(total) AS total
       FROM access_log_weekly GROUP BY client;
DELETE FROM access_log_weekly;
UNLOCK TABLES;
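For reference, here is roughly how I schedule the three jobs in
crontab. The times and the script file names (condense_daily.sql, etc.)
are just placeholders for whatever you'd actually use:

```
# roll access_log into access_log_daily shortly after midnight
5 0 * * *   mysql mydb < condense_daily.sql
# roll daily into weekly early Monday morning
15 0 * * 1  mysql mydb < condense_weekly.sql
# roll weekly into monthly on the 1st of each month
30 0 1 * *  mysql mydb < condense_monthly.sql
```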

This way, the access_log table never holds more than a 24-hour period's
worth of users, but we can still look into it if a problem arises and
we need someone's IP address or the auth code they were given. The
other tables can grow only as large as the "condensation" interval
times the total number of clients, and since they store totals, it
*should* be easy to gather information from them.

In my reporting scripts, I need to be able to find the grand totals for
the past hour, day, week, month, and year. My first idea was to run
multiple queries and add the results together, but I'm not sure whether
this is the best way, or whether it will give accurate per-period
results. I appreciate any input anyone has.
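To make the "multiple queries" idea concrete, here is one way the
past-day grand total could be collapsed into a single statement with
UNION ALL; this is only a sketch (the subquery alias t is mine), and
it assumes access_log_daily currently holds only the current day's
condensed rows, since it has no WHERE clause on stamp:

```sql
SELECT SUM(t.total) AS grand_total
FROM (
    SELECT total FROM access_log_daily   -- counts already condensed today
    UNION ALL
    SELECT COUNT(*) FROM access_log      -- rows not yet condensed
) AS t;
```

If the daily table can hold more than one day between weekly sweeps,
a `WHERE stamp >= ...` restriction on the first branch (using whatever
cutoff fits the period) would keep it from over-counting.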

-- Josh

PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
