RE: [PHP-DB] Archiving A Database

2002-11-07 Thread Josh Johnson
Thanks Peter!

It turns out the people I work for don't care at all about this data;
they were only using it to push clients to buy bigger packages and for
bragging rights. Grand totals, and totals per client and per period, are
all they need; the extra data I was logging *just in case* is totally
useless to them, so they pushed for the quickest solution.

I ended up implementing a scheme to condense the data by a common factor
(the client's username) and record totals (see my post: [PHP-DB] Query
Help for more info). I'm using cron jobs to run SQL on the database
periodically to move the data around, and I used table locking in an
attempt to keep the data sound. I'm still wary of the number (ok, it's
only like 5, but still) of queries and loops I'm using to gather the data
for display, but it seems to be working ok (any suggestions would be
greatly appreciated).
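
Roughly, the cron'd roll-up looks like the sketch below -- the table and
column names are simplified for the example, not my real schema, and it's
untested as written:

<?php
// rollup.php -- run from cron, e.g. hourly:  0 * * * * php /path/to/rollup.php
// Condenses raw log rows into per-client totals, then deletes the raw rows.
// Table/column names (access_log, access_totals, username, hits) are
// placeholders for the example.

$db = mysql_connect('localhost', 'loguser', 'secret');
mysql_select_db('sitelog', $db);

// Lock both tables so nothing slips in between the roll-up and the delete.
mysql_query("LOCK TABLES access_log WRITE, access_totals WRITE", $db);

// Count this period's hits per client.
$res = mysql_query("SELECT username, COUNT(*) AS c
                    FROM access_log GROUP BY username", $db);
while ($row = mysql_fetch_assoc($res)) {
    $user = mysql_escape_string($row['username']);
    $c    = (int) $row['c'];

    // Add to the existing total, or start a new one for a new client.
    mysql_query("UPDATE access_totals SET hits = hits + $c
                 WHERE username = '$user'", $db);
    if (mysql_affected_rows($db) == 0) {
        mysql_query("INSERT INTO access_totals (username, hits)
                     VALUES ('$user', $c)", $db);
    }
}

// The raw detail is no longer needed once it's been totalled.
mysql_query("DELETE FROM access_log", $db);
mysql_query("UNLOCK TABLES", $db);
?>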

I'll definitely keep your suggestions in mind for more serious archiving
of data in the future. 

Thanks again!

-- Josh

-Original Message-
From: Peter Beckman [mailto:beckman;purplecow.com] 
Sent: Thursday, November 07, 2002 12:05 AM
To: Josh Johnson
Cc: [EMAIL PROTECTED]
Subject: Re: [PHP-DB] Archiving A Database

Do you need to archive it forever?  Do you need live access to it?

If you just need to archive it, read the man page for mysqldump.  You can
dump specific rows to a file, rename that file dump.2002.05.24, compress it
with gzip, and voila, you have all of your data readily available, easily
reinserted into your existing live DB.  After the dump is complete, just
delete those rows out of the DB.

Now you have to be sure that the file contains 100% of the data and is
accurate.  You'll have to figure out how to do that yourself!  One way I
did it with WWW log files was to make a tmp copy, run a program on it, see
if the output had the same number of lines in it, and if it did, delete the
tmp copy and replace the original copy with the new copy (IP-DNS
translation).

In this case you could write a script that parses the newly created dump
file in the standard format and, heck, using perl or PHP parse the thing
for ),( and see how many you get.  If the mysql select gets X rows, and the
),( or ); occurs X-1 times, then you know you got all the rows.  (OK,
it will be more complex than this, but you get the idea.)

One record per file could get ugly real quick, both on the filesystem side
and the maintenance side.  Doing it by day will allow you to repopulate an
entire day back into the DB as needed for billing questions or what not.
MySQL will keep the unique ID (assuming it is a number auto_incremented)
unique and only use ever higher IDs, not use old ones that no longer exist
in the DB at the moment.  Once you are done with that day (or days) of
data, you just delete it from the DB, knowing you have a copy of that day
on file.

I would also advise you to have 2-4 copies of all your backups on both
physically different computers as well as physically different locations.
I kept a copy of my live DB, backed up every 6 hours, at the data center on
another computer, on my computer file server at home, and another one 7
states over.  Sure, it's a bit of a pain in the ass, but if the data center
goes up in smoke and my file server is hacked on the same day, I still have
a copy 7 states over.  I just used scp (secure-shell copy) and/or rsync +
ssh to keep copies current and up to date on remote machines.

Peter

On Tue, 29 Oct 2002, Josh Johnson wrote:

 Does anyone know of speedy methods of periodically archiving and
 retrieving a database table? I've got a table I'm using to log
 statistical data, and it's getting 10k+ entries per day. Entries are
 only logged in the database if the user is authenticated for access, and
 only once per access session (the user is authenticated, a cookie is
 set, and the entry is logged. From that point the user is authenticated
 by verifying the cookie, which expires when the user's browser closes).

 The data is critical (used for billing and such), but the log viewing
 script I've written runs slower and slower with each passing day, and at
 one point the entire table was corrupted due to neglect on our system
 admin's part (I warned them about the default setting of
 mysql_max_connections being too low for a site that relied on the DB so
 heavily and had so much traffic). The script is a simple front end to
 several MySQL queries that retrieve totals for the past hour, day,
 week, month and grand totals for all clients or a given client. The rest
 of the site has been unaffected by the log issues.

 My current approach is to log the data as XML, one record per file, and
 create a log file that is an index of references to each record. I then
 empty previous data in the database. This works, but is very hard/slow to
 retrieve specific groups of information, and dumping the database takes
 a long time as well, which adds the difficulty of ensuring no records
 are lost. I also haven't yet tackled the issue of combining data from
 the XML seamlessly with live data from the database.

Re: [PHP-DB] Archiving A Database

2002-11-06 Thread Peter Beckman
Do you need to archive it forever?  Do you need live access to it?

If you just need to archive it, read the man page for mysqldump.  You can
dump specific rows to a file, rename that file dump.2002.05.24, compress it
with gzip, and voila, you have all of your data readily available, easily
reinserted into your existing live DB.  After the dump is complete, just
delete those rows out of the DB.
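
Something along these lines, say from a nightly cron job (an untested
sketch -- the table, its date column and the credentials are placeholders):

<?php
// archive-day.php -- run from cron just after midnight.
// Dumps yesterday's rows with mysqldump, deletes them, then gzips the file.
// "access_log" and its "stamp" column are placeholder names.

$day   = date('Y.m.d', time() - 86400);                  // e.g. 2002.05.24
$where = "DATE_FORMAT(stamp, '%Y.%m.%d') = '$day'";
$file  = "/backups/dump.$day";

// -t = data only (no CREATE TABLE); --where limits the dump to yesterday.
system("mysqldump -t -u loguser -psecret --where=\"$where\" sitelog access_log > $file");

// Verify the dump here (see the row-count check further down) before deleting.
$db = mysql_connect('localhost', 'loguser', 'secret');
mysql_select_db('sitelog', $db);
mysql_query("DELETE FROM access_log WHERE $where", $db);

system("gzip $file");
?>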

Now you have to be sure that the file contains 100% of the data and is
accurate.  You'll have to figure out how to do that yourself!  One way I
did it with WWW log files was to make a tmp copy, run a program on it, see
if the output had the same number of lines in it, and if it did, delete the
tmp copy and replace the original copy with the new copy (IP-DNS
translation).

In this case you could write a script that parses the newly created dump
file in the standard format and, heck, using perl or PHP parse the thing
for ),( and see how many you get.  If the mysql select gets X rows, and the
),( or ); occurs X-1 times, then you know you got all the rows.  (OK,
it will be more complex than this, but you get the idea.)
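
For example, a crude check like this (it assumes a data-only dump in
mysqldump's default extended-insert format, and reuses the placeholder
names from the sketch above):

<?php
// verify-dump.php -- run before gzipping/deleting anything.
// In an extended INSERT, "),(" separates rows and ");" ends each statement,
// so separators + statements should equal the number of rows dumped.
$dump = file_get_contents('/backups/dump.2002.05.24');
$rows = substr_count($dump, '),(') + substr_count($dump, ');');

// Compare against what the live table holds for that day.
$db = mysql_connect('localhost', 'loguser', 'secret');
mysql_select_db('sitelog', $db);
$res  = mysql_query("SELECT COUNT(*) FROM access_log
                     WHERE DATE_FORMAT(stamp, '%Y.%m.%d') = '2002.05.24'", $db);
$live = mysql_result($res, 0);

if ($rows != $live) {
    // Don't delete anything -- the dump is suspect.
    mail('admin@example.com', 'Archive check failed', "dump=$rows live=$live");
}
?>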

One record per file could get ugly real quick, both on the filesystem side
and the maintenance side.  Doing it by day will allow you to repopulate an
entire day back into the DB as needed for billing questions or what not.
MySQL will keep the unique ID (assuming it is a number auto_incremented)
unique and only use ever higher IDs, not use old ones that no longer exist
in the DB at the moment.  Once you are done with that day (or days) of
data, you just delete it from the DB, knowing you have a copy of that day
on file.
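
Repopulating a day is then a one-liner (same placeholder paths and
credentials as above):

<?php
// zcat streams the compressed dump straight back into the live DB.
system("zcat /backups/dump.2002.05.24.gz | mysql -u loguser -psecret sitelog");
?>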

I would also advise you to have 2-4 copies of all your backups on both
physically different computers as well as physically different locations.
I kept a copy of my live DB, backed up every 6 hours, at the data center on
another computer, on my computer file server at home, and another one 7
states over.  Sure, it's a bit of a pain in the ass, but if the data center
goes up in smoke and my file server is hacked on the same day, I still have
a copy 7 states over.  I just used scp (secure-shell copy) and/or rsync +
ssh to keep copies current and up to date on remote machines.

Peter

On Tue, 29 Oct 2002, Josh Johnson wrote:

 Does anyone know of speedy methods of periodically archiving and
 retrieving a database table? I've got a table I'm using to log
 statistical data, and it's getting 10k+ entries per day. Entries are
 only logged in the database if the user is authenticated for access, and
 only once per access session (the user is authenticated, a cookie is
 set, and the entry is logged. From that point the user is authenticated
 by verifying the cookie, which expires when the user's browser closes).

 The data is critical (used for billing and such), but the log viewing
 script I've written runs slower and slower with each passing day, and at
 one point the entire table was corrupted due to neglect on our system
 admin's part (I warned them about the default setting of
 mysql_max_connections being too low for a site that relied on the DB so
 heavily and had so much traffic). The script is a simple front end to
 several MySQL queries that retrieve totals for the past hour, day,
 week, month and grand totals for all clients or a given client. The rest
 of the site has been unaffected by the log issues.

 My current approach is to log the data as XML, one record per file, and
 create a log file that is an index of references to each record. I then
 empty previous data in the database. This works, but is very hard/slow to
 retrieve specific groups of information, and dumping the database takes
 a long time as well, which adds the difficulty of ensuring no records
 are lost. I also haven't yet tackled the issue of combining data from
 the XML seamlessly with live data from the database.

 I'm opening the floor to other potential solutions, or war stories from
 anyone else who's been in this situation before. If there's another
 approach aside from writing individual flat files, please let me know. I
 haven't tested it yet, but I think maybe logging all entries/data for a
 given log period into one file would speed things up, but I worry about
 the 40 megabyte files that could be potentially created. I'm also open
 to complete alternatives, since the main use for this data is to
 generate total accesses per a given factor (client, ip, referrer, etc).

 Any help is greatly appreciated. I can give any information needed, just
 ask!

 Thanks,
 -- Josh

 P.S. I did suggest that this issue be tackled when I first installed the
 system that creates the log, but management didn't think it was worth
 bothering with at the time.

 P.P.S. I did look through the archives before I posted, and didn't find
 anything. Please flame in private if I missed something!

[PHP-DB] Archiving A Database

2002-10-29 Thread Josh Johnson
Does anyone know of speedy methods of periodically archiving and
retrieving a database table? I've got a table I'm using to log
statistical data, and it's getting 10k+ entries per day. Entries are
only logged in the database if the user is authenticated for access, and
only once per access session (the user is authenticated, a cookie is
set, and the entry is logged. From that point the user is authenticated
by verifying the cookie, which expires when the user's browser closes).
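
The logging path is basically this (a simplified sketch -- the auth check,
table and column names stand in for the real ones):

<?php
// Log one row per access session: the first authenticated hit writes the
// row and sets a cookie; later hits in the session just see the cookie.
if (authenticate_user()) {                    // whatever the real auth check is
    if (!isset($_COOKIE['session_logged'])) {
        $db = mysql_connect('localhost', 'loguser', 'secret');
        mysql_select_db('sitelog', $db);
        $user = mysql_escape_string($_SERVER['PHP_AUTH_USER']);
        $ip   = mysql_escape_string($_SERVER['REMOTE_ADDR']);
        $ref  = mysql_escape_string($_SERVER['HTTP_REFERER']);
        mysql_query("INSERT INTO access_log (username, ip, referrer, stamp)
                     VALUES ('$user', '$ip', '$ref', NOW())", $db);

        // No expiry time, so the cookie dies when the browser closes.
        setcookie('session_logged', '1');
    }
}
?>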

The data is critical (used for billing and such), but the log viewing
script I've written runs slower and slower with each passing day, and at
one point the entire table was corrupted due to neglect on our system
admin's part (I warned them about the default setting of
mysql_max_connections being too low for a site that relied on the DB so
heavily and had so much traffic). The script is a simple front end to
several MySQL queries that retrieve totals for the past hour, day,
week, month and grand totals for all clients or a given client. The rest
of the site has been unaffected by the log issues.
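
The queries themselves are just date-ranged counts, something like this
(placeholder table and column names again):

<?php
$db = mysql_connect('localhost', 'loguser', 'secret');
mysql_select_db('sitelog', $db);

$periods = array(
    'past hour'  => 'stamp > DATE_SUB(NOW(), INTERVAL 1 HOUR)',
    'past day'   => 'stamp > DATE_SUB(NOW(), INTERVAL 1 DAY)',
    'past week'  => 'stamp > DATE_SUB(NOW(), INTERVAL 7 DAY)',
    'past month' => 'stamp > DATE_SUB(NOW(), INTERVAL 1 MONTH)',
    'all time'   => '1'                      // no restriction
);

foreach ($periods as $label => $where) {
    // Per-client totals for this period; drop the GROUP BY for a grand total.
    $res = mysql_query("SELECT username, COUNT(*) AS hits
                        FROM access_log WHERE $where GROUP BY username", $db);
    echo "$label:\n";
    while ($row = mysql_fetch_assoc($res)) {
        echo "  {$row['username']}: {$row['hits']}\n";
    }
}
?>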

My current approach is to log the data as XML, one record per file, and
create a log file that is an index of references to each record. I then
empty previous data in the database. This works, but is very hard/slow to
retrieve specific groups of information, and dumping the database takes
a long time as well, which adds the difficulty of ensuring no records
are lost. I also haven't yet tackled the issue of combining data from
the XML seamlessly with live data from the database.
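
For context, that archiving step is roughly the sketch below (paths and
field names are simplified, not the real ones):

<?php
// Simplified sketch of the per-record XML archiving.
$db = mysql_connect('localhost', 'loguser', 'secret');
mysql_select_db('sitelog', $db);

$res   = mysql_query("SELECT id, username, ip, referrer, stamp FROM access_log", $db);
$index = fopen('/archive/index.log', 'a');

while ($row = mysql_fetch_assoc($res)) {
    $file = "/archive/record-{$row['id']}.xml";
    $xml  = "<record>\n"
          . "  <username>" . htmlspecialchars($row['username']) . "</username>\n"
          . "  <ip>{$row['ip']}</ip>\n"
          . "  <referrer>" . htmlspecialchars($row['referrer']) . "</referrer>\n"
          . "  <stamp>{$row['stamp']}</stamp>\n"
          . "</record>\n";

    $fp = fopen($file, 'w');
    fwrite($fp, $xml);
    fclose($fp);

    // The index file holds one reference per record.
    fwrite($index, "{$row['id']} $file\n");
}
fclose($index);

// Then the archived rows are emptied from the database.
mysql_query("DELETE FROM access_log", $db);
?>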

I'm opening the floor to other potential solutions, or war stories from
anyone else who's been in this situation before. If there's another
approach aside from writing individual flat files, please let me know. I
haven't tested it yet, but I think maybe logging all entries/data for a
given log period into one file would speed things up, but I worry about
the 40 megabyte files that could be potentially created. I'm also open
to complete alternatives, since the main use for this data is to
generate total accesses per a given factor (client, ip, referrer, etc). 

Any help is greatly appreciated. I can give any information needed, just
ask!

Thanks, 
-- Josh

P.S. I did suggest that this issue be tackled when I first installed the
system that creates the log, but management didn't think it was worth
bothering with at the time. 

P.P.S. I did look through the archives before I posted, and didn't find
anything. Please flame in private if I missed something!



-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php