Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-05 Thread Stephen Turner

On Tue, 4 Apr 2000, Kevin Hemenway wrote:
 
 Our current analyzer stores everything in a tab-delimited flatfile - it
 doesn't remember referrers, it doesn't remember much of anything besides
 hits and total bytes for hits. And that's just fine.

This is pretty much what analog's cache file does. It stores the total
number of hits for each item, but doesn't cross-reference (say) files and
referrers.

 Ok. So, how would I go about this? One statement worries me: "A couple of
 other minor points: the pattern of failed requests and redirected requests
 over time is not recorded in the cache file. So although the total number
 will still be correct, the number in the last 7 days can be under-reported
 subsequently. And times are only recorded to five-minute resolution."
 
 One of my methods of madness is to take a look at the weekly logfile, find
 out how many accesses happened that week and add that to a general total on
 the home page of my site (Disobey.com). This statement worries me in that
 I'm very vain when it comes to total hits (over 7 million now).
 
 Would this ruin that vanity?
 
 [...] Uh. Why would Analog look at the historic
 cache files to determine the hits for the brand-new last seven days report?
 

The only issue is when the historical cache file overlaps the last seven
days. Does that answer your question?

 Ignoring that question, where are the cache files created?

Anywhere you want.

 directory called /usage - have analog send its html reports there, and
 keep the cache reports in /usage/cache? And then each week, Analog would
 read from /usage/cache plus the new weekly log file to generate a new
 report under /usage? 
 
 Would this new report under /usage, because of the cache, have total hits and
 monthly stuff based on how far back the cache goes?
 

I guess you've got two main choices. The one I recommend in the docs is to
create a cache file from each logfile: then when you want a report, analyse
all the cache files. The other would be to create a cumulative cache file
each week based on the last cache file and the new week's logfile.
Personally I think this second procedure is much more likely to get
confused, and when it does, all the data is corrupted together. :)
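The first choice above (one summary per logfile, merged at report time) can be sketched as follows. This is only an illustration: the per-week dictionaries are hypothetical stand-ins and do not reflect Analog's real cache file format.

```python
from collections import Counter


def merge_caches(caches):
    """Sum per-item hit counts across independent per-logfile summaries."""
    total = Counter()
    for cache in caches:
        total.update(cache)  # Counter.update adds counts rather than replacing
    return total


# Hypothetical weekly summaries (item -> hits); NOT Analog's cache format.
week1 = {"/index.html": 120, "/about.html": 30}
week2 = {"/index.html": 150, "/news.html": 45}
week3 = {"/index.html": 90, "/about.html": 10}

# One summary per logfile, merged only when a report is wanted.
report = merge_caches([week1, week2, week3])

# If week2 turns out to be bad, only that one summary needs rebuilding;
# the other weeks are untouched and the report can simply be regenerated.
report_without_week2 = merge_caches([week1, week3])

print(report["/index.html"])                # 360
print(report_without_week2["/index.html"])  # 210
```

With a single cumulative summary there is no equivalent of `report_without_week2`: once a bad week has been folded in, it cannot be separated out again.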

 Again, thanks for explaining all this to an Analog newbie. I hope I'm not
 embarrassing the mailing list all that much ;)
 

Not at all. I wish all the questions here were this intelligent. :)

-- 
Stephen Turner   http://www.statslab.cam.ac.uk/~sret1/
  Statistical Laboratory, 16 Mill Lane, Cambridge CB2 1SB, England
"8th March 2000. National No Smoking Day. Ash Wednesday." (On a calendar)


This is the analog-help mailing list. To unsubscribe from this
mailing list, send mail to [EMAIL PROTECTED]
with "unsubscribe" in the main BODY OF THE MESSAGE.
List archived at http://www.mail-archive.com/analog-help@lists.isite.net/




Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-05 Thread Stephen Turner

On Wed, 5 Apr 2000, Kevin Hemenway wrote:
 
 Ok. So, this should never come into effect if I just set up a weekly log
 report that adds a new week to all the old cache files and reports? I have
 no plans on changing the reporting frequency.
 
 How would this come into effect though? When would the historical cache file
 (HCF) suck into that seven days? I can see that happening if the machine's
 date is set wrong, and the log file entries are subsequently dated wrong. I
 can see that happening if someone wanted to do the last ten days (from/to),
 but that's explained in the docs.
 
 Is there any time where an innocent nonesuch can cause the HCF to overlap?
 I'm just overly paranoid because the statement seemed so strong in the docs.
 

I really don't think it's a big issue. It's only the number of failed and
redirected requests in the last 7 days which goes wrong, not the successful
requests.

 Definitely makes sense. Is there any significant speed decrease when opening
 up VIRTDOMAINS x WEEKS x YEARS cache files per week as opposed to
 VIRTDOMAINS x YEARS or VIRTDOMAINS?

Probably not much.
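For scale, a rough count of the cache files involved, using the 120 virtual domains and 3 years mentioned elsewhere in this thread. The 52 weeks per year figure is an assumption, and the function name is made up for illustration.

```python
def cache_files_per_domain(years, files_per_year):
    """Number of cache files read for one domain's report."""
    return years * files_per_year


VIRTDOMAINS = 120     # figure quoted in the thread
YEARS = 3             # figure quoted in the thread
WEEKS_PER_YEAR = 52   # assumption: one logfile (and cache file) per week

weekly = cache_files_per_domain(YEARS, WEEKS_PER_YEAR)  # one file per week
yearly = cache_files_per_domain(YEARS, 1)               # one file per year

print(weekly)                # 156 files read per domain report
print(yearly)                # 3 files read per domain report
print(VIRTDOMAINS * weekly)  # 18720 files across all domains
```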

 I can see Analog doing 1994 - 2000 on
 your machine, which is nice - how long does that take?

18 minutes on a 266 chip, but I don't use cache files. It would be MUCH
quicker if I did.

 Do you have any
 system load readouts?

At a guess, something like this:


   1 |  ##
     |  ##
     |  ##
     |  ##
     |  ##
   0 ###
  0:02am   0:20am

:)

-- 
Stephen Turner   http://www.statslab.cam.ac.uk/~sret1/
  Statistical Laboratory, 16 Mill Lane, Cambridge CB2 1SB, England
"8th March 2000. National No Smoking Day. Ash Wednesday." (On a calendar)






Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-05 Thread Stephen Turner

On Tue, 4 Apr 2000, Jeremy Wadsack wrote:
 
  c) adding a new subdirectory to specifically watch.
 
 Again, does not affect the cache files.
 

As long as it wasn't done with a FILEINCLUDE/FILEEXCLUDE when creating the
cache files.

-- 
Stephen Turner   http://www.statslab.cam.ac.uk/~sret1/
  Statistical Laboratory, 16 Mill Lane, Cambridge CB2 1SB, England
"8th March 2000. National No Smoking Day. Ash Wednesday." (On a calendar)






Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-05 Thread Kevin Hemenway

  [...] Uh. Why would Analog look at the historic
  cache files to determine the hits for the brand-new last seven days report?
 

 The only issue is when the historical cache file overlaps the last seven
 days. Does that answer your question?

Ok. So, this should never come into effect if I just set up a weekly log
report that adds a new week to all the old cache files and reports? I have
no plans on changing the reporting frequency.

How would this come into effect though? When would the historical cache file
(HCF) suck into that seven days? I can see that happening if the machine's
date is set wrong, and the log file entries are subsequently dated wrong. I
can see that happening if someone wanted to do the last ten days (from/to),
but that's explained in the docs.

Is there any time where an innocent nonesuch can cause the HCF to overlap?
I'm just overly paranoid because the statement seemed so strong in the docs.

 I guess you've got two main choices. The one I recommend in the docs is to
 create a cache file from each logfile: then when you want a report, analyse
 all the cache files. The other would be to create a cumulative cache file
 each week based on the last cache file and the new week's logfile.
 Personally I think this second procedure is much more likely to get
 confused, and when it does, all the data is corrupted together. :)

Definitely makes sense. Is there any significant speed decrease when opening
up VIRTDOMAINS x WEEKS x YEARS cache files per week as opposed to
VIRTDOMAINS x YEARS or VIRTDOMAINS? I can see Analog doing 1994 - 2000 on
your machine, which is nice - how long does that take? Do you have any
system load readouts? (Perhaps a MRTG chart showing load and duration?).

I'm being especially paranoid, as you can see. The primary, and quickly
explained reason is:

a) old, free log program had y2k issue,
b) personally i like analog's report/cust. better,
c) boss is annoyed at y2k issue with old, free log program
d) boss doesn't want to pay for new, old free log program g...
e) now is perfect time to strike with analog...
f) ... but everything has to be perfect ;)


  Again, thanks for explaining all this to an Analog newbie. I hope I'm not
  embarrassing the mailing list all that much ;)
 

 Not at all. I wish all the questions here were this intelligent. :)

Whoo hoo! ;)

Kevin Hemenway
-- -
Total Net NH, LLC  EMAIL: [EMAIL PROTECTED]
15 Pleasant St., Suite 11  WEBSITE: http://www.totalnetnh.net/
Concord, NH 03301  PHONE: (603) 225-8422









Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-04 Thread Kevin Hemenway

 I don't see why. Log files compress *very* nicely (typically 95%-98%
 compression ratios at maximum compression), and Analog has no problem with
 compressed log files. Disk space is cheap nowadays, so why would keeping
 six years of log files be crazy? I have Analog running on three years'
 worth, and I'm sure there's people on this list who've got that beat.

Quite true, quite true, they do, but that's simply not an option I'm
looking at in this case. Yes, we have more than enough space in there to
handle that. What I don't want to do is have 120 virtual domains, with 3
years of log files each, being reanalyzed every week.

I don't care how good Analog is - that's just not a server load, or a
process I would like to see happening. Nothing against Analog, of course.

So, scratch the idea of keeping the log files. Do I have any other option?

Kevin Hemenway
-- -
Total Net NH, LLC  EMAIL: [EMAIL PROTECTED]
15 Pleasant St., Suite 11  WEBSITE: http://www.totalnetnh.net/
Concord, NH 03301  PHONE: (603) 225-8422








Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-04 Thread Kevin Hemenway

 Kevin Hemenway wrote:

  So, scratch the idea of keeping the log files. Do I have any other option?

 Cache files. See http://www.analog.cx/docs/cache.html. As long as you won't
 later be changing the data you want to report on from the past, this will work
 wonderfully. They reduce the amount of storage and memory usage needed by Analog
 and can be compressed on disk as well.

And if I did change the format? The data itself wouldn't change - it'd be
straight log files from Apache 1.3.9. However, stuff that may change in the
future:

a) addition of new logs (ie, a referrer log, or error log report).
b) adding or removing a report from view
c) adding a new subdirectory to specifically watch.

I have no intention of messing with inclusions or exclusions. What would
happen if I changed something that was different with the cache files, and
Analog still ran? Corrupted data? Ignored cache files? New data only for the
new cache files, and old data still displayed from the old cache files?
Nothing?

Kevin Hemenway
-- -
Total Net NH, LLC  EMAIL: [EMAIL PROTECTED]
15 Pleasant St., Suite 11  WEBSITE: http://www.totalnetnh.net/
Concord, NH 03301  PHONE: (603) 225-8422








Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-04 Thread Jeremy Wadsack



Kevin Hemenway wrote:

 So, scratch the idea of keeping the log files. Do I have any other option?

Cache files. See http://www.analog.cx/docs/cache.html. As long as you won't
later be changing the data you want to report on from the past, this will work
wonderfully. They reduce the amount of storage and memory usage needed by Analog
and can be compressed on disk as well.

HTH,

Jeremy Wadsack
Wadsack-Allen Digital Group







Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-04 Thread Jeremy Wadsack



Kevin Hemenway wrote:

  Kevin Hemenway wrote:
 
   So, scratch the idea of keeping the log files. Do I have any other option?
 
  Cache files. See http://www.analog.cx/docs/cache.html. As long as you won't
  later be changing the data you want to report on from the past, this will work
  wonderfully. They reduce the amount of storage and memory usage needed by Analog
  and can be compressed on disk as well.

 And if I did change the format? The data itself wouldn't change - it'd be
 straight log files from Apache 1.3.9. However, stuff that may change in the
 future:

 a) addition of new logs (ie, a referrer log, or error log report).

Then referrer reports will only go back as far as the data does. Analog doesn't
process error reports (they're meant for human consumption).


 b) adding or removing a report from view

Does not affect the cache files.


 c) adding a new subdirectory to specifically watch.

Again, does not affect the cache files.



 I have no intention of messing with inclusions or exclusions. What would
 happen if I changed something that was different with the cache files, and
 Analog still ran? Corrupted data? Ignored cache files? New data only for the
 new cache files, and old data still displayed from the old cache files?
 Nothing?

The only things that affect cache files are inclusions and exclusions (including
those implied by time commands [FROM and TO] and *LOWMEM commands and those
created by changes in logformat). If the data in a cache file is changed, Analog
would still run, but some values may be erroneous (e.g. total count for
previously excluded file, host, browser, etc) and some reports may not have data
as far back as others (e.g. adding referrer data).
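A small sketch of why an exclusion applied when a summary is built cannot be undone later. The request lists, the `/private/` prefix and the `build_summary` helper are all hypothetical; this does not model Analog's actual cache format or its include/exclude commands.

```python
from collections import Counter


def build_summary(requests, exclude_prefix=None):
    """Count hits per path, dropping any path under exclude_prefix."""
    summary = Counter()
    for path in requests:
        if exclude_prefix and path.startswith(exclude_prefix):
            continue  # excluded hits are never recorded, so they are lost
        summary[path] += 1
    return summary


# Hypothetical request paths for two weeks.
old_week = ["/index.html", "/private/a.html", "/private/a.html", "/index.html"]
new_week = ["/index.html", "/private/a.html"]

# The old summary was built with an exclusion; the new one without.
old_summary = build_summary(old_week, exclude_prefix="/private/")
new_summary = build_summary(new_week)

combined = old_summary + new_summary

print(combined["/index.html"])      # 3, correct
print(combined["/private/a.html"])  # 1, although the true total is 3
```

The merge still runs without complaint, which matches the point above: nothing fails, but the total for the previously excluded item is too low.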

HTH,

Jeremy Wadsack
Wadsack-Allen Digital Group







Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-03 Thread Kim Scarborough

 a) They can't keep the log files right? That'd be crazy?

I don't see why. Log files compress *very* nicely (typically 95%-98%
compression ratios at maximum compression), and Analog has no problem with
compressed log files. Disk space is cheap nowadays, so why would keeping
six years of log files be crazy? I have Analog running on three years'
worth, and I'm sure there's people on this list who've got that beat. 






Re: [analog-help] Stupidest Ques. Award Goes To...

2000-04-03 Thread Aengus Lawlor

Kevin Hemenway wrote:

 My problem is thus: I see sites (like Analog's) that have usage stats from
 1994 to the year 2000. And my question is: how is that possible? Questions
 running through my head:

 a) They can't keep the log files right? That'd be crazy?

Depends on how busy your site is. If you get 10MB of logs per day, you
can easily store 5 years of uncompressed log files on a $200 hard drive.
With compression, you could probably increase that by a factor of 5-10.
(How much memory you'd need to analyse this, though, I have no idea -
I've run a couple of reports on about 3GB of log files on a system with
250MB of RAM).
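The storage arithmetic behind that estimate, as a quick sketch. The 10MB/day, 5 years and factor of 5-10 come from this message; the 95%-98% compression ratios are the figures quoted elsewhere in this thread; 365 days per year (ignoring leap days) is a simplifying assumption.

```python
MB_PER_DAY = 10       # from the message above
YEARS = 5             # from the message above
DAYS_PER_YEAR = 365   # assumption: leap days ignored


def raw_mb(mb_per_day, years):
    """Uncompressed log volume in MB."""
    return mb_per_day * DAYS_PER_YEAR * years


def compressed_mb(raw, percent_saved):
    """Size in MB after compression that saves percent_saved percent."""
    return raw * (100 - percent_saved) / 100


raw = raw_mb(MB_PER_DAY, YEARS)

print(raw)                     # 18250 MB, a little over 18 GB
print(raw / 5, raw / 10)       # 3650.0 1825.0 (factor of 5-10)
print(compressed_mb(raw, 95))  # 912.5
print(compressed_mb(raw, 98))  # 365.0
```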

 b) Is it done with cache files? Cache files from 1994 to the year 2000?

It could be, though you lose some detail with cache files, of course.
In many cases, this loss of detail doesn't really matter, especially for
data that's 3 years old.

 c) How does Analog, in March of 2000, know about the reports from 1994, to
 generate a new monthly report with info from 1994 to the year 2000?

?? How does Analog know about anything? You tell it where to find the 
information. You can tell Analog to include as many logfiles in a single run 
as you want. You can say 

LOGFILE 1998*.LOG 1999*.LOG 2000*.LOG  

if you want
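To see which files wildcard patterns like those would pick up, here is a sketch using Python's shell-style matching. The file names are invented, and Analog's own wildcard handling may differ in detail from `fnmatch`; this only illustrates the idea of selecting several years of logfiles by name.

```python
from fnmatch import fnmatchcase

# The three patterns from the LOGFILE line above.
PATTERNS = ["1998*.LOG", "1999*.LOG", "2000*.LOG"]


def select_logs(names, patterns=PATTERNS):
    """Return the names that match at least one pattern (case-sensitive)."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Hypothetical directory listing.
names = ["199712.LOG", "199801.LOG", "199912.LOG", "200003.LOG", "200003.TXT"]

print(select_logs(names))  # ['199801.LOG', '199912.LOG', '200003.LOG']
```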

 d) Does it have anything to do with OUTPUT COMPUTER?

No. Read docs/cache.html

 And if it does, does that mean that Analog would have to be run twice for
 each config file? One to generate the COMPUTER file (this COMPUTER file -
 how would it remember all the other reports from 1994 onward? or does it?),
 and the one to analyze the COMPUTER file and report everything?

Analog will create an output file and a cache file at the same time, if you
tell it to. But HTML and COMPUTER are two different OUTPUT types, so you can
only get one of them per run. CACHE isn't an OUTPUT type.

 As you can see, I have no clue. And I'm averse to installing it until I can
 see it working on paper.

You'd have had it installed, and run some sample reports in the time it took 
you to write this note.

 Basically, this is what I want:

 a) Reports generated every week.
 b) A monthly report.
 c) An overall report.

Nothin' to it...

 We've got about 120 virtual domains that would all follow one format (with
 separate reports for each virt.) and then one guy (me) who's annoying and
 likes to tweak everything to death.

The number of different ways you can tweak Analog reports is absolutely 
stunning. Should keep you satisfied for quite some time!

