Hi khmadhu

khmadhu wrote:
> I have installed the ossec server on linux. I have  a log file which
> is generated by a proxy server. its arround 400 MB .i want to generate
> the statistics of that log only, like TCP_HIT,TCP_REFRESH etc..
> downloads,top sites,in graph..
> 

I routinely generate exactly the kind of reporting you describe (on our 
Squid logs) using Calamaris - http://cord.de/tools/squid/calamaris/

It is basically a Perl program that reads logs on STDIN and writes 
HTML or plain-text reports (total hits, total MB transferred, top X 
domains, etc.). Which analyses you get depends on the parameters you 
run it with.

I am actually throwing gigabytes of logs at it for quarterly reports and 
such, so I have written a few shell scripts around it that use the "-o" 
and "-i" flags: "-o" writes a digested stats file, and "-i" imports that 
file into the scan of the next month's log file. That way I "snowball" 
my way through the massive log files instead of making calamaris 
process one huge concatenated file. Just make sure you set all of the 
thresholds to unlimited ("-1") when you do this so that you don't lose 
any data before the final run (then you can set the threshold to just 
the top 10/50/100/whatever).

eg: "snowballing the daily logs together"

# PROCESS 1ST DAY
cat /tempdump/access.log_Jan-01 |calamaris -d -1 -s -t -1 -O -c -v \
-o ./data/caldata_Jan-01 > ./reports/report_Jan-01 2>> ./calamaris.err
# THEN IMPORT 1ST AND PROCESS 2ND
cat /tempdump/access.log_Jan-02 |calamaris -d -1 -s -t -1 -O -c -v \
-i ./data/caldata_Jan-01 -o ./data/caldata_Jan-01_02 \
 > ./reports/report_Jan-01_02 2>> ./calamaris.err
# THEN IMPORT 1ST+2ND AND PROCESS 3RD
cat /tempdump/access.log_Jan-03 |calamaris -d -1 -s -t -1 -O -c -v \
-i ./data/caldata_Jan-01_02 -o ./data/caldata_Jan-01_03 \
 > ./reports/report_Jan-01_03 2>> ./calamaris.err
# RINSE AND REPEAT FOR WHOLE MONTH
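
The "rinse and repeat" step can be wrapped in a small loop. What follows is only a minimal sketch: the snowball function name and the LOGDIR, DATADIR, REPORTDIR and ERRLOG variables are hypothetical, and it assumes the log and digest file naming used in the commands above. It feeds each log to calamaris with a redirect instead of cat, which is equivalent.

```shell
#!/bin/sh
# Sketch only: function and variable names are illustrative assumptions,
# with defaults matching the paths used in the hand-written commands.
LOGDIR=${LOGDIR:-/tempdump}
DATADIR=${DATADIR:-./data}
REPORTDIR=${REPORTDIR:-./reports}
ERRLOG=${ERRLOG:-./calamaris.err}

# snowball MONTH DAY...    e.g. snowball Jan 01 02 03
# Processes one daily log per call to calamaris, importing the digest
# written by the previous day so the stats accumulate.
snowball() {
    month=$1
    shift
    first=$1      # first day, used in every cumulative file name
    prev=""       # digest from the previous iteration (none on day one)
    for day in "$@"; do
        if [ -z "$prev" ]; then
            # First day: nothing to import yet.
            tag="${month}-${first}"
            calamaris -d -1 -s -t -1 -O -c -v \
                -o "$DATADIR/caldata_$tag" \
                < "$LOGDIR/access.log_${month}-${day}" \
                > "$REPORTDIR/report_$tag" 2>> "$ERRLOG" || return 1
        else
            # Later days: import the running digest, write a new one.
            tag="${month}-${first}_${day}"
            calamaris -d -1 -s -t -1 -O -c -v \
                -i "$prev" -o "$DATADIR/caldata_$tag" \
                < "$LOGDIR/access.log_${month}-${day}" \
                > "$REPORTDIR/report_$tag" 2>> "$ERRLOG" || return 1
        fi
        prev="$DATADIR/caldata_$tag"
    done
}
```

Calling "snowball Jan 01 02 03" issues the same three calamaris runs as the commands above and leaves the cumulative digest in ./data/caldata_Jan-01_03. The function stops at the first failing run (for example a missing log file) so a gap does not silently end up in the digest.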

Eventually you end up with a "caldata_Jan-01_31" stats digest file (with 
all stats thresholds set to unlimited). With that file you can generate 
any report you want just by running a "-i" import of it into a calamaris run.

eg: "top 100 domains and top 10 TLDs for Jan"

calamaris -d 100 -t 10 -z -i data/caldata_Jan-01_31 > Top100_Jan.txt


You can run this same report for Q1 by importing the 3 monthly stats 
digest files for Jan, Feb and Mar (separated by ":" delimiters).

eg: "top 100 domains and top 10 TLDs for 1st Quarter"

calamaris -d 100 -t 10 -z -i \
data/caldata_Jan-01_31:data/caldata_Feb-01_28:data/caldata_Mar-01_31 \
 > Top100_Q1.txt


Of course, all of this import/output stuff is only necessary if you need 
to analyse very large quantities of logs. If you can send everything in 
one single STDIN pipe then it can all be done in one command (although I 
recommend at least writing one big digest with "-o" and importing it 
into your actual report command with "-i"; that speeds up subsequent 
reports by saving you from redoing all the log parsing).
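
For a single log file like your 400 MB one, the "parse once, report many times" approach could look roughly like this. It is a sketch under assumptions: the two helper names are hypothetical, and it uses only the flags already shown in this message.

```shell
#!/bin/sh
# Sketch only: build_digest and top_report are illustrative helper names.

# build_digest LOGFILE DIGEST
# Parse the log once with unlimited (-1) thresholds and save the digest.
# The report printed during this pass is discarded.
build_digest() {
    calamaris -d -1 -s -t -1 -o "$2" < "$1" > /dev/null
}

# top_report DIGEST N_DOMAINS N_TLDS
# Build a report from the digest alone; -z tells calamaris not to
# expect any log input on STDIN.
top_report() {
    calamaris -d "$2" -t "$3" -z -i "$1"
}

# Example calls (paths are placeholders):
#   build_digest /path/to/access.log ./data/caldata_all
#   top_report ./data/caldata_all 100 10 > Top100.txt
```

Only build_digest touches the raw log, so changing the thresholds later means rerunning just the cheap top_report step.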

Hope this helps :)

Cheers,
-Dan
