Hi khmadhu

khmadhu wrote:
> I have installed the ossec server on linux. I have a log file which
> is generated by a proxy server. its arround 400 MB .i want to generate
> the statistics of that log only, like TCP_HIT,TCP_REFRESH etc..
> downloads,top sites,in graph..
>

I routinely generate exactly the kind of reporting you describe (on our Squid logs) using Calamaris:

http://cord.de/tools/squid/calamaris/

It is basically a Perl program that you pipe logs to on STDIN, and it outputs HTML or plain-text reports (total HITS, total MB transferred, top X domains, etc). You get a whole variety of analyses depending on which parameters you run it with.

I am actually throwing gigs of logs at it for quarterly reports and such, so I have written a few shell scripts around it that use the "-o" and "-i" flags to write out a digested stats file (-o) and then import that file (-i) into the scan of the next month's log file. That way I "snowball" my way through the massive log files instead of making calamaris process one huge concatenated file.

You just have to make sure you set all of the thresholds to infinite ("-1") when you do this, so that you don't lose any data before the final run (then you can set the thresholds to just the top 10/50/100/whatever).

eg: "snowballing the daily logs together"

  # PROCESS 1ST DAY
  cat /tempdump/access.log_Jan-01 |calamaris -d -1 -s -t -1 -O -c -v \
    -o ./data/caldata_Jan-01 > ./reports/report_Jan-01 2>> ./calamaris.err

  # THEN IMPORT 1ST AND PROCESS 2ND
  cat /tempdump/access.log_Jan-02 |calamaris -d -1 -s -t -1 -O -c -v \
    -i ./data/caldata_Jan-01 -o ./data/caldata_Jan-01_02 \
    > ./reports/report_Jan-01_02 2>> ./calamaris.err

  # THEN IMPORT 1ST+2ND AND PROCESS 3RD
  cat /tempdump/access.log_Jan-03 |calamaris -d -1 -s -t -1 -O -c -v \
    -i ./data/caldata_Jan-01_02 -o ./data/caldata_Jan-01_03 \
    > ./reports/report_Jan-01_03 2>> ./calamaris.err

  # RINSE AND REPEAT FOR WHOLE MONTH

Eventually you end up with a "caldata_Jan-01_31" stats digest file (with all stats thresholds set to infinite). With that file you can generate any report you want just by running a "-i" import of it into a calamaris run.

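The "rinse and repeat" step above could be wrapped in a loop. What follows is only a minimal sketch, not the actual scripts mentioned earlier: the function name snowball_month and its arguments are hypothetical, and it assumes the log naming (access.log_Jan-01 and so on), the ./data and ./reports directories, and the calamaris flags are exactly as in the hand-written commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of calamaris): snowball one month of daily
# logs into a single digest, importing the previous digest on every run.
# Usage: snowball_month MONTH DAYS [LOGDIR]   e.g. snowball_month Jan 31
snowball_month() {
    month=$1
    days=$2
    logdir=${3:-/tempdump}      # assumed log location, as in the examples
    prev=""                     # digest written by the previous day's run
    day=1
    while [ "$day" -le "$days" ]; do
        dd=$(printf '%02d' "$day")
        if [ "$day" -eq 1 ]; then
            tag="${month}-01"
        else
            tag="${month}-01_${dd}"
        fi
        out="./data/caldata_${tag}"
        if [ -n "$prev" ]; then
            # later days: import the running digest, write a new one
            calamaris -d -1 -s -t -1 -O -c -v -i "$prev" -o "$out" \
                < "$logdir/access.log_${month}-${dd}" \
                > "./reports/report_${tag}" 2>> ./calamaris.err
        else
            # first day: nothing to import yet
            calamaris -d -1 -s -t -1 -O -c -v -o "$out" \
                < "$logdir/access.log_${month}-${dd}" \
                > "./reports/report_${tag}" 2>> ./calamaris.err
        fi
        prev=$out
        day=$((day + 1))
    done
    # print the path of the final digest for use with "-i" later
    echo "$prev"
}
```

Calling "snowball_month Jan 31" would then print ./data/caldata_Jan-01_31 once the last day has been processed. The log is fed in with "<" rather than "cat |", which does the same job with one less process.
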
eg: "top 100 domains and top 10 TLDs for Jan"

  calamaris -d 100 -t 10 -z -i data/caldata_Jan-01_31 > Top100_Jan.txt

You can run this same report for Q1 by importing the 3 monthly stats digest files for Jan, Feb and Mar (separated by ":" delimiters).

eg: "top 100 domains and top 10 TLDs for 1st Quarter"

  calamaris -d 100 -t 10 -z -i \
    data/caldata_Jan-01_31:data/caldata_Feb-01_28:data/caldata_Mar-01_31 \
    > Top100_Q1.txt

Of course, all of this import/output stuff is only necessary if you need to analyse very large quantities of logs. If you can send everything in one single STDIN pipe then it can all be done in one command (although I recommend at least doing one big digest output and importing it into your actual report command; that saves redoing all the log parsing and speeds up subsequent reports).

Hope this helps :)

Cheers,
-Dan
