Great script and advice indeed.
The other problem now is that running this 'cleaning' script takes an
estimated 40 minutes on my proxy machine (60 MB of logs, the result of
2 very active days; normally that is a week's worth).
Should I perhaps rotate the Squid logs, then, and run this script daily
from crontab on the freshly rotated log only? I think this would be a
solution. Any other ideas?
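
A minimal sketch of that rotate-then-report idea, assuming Squid's
logfile_rotate setting is such that "squid -k rotate" leaves the previous
log as access.log.0. The log directory, the report path, the 10-second
pause and the function names are placeholders rather than anything
confirmed in this thread; the field trimming is the awk filter quoted
further down.

```shell
#!/bin/sh
# Sketch only: every path and name below is an assumption; adjust to your install.
LOGDIR=${LOGDIR:-/usr/local/squid/var/logs}
CALAMARIS=${CALAMARIS:-/usr/local/squid/bin/calamaris}
REPORT=${REPORT:-/var/www/html/calamaris-daily.html}

# Trim one rotated log ($1) to its first 10 fields and feed it to
# Calamaris, writing the HTML report to $2.
report_rotated_log() {
    awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' "$1" | "$CALAMARIS" -F html > "$2"
}

# Ask the running Squid to rotate its logs, wait a moment because the
# rotate request returns before the files are renamed, then report on
# the freshly rotated file only.
daily_report() {
    squid -k rotate &&
    sleep 10 &&
    report_rotated_log "$LOGDIR/access.log.0" "$REPORT"
}

# Hypothetical crontab entry, for a script that ends by calling
# daily_report and is saved as /usr/local/bin/calamaris-daily.sh:
#   5 0 * * * /usr/local/bin/calamaris-daily.sh
```

Because each run only sees one day's log, the run time should scale with
a single day of traffic instead of the whole accumulated file.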

Thanks,
Endre.




                                                                                       
                                           
From: Kirk Schneider <[EMAIL PROTECTED]>
To: Endre Szekely-Bencedi <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: [squid-users] Calamaris
Date: 03/01/2004 07:11 PM




Endre,

I contacted the Calamaris author about this before, and he suggested
filtering out the extra fields that SmartFilter adds at the end of
each line.

I now run this on all my logs before piping them to Calamaris:

awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' access.log |calamaris
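
To illustrate what the filter does, here is a small sketch that runs it on
one entry from the sample log in this thread. The wrapper name
keep_native_fields is hypothetical; the awk program is the same one as
above.

```shell
# keep_native_fields: hypothetical wrapper around the awk filter above.
# Reads the files given as arguments, or stdin when there are none.
keep_native_fields() {
    awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' "$@"
}

# One entry from the sample log, including SmartFilter's trailing fields
# (second content type, ALLOW, and the category "Portal Sites").
sample='1077780471.441     93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites'

# Prints the entry cut back to its first 10 fields.
printf '%s\n' "$sample" | keep_native_fields
```

Note that awk rejoins the fields with single spaces, so the column
padding after the timestamp is lost; that should not matter to a parser
that splits on whitespace.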


--
Kirk Schneider                          972-952-4645 (work)
Raytheon Corporate IT Security          214-912-8679 (cell)
[EMAIL PROTECTED]                 888-431-7621 (pager)

"If you think the problem is bad now just wait until we've solved it."



-------- Original Message --------
Subject: [squid-users] Calamaris
Date: Mon, 1 Mar 2004 17:43:52 +0100
From: Endre Szekely-Bencedi <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]

Hello List,

I have a problem with Calamaris (v2.58).

I am using Squid 2.5.STABLE3, compiled from source, with the SmartFilter
plugin.
As far as I know, I have to use the squid-extended input type for this,
but it gives some errors:

[EMAIL PROTECTED] logs]# date;cat test.log | /usr/local/squid/bin/calamaris -f squid-extended -F html > /var/www/html/calamaris2.html;date
Mon Mar  1 17:44:08 CET 2004
Malformed UTF-8 character (unexpected non-continuation byte 0x31,
immediately after start byte 0xf3) in split at (eval 1) line 20, <> line
369578.
Malformed UTF-8 character (unexpected non-continuation byte 0x31,
immediately after start byte 0xf3) in split at (eval 1) line 20, <> line
369578.
Split loop at (eval 1) line 20, <> line 369578.
Mon Mar  1 17:48:05 CET 2004
[EMAIL PROTECTED] logs]#
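
The messages above name input line 369578 and a start byte of 0xf3, so one
way to see what the parser is choking on is to look at that line and at
any other lines carrying bytes outside plain ASCII. This is only a sketch:
the helper names are hypothetical, and it assumes the "<> line" number
corresponds to the line number in test.log.

```shell
# show_line: print line number $2 of file $1 (hypothetical helper).
show_line() {
    sed -n "${2}p" "$1"
}

# nonascii_lines: print "number:text" for every line of file $1 that
# contains a byte outside printable ASCII and whitespace.
# LC_ALL=C makes grep compare bytes; -a keeps it from treating the
# file as binary.
nonascii_lines() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Example, using the file name and line number from the session above:
#   show_line test.log 369578 | od -c | head
#   nonascii_lines test.log | head
```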

The generated report shows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html;
charset=iso-8859-1"></HEAD>
<BODY></BODY></HTML>

Which is an empty page.

A sample from the logfile:

1077780471.441     93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.466     64 3.227.65.74 TCP_MISS/200 1722 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.479     72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.508     59 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.699     73 3.227.65.74 TCP_MISS/200 1585 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.713     83 3.227.65.74 TCP_MISS/200 1607 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.726     86 3.227.65.74 TCP_MISS/200 1589 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.885    256 3.227.65.74 TCP_MISS/200 726 GET http://as.fotexnet.hu/adserver.ads/153/0///937480 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.212    229 3.227.65.74 TCP_MISS/200 23713 GET http://index.hu/ad/lipton/banner1_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.298     72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.388    279 3.227.65.74 TCP_MISS/200 17697 GET http://index.hu/ad/microsoft_wss.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.439    106 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.458     47 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.480    368 3.227.65.74 TCP_MISS/200 4292 GET http://as.fotexnet.hu/adserver.ads/196/0///27236 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.643    162 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.646    144 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.673    487 3.227.65.74 TCP_MISS/200 10319 GET http://as.fotexnet.hu/adserver.ads/200/0///378158 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.799    280 3.227.65.74 TCP_MISS/200 26216 GET http://index.hu/ad/teluzoallo_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.819    122 3.227.65.74 TCP_MISS/200 216 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.824    124 3.227.65.74 TCP_MISS/200 355 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.842    136 3.227.65.74 TCP_MISS/200 1603 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.846     47 3.227.65.74 TCP_MISS/200 353 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites

Am I doing something wrong?

Thanks,
Endre.



