Here's something similar to what you're already doing, except that it compares
the log against a file of "badwords" to look for in the URLs and then emails
you the results.
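
#!/bin/sh
# filter.sh
#
# Pull every access.log line matching a pattern in "badwords" (case-insensitive),
# format the hits with wordfilter.gawk, and mail the resulting report.
cd /path/to/filterscript || exit 1
grep -i -f /path/to/filterscript/badwords /var/log/squid/access.log > hits.out
/path/to/filterscript/wordfilter.gawk hits.out
/bin/mail -s "URL Filter Report" [EMAIL PROTECTED] < /path/to/filterscript/word-report
rm hits.out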

#!/bin/gawk -f
# wordfilter.gawk
BEGIN {
    report = "/path/to/filterscript/word-report"
    # In awk, ">" truncates the file only when it is first opened;
    # later prints to the same name append for the rest of the run.
    print "URL Filter Report:" > report
    print "--------------------------------------" > report
    sp = " -> "
}
{
    # $1 = request timestamp, $8 = username, $7 = URL
    print strftime("%m-%d-%Y %H:%M:%S", $1), sp, $8 > report
    print $7 > report
    print "" > report
}

You may need to adjust the columns printed in the awk script. They're set for
usernames instead of IPs ($8 is the username field; in Squid's native log
format the client address should be $3). Also, you'll need to make a
"/path/to/filterscript/badwords" file with the words/regexes you want to search
for, one per line. Someone with better regex skills could probably eliminate
a lot of "false" hits with more specific patterns in the "badwords" file. I'm
using this in addition to squidGuard and blacklists to catch URLs that were
missed, so the output isn't nearly as large as what you're getting.
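
As a rough illustration of how a more specific pattern cuts down false hits,
here is a minimal sketch. The sample log lines, host names, usernames, and both
pattern files are made up for the example; only the "grep -i -f" call mirrors
filter.sh. A bare word matches anywhere in the line, while a pattern that
requires a non-letter on each side skips words that merely contain it.

```shell
#!/bin/sh
# Sketch only: log lines, hosts, and patterns below are hypothetical examples.
tmp=$(mktemp -d)

# Two sample lines in Squid's native log layout (field 7 is the URL).
cat > "$tmp/access.log" <<'EOF'
1213209120.123 45 10.0.0.5 TCP_MISS/200 1234 GET http://www.essex.example/news jdoe DIRECT/192.0.2.1 text/html
1213209121.456 50 10.0.0.6 TCP_MISS/200 2345 GET http://sex.example/index.html asmith DIRECT/192.0.2.2 text/html
EOF

# Loose pattern: the bare word also matches inside "essex".
printf '%s\n' 'sex' > "$tmp/badwords.loose"

# Stricter pattern: the word must have a non-letter on both sides.
printf '%s\n' '[^a-z]sex[^a-z]' > "$tmp/badwords.strict"

# Count matching lines with the same grep options filter.sh uses.
loose=$(grep -i -c -f "$tmp/badwords.loose" "$tmp/access.log")
strict=$(grep -i -c -f "$tmp/badwords.strict" "$tmp/access.log")

echo "loose pattern hits:  $loose"
echo "strict pattern hits: $strict"

rm -r "$tmp"
```

The stricter form is a plain basic regex, so it can go straight into the
"badwords" file without changing the grep options in filter.sh.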
Rob
-------------------------------------
Rob Asher
Network Systems Technician
Paragould School District
(870)236-7744 Ext. 169
>>> "Steven Engebretson" <[EMAIL PROTECTED]> 6/11/2008 1:32 PM >>>
I am looking for a tool that will scan the access.log file for pornographic
sites, and will report the specifics back. We do not block access to any
Internet sites, but need to monitor for objectionable content.
What I am doing now is just grepping for some keywords and dumping the output
into a file. I am manually going through about 60,000 lines of log file
following my grep. 99% of these are false hits. Any help would be appreciated.
Thank you all.
-Steven E.
----------
This message has been scanned for viruses and
dangerous content by the Paragould School District
MailScanner, and is believed to be clean.