Andrew Smith wrote:

>>>Yup, this is what is needed.  However, one thing that makes 
>>>me wonder is how to deal with sites that fit into more than one
>>>category.

> If this is not done, then the categories themselves are somewhat useless.
> Say you want to block a, b & c for group1 and a, b & d for
> group2: if a site falls into both c & d but is not listed in both
> lists c & d, then that site will get through either group1 or group2
> where it shouldn't get through either,
> i.e. you can't use the categories to do partial filtering unless sites
> are included in all site lists that they are relevant to.
> You would want the duplicates to stay in each list even on your own
> machine to ensure that you could apply a sub-set of lists to each
> group you define.
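
To make the quoted point concrete, here is a minimal sketch of the
group1/group2 case. Everything in it is hypothetical: the domain
example.test, the dictionaries standing in for the list files, and the
blocked() helper. It only models "a group blocks a site if any of its
chosen lists contains it" and is not how squidguard itself is written.

```python
def blocked(domain, lists, categories):
    """Return True if the domain appears in any of the group's category lists."""
    return any(domain in lists[cat] for cat in categories)

# The two groups from the quoted example.
group1 = ["a", "b", "c"]
group2 = ["a", "b", "d"]

# example.test is relevant to both c and d.
# Deduplicated lists: the site was kept in c only.
deduplicated = {"a": set(), "b": set(), "c": {"example.test"}, "d": set()}
# Duplicated lists: the site stays in every list it is relevant to.
duplicated = {"a": set(), "b": set(), "c": {"example.test"}, "d": {"example.test"}}

# With deduplicated lists the site is caught for group1 but leaks past group2.
leaks_past_group2 = not blocked("example.test", deduplicated, group2)
# With duplicated lists both groups catch it.
caught_by_both = (blocked("example.test", duplicated, group1)
                  and blocked("example.test", duplicated, group2))
```

So removing duplicates is only safe if every group always uses the full
set of lists.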

I presume squidguard looks at the categories in the order they
appear in the config file, so one awkward point is when someone goes
to a site that is blocked in more than one category, say ads and porn.
The user was clearly (to themselves) going to a porn site, but the rule
in ads is what actually blocked them. A redirected web page giving a
reason for the blockage could be misleading if it goes on about ads
being blocked instead of porn. If this is the case, then using the
various categories for user-presented info could be unreliable.
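
A small sketch of what I mean, assuming first-match-in-config-order
really is the behaviour (that is my presumption, not something I have
checked in the squidguard source). The function names, the domains and
the list contents are made up for illustration.

```python
def first_match(domain, ordered_lists):
    """Return the first category, in config order, whose list contains the domain."""
    for category, domains in ordered_lists:
        if domain in domains:
            return category
    return None

def all_matches(domain, ordered_lists):
    """Return every category whose list contains the domain, in config order."""
    return [category for category, domains in ordered_lists if domain in domains]

# Hypothetical config order: ads is listed before porn.
config_order = [
    ("ads", {"example.test"}),
    ("porn", {"example.test", "other.test"}),
]

# The reason shown to the user would be whatever matched first.
reason = first_match("example.test", config_order)
# The full picture, which a block page never sees under first-match.
every_reason = all_matches("example.test", config_order)
```

Under that assumption the block page would talk about ads even though
the site is also in porn, and swapping the order in the config would
change the reason without changing what is blocked.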

I lean towards pushing all rules into an SQL db and using SQL statements
to sort/filter/update/extract a (perhaps) single set of regex/urls/domains
plain text files with no duplicated rules. Along these lines, would anyone
have any comments about this MySQL schema?

CREATE TABLE squidguard (
  id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  rule VARCHAR(64) NOT NULL,
  type VARCHAR(8) NOT NULL,
  cat VARCHAR(16) NOT NULL,
  date TIMESTAMP NOT NULL,
  hits INT(11),
  UNIQUE (rule)
);

type = domains | regex | urls

cat =  ads | aggressive | audio-video | drugs | gambling |
        hacking | notbusiness | porn | violence | warez | webmail

hits = optimistic future logging as it would be useful to have some
        idea what rules get hit the most and which ones are never hit
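
As a rough sketch of how the table might be used, here is the same
layout driven from Python. To keep it self-contained it uses an
in-memory SQLite database as a stand-in for MySQL, so the column types,
the DEFAULT clauses and INSERT OR IGNORE (MySQL spells it INSERT IGNORE)
are adaptations, not part of the schema above. The helper names and the
example.test rules are hypothetical.

```python
import sqlite3

# SQLite adaptation of the schema above (assumption: defaults added so
# that inserts can omit date and hits).
SCHEMA = """
CREATE TABLE squidguard (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  rule VARCHAR(64) NOT NULL,
  type VARCHAR(8) NOT NULL,
  cat VARCHAR(16) NOT NULL,
  date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  hits INTEGER DEFAULT 0,
  UNIQUE (rule)
)
"""

def load_rules(conn, rows):
    """Insert (rule, type, cat) rows; UNIQUE (rule) silently drops repeats."""
    conn.executemany(
        "INSERT OR IGNORE INTO squidguard (rule, type, cat) VALUES (?, ?, ?)",
        rows,
    )

def export_rules(conn, rule_type):
    """Return the sorted rules of one type, ready to write to a plain text file."""
    cur = conn.execute(
        "SELECT rule FROM squidguard WHERE type = ? ORDER BY rule", (rule_type,)
    )
    return [row[0] for row in cur]

def never_hit(conn):
    """Return the rules whose hit counter is still zero."""
    cur = conn.execute("SELECT rule FROM squidguard WHERE hits = 0 ORDER BY rule")
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
load_rules(conn, [
    ("ads.example.test", "domains", "ads"),
    ("example.test", "domains", "porn"),
    ("example.test", "domains", "ads"),      # same rule again: ignored
    ("example.test/banners", "urls", "ads"),
])
# Pretend one request matched example.test.
conn.execute("UPDATE squidguard SET hits = hits + 1 WHERE rule = ?", ("example.test",))

domains = export_rules(conn, "domains")
unused = never_hit(conn)
```

Note that with UNIQUE (rule) each rule can carry only one cat, so
whichever category is loaded first wins. That gives a single
duplicate-free set of files, but it cannot produce per-category lists
that each contain a multi-category site.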

--markc
