> Andrew Smith wrote:
>
>>>> Yup, this is what is needed. However, one thing that makes
>>>> me wonder is how to deal with sites that fit into more than one
>>>> category.
>
>> If this is not done, then the categories themselves are somewhat
>> useless to use - say you want to block a, b, & c for group1 and a, b &
>> d for group2 - if a site falls into both c & d but is not listed in
>> both lists c & d, then that site will get through either group1 or
>> group2, where it shouldn't get through either.
>> i.e. you can't use the categories to do partial filtering unless sites
>> are included in all site lists that they are relevant to.
>> You would want the duplicates to stay in each list even on your own
>> machine, to ensure that you could apply a sub-set of lists to each
>> group you define.
>
> I presume squidGuard looks at the categories in the order they
> appear in the config file, so one awkward point is when someone goes to
> a multi-blocked site that appears in, say, ads and porn, yet that user
> was clearly (to themself) going to a porn site but the rule in ads is
> what actually blocked them. A redirected web page giving a
> reason for the blockage could be misleading if it goes on about ads
> being blocked instead of porn. If this is the case, then using various
> categories for user-presented info could be unreliable.
>
> I lean towards pushing all rules into an SQL db and using SQL
> statements to sort/filter/update/extract a (perhaps) single set of
> regex/urls/domains plain text files with no duplicated rules. Along
> these lines, would anyone have any comments about this MySQL schema?
>
> CREATE TABLE squidguard (
>     id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
>     rule VARCHAR(64) NOT NULL,
>     type VARCHAR(8) NOT NULL,
>     cat VARCHAR(16) NOT NULL,
>     date TIMESTAMP NOT NULL,
>     hits INT(11),
>     UNIQUE (rule)
> );
>
> type = domains | regex | urls
>
> cat = ads | aggressive | audio-video | drugs | gambling |
>       hacking | notbusiness | porn | violence | warez | webmail
>
> hits = optimistic future logging, as it would be useful to have some
> idea which rules get hit the most and which ones are never hit
>
> --markc
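To make the schema above concrete, here is a small sketch of the proposed workflow (rules go into a table, SQL extracts the per-category plain-text files). It uses Python's sqlite3 purely for illustration, so MySQL-specific bits like INT(11) and AUTO_INCREMENT are simplified; the sample rules and domain names are made up, not from any real blocklist:

```python
import sqlite3

# In-memory SQLite stand-in for the proposed MySQL table.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE squidguard (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        rule TEXT NOT NULL UNIQUE,
        type TEXT NOT NULL,   -- domains | regex | urls
        cat  TEXT NOT NULL,   -- ads | porn | webmail | ...
        hits INTEGER DEFAULT 0
    )
""")

# Hypothetical sample rules.
rows = [
    ("ads.example.com",  "domains", "ads"),
    ("casino.example",   "domains", "gambling"),
    ("mail.example.org", "domains", "webmail"),
]
db.executemany(
    "INSERT INTO squidguard (rule, type, cat) VALUES (?, ?, ?)", rows)

# Extract one plain-text domains list per category, as squidGuard expects.
for (cat,) in db.execute("SELECT DISTINCT cat FROM squidguard ORDER BY cat"):
    rules = [r for (r,) in db.execute(
        "SELECT rule FROM squidguard"
        " WHERE cat = ? AND type = 'domains' ORDER BY rule", (cat,))]
    print(cat, rules)

# Note the side effect of UNIQUE (rule): the same rule cannot be filed
# under a second category, so listing mail.example.org as notbusiness
# as well fails.
try:
    db.execute("INSERT INTO squidguard (rule, type, cat) VALUES (?, ?, ?)",
               ("mail.example.org", "domains", "notbusiness"))
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print("duplicate category entry allowed:", duplicate_allowed)
```

As the last few lines show, UNIQUE (rule) enforces the "no duplicated rules" goal at the database level, which is exactly the property the rest of the thread argues about.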
But this means that someone cannot allow some categories through while filtering the rest, since, as I stated above, if a page falls into two categories but is only listed in one category that is allowed through, then filtering will not do what it is supposed to do. Really, if you want to filter EVERYTHING (which I do want to do also, but that is not relevant to the problem), then just delete the duplicates out of the lists if you don't want duplicates in the lists.

Maybe a bit more explanation ...

Let a = hacking, b = porn, c = notbusiness, d = webmail
Let group1 be admins
Let group2 be employees

If we don't want employees looking at hacking, porn or using webmail (they can look at notbusiness if they really want to :-)

And we don't want admins looking at hacking, porn and notbusiness (they may need webmail, but we don't want them wasting time on notbusiness - their pay rate is too high for that :-)

And a site falls into webmail AND notbusiness

Then:

If it is listed only as webmail, then admins will be able to access it even though it is also notbusiness.
If it is listed only as notbusiness, then employees will be able to access it even though it is also webmail.

So ... duplicates are necessary.

-- 
-Cheers
-Andrew

MS ... if only he hadn't been hang gliding!
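Andrew's example can be checked mechanically. A minimal sketch in Python (the group names and category names are taken from the mail; the site name is hypothetical, and the "listed under any active category" check is the usual way per-category blocklists compose):

```python
def blocked(site, active_cats, listings):
    """Block a site if it is listed under any category this group filters."""
    return any(site in listings.get(cat, set()) for cat in active_cats)

# Hypothetical site that is really both webmail AND notbusiness.
site = "freemail.example"

group_filters = {
    "admins":    {"hacking", "porn", "notbusiness"},  # webmail allowed
    "employees": {"hacking", "porn", "webmail"},      # notbusiness allowed
}

# Case 1: duplicates removed -- the site is listed only under webmail.
listings = {"webmail": {site}}
leaks_to_admins = not blocked(site, group_filters["admins"], listings)
print("de-duplicated lists leak to admins:", leaks_to_admins)

# Case 2: the duplicate is kept in both lists -- both groups block it.
listings = {"webmail": {site}, "notbusiness": {site}}
print("admins blocked:", blocked(site, group_filters["admins"], listings))
print("employees blocked:", blocked(site, group_filters["employees"], listings))
```

Case 1 reproduces the leak Andrew describes (and listing the site only under notbusiness leaks it to employees instead); only case 2, with the duplicate entry kept, blocks it for both groups.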
