Re: [squid-users] Upper limit on the number of regular expressions in url_regex?

2017-08-09 Thread Marcus Kool



On 09/08/17 05:15, Ralf Hildebrandt wrote:
> * Marcus Kool :
> > I have only seen regex failing with such short RE on AIX.
> > what is your OS, distro, CPU and lib version ?
>
> Ubuntu Linux LTS 16.04 (xenial)
> x86_64 (amd64)
>
> I guess you mean libc:
> ii  libc6:amd64  2.23-0ubuntu9


I see no issues with the optimised RE so my first guess is a libc bug.

The RE optimisation in Squid is inspired by the RE optimisation in ufdbGuard.
ufdbGuard optimises the RE a bit differently, and the result looks like this:
zizicamarda.com/7fg3g|zizzhaida.com/3m6ij|zizzhaida.com/98g4ubq|...
I have tested this optimised RE on Ubuntu 16.04 and it works so maybe it is not 
a libc bug but a Squid bug.
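
To make the difference between the two joined forms concrete, here is a minimal Python sketch. It is not Squid's or ufdbGuard's code: `join_grouped` and `join_plain` are hypothetical helper names, and Python's `re` module stands in for the regex functions in libc, so it cannot reproduce a libc bug. It only illustrates that both forms should accept the same URLs.

```python
import re

# Sample entries taken from the log output in this thread.
urls = [
    "http://zizicamarda.com/7fg3g",
    "http://zizzhaida.com/3m6ij",
    "http://zizzhaida.com/98g4ubq",
]

def join_grouped(patterns):
    """Form seen in the Squid log: (re1)|(re2)|..."""
    return "|".join("(" + p + ")" for p in patterns)

def join_plain(patterns):
    """Form quoted for ufdbGuard: re1|re2|... with the scheme dropped."""
    return "|".join(p.split("://", 1)[-1] for p in patterns)

# Assumption: Python's re engine is used here purely for illustration.
grouped = re.compile(join_grouped(urls))
plain = re.compile(join_plain(urls))

for url in urls + ["http://example.org/"]:
    print(url, bool(grouped.search(url)), bool(plain.search(url)))
```

Note that the unescaped "." in each entry still matches any character, exactly as it would in the original list.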


> > BTW: why use regular expressions for a list of 1+ _fixed_ URLs ?
>
> What is the alternative?


ufdbGuard is a URL filter that converts a file with 1 URLs to a database 
file that is optimised for fast lookups.
So all you need to do is configure a URL rewriter and you can filter those 
URLs, using fixed URLs not REs.
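
As a rough illustration of why fixed URLs do not need REs, the sketch below loads a URL list into a Python set and checks membership with a hash lookup. This is a toy under stated assumptions, not ufdbGuard's database format or lookup algorithm: `load_blocklist` and `is_blocked` are hypothetical names, and the match here is exact, whereas url_regex matches a pattern anywhere in the URL.

```python
import io

def load_blocklist(lines):
    """Build a set of fixed URLs, skipping blank lines and '#' comments."""
    blocked = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            blocked.add(entry)
    return blocked

def is_blocked(url, blocked):
    """Exact-match lookup; the average cost does not grow with list size."""
    return url in blocked

# Stand-in for a list file; the entries are samples from this thread.
sample = io.StringIO(
    "# sample entries\n"
    "http://zizicamarda.com/7fg3g\n"
    "http://zizzhaida.com/3m6ij\n"
)
blocked = load_blocklist(sample)
print(is_blocked("http://zizzhaida.com/3m6ij", blocked))
print(is_blocked("http://example.org/", blocked))
```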

Marcus

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Upper limit on the number of regular expressions in url_regex?

2017-08-09 Thread Ralf Hildebrandt
* Marcus Kool :
> I have only seen regex failing with such short RE on AIX.
> what is your OS, distro, CPU and lib version ?

Ubuntu Linux LTS 16.04 (xenial)
x86_64 (amd64)

I guess you mean libc:
ii  libc6:amd64  2.23-0ubuntu9

> BTW: why use regular expressions for a list of 1+ _fixed_ URLs ?

What is the alternative?
 
-- 
Ralf Hildebrandt                   Charite Universitätsmedizin Berlin
ralf.hildebra...@charite.de        Campus Benjamin Franklin
https://www.charite.de             Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk fon: +49-30-450.570.155


Re: [squid-users] Upper limit on the number of regular expressions in url_regex?

2017-08-08 Thread Ralf Hildebrandt
* Ralf Hildebrandt :

> But why is it failing?

I reordered the file

sort -r /etc/squid5/generated-rw_urlbl.acl > /etc/squid5/generated-rw_urlbl.acl.new
mv /etc/squid5/generated-rw_urlbl.acl.new /etc/squid5/generated-rw_urlbl.acl

and reconfigured squid:

2017/08/08 16:27:50.463| 28,2| RegexData.cc(212) compileOptimisedREs: 10775 REs are optimised into one RE.
2017/08/08 16:27:50.463| 28,2| RegexData.cc(214) compileOptimisedREs: /etc/squid5/squid.conf line 1710: acl rw_urlbl url_regex "/etc/squid5/generated-rw_urlbl.acl"
2017/08/08 16:27:50.463| 28,2| RegexData.cc(216) compileOptimisedREs: WARNING: there are more than 100 regular expressions. Consider using less REs or use rules without expressions like 'dstdomain'.

and it's working...



[squid-users] Upper limit on the number of regular expressions in url_regex?

2017-08-08 Thread Ralf Hildebrandt
I'm using this in squid-5.0:

acl markRw_urlbl annotate_transaction accessRule=rw_urlbl
acl rw_urlbl url_regex "/etc/squid5/generated-rw_urlbl.acl"
http_access deny rw_urlbl markRw_urlbl
deny_info   http://proxy.charite.de/rw_urlbl/ markRw_urlbl
# https://ransomwaretracker.abuse.ch/blocklist/ 30.3.16 RHI

And yes, it's quite big:

# wc -l /etc/squid5/generated-rw_urlbl.acl
10783 /etc/squid5/generated-rw_urlbl.acl

During reconfigure I noticed:

2017/08/08 15:56:45.413| WARNING: optimisation of regular expressions failed; using fallback method without optimisation

I then increased debug_options (to 28,9) and found that squid
repeatedly groups the regular expressions until a buffer is "full".
The last such log entry is:

2017/08/08 15:56:45.413| 28,2| RegexData.cc(188) compileOptimisedREs: adding RE 'http://zzzort10xtest123.com/nin5k3bwo'
2017/08/08 15:56:45.413| 28,2| RegexData.cc(194) compileOptimisedREs: buffer full, generating new optimised RE...
2017/08/08 15:56:45.413| 28,2| RegexData.cc(125) compileRE: compiled '(http://zizicamarda.com/7fg3g)|(http://zizzhaida.com/3m6ij)|(http://zizzhaida.com/98g4ubq)|(http://zizzhaida.com/a0s9b)|(http://zjscs.org/oaxqpo4w7)|(http://zlotysalmo.net/0zx0ken3)|(http://zlotysalmo.net/3v8va8ov)|(http://zlotysalmo.net/75vepy6f)|(http://zlotysalmo.net/9v50aob)|(http://znany-lekarz.pl/wd7zj)|(http://zoekeith.com/qehggefyb)|(http://zona-sezona.com.ua/hj1lsp)|(http://zonabest.atspace.com/353wxy)|(http://zonnit.com/qargy9n)|(http://zoologiczny.cba.pl/okp987g7v)|(http://zoomwalls.com/k8j3tpoe)|(http://zoomwalls.com/zghpzv2f)|(http://zoonhers.net/3oojm4)|(http://zoonhers.net/4susie)|(http://zoonhers.net/5ngvr)|(http://zophotos.com/098tb)|(http://zorgboerderijtzicht.nl/lm3mhz)|(http://zpwang.net/9igbmnn)|(http://zsgxbgj.com/1324w)|(http://zsnbystre.republika.pl/988g765f)|(http://zsp17.y0.pl/jkYTFhb7)|(http://zsz_szyn.republika.pl/G7vuYhjb)|(http://zuerich-gewerbe.ch/mbv58gbv)|(http://zui9reica.web.fc2.com/87hcrn33g)|(http://zurrmax.de/hwajuip)|(http://zwei.audio/87h78rf33g)|(http://zwljfc.com/8765r)|(http://zyasf.com/cir9dl)|(http://zytrade.cn/1324w)|(http://zytrade.cn/aust7a6ik)|(http://zzzort10xtest123.com/nin5k3bwo)' with flags 9

http://zzzort10xtest123.com/nin5k3bwo being the last line in the file
/etc/squid5/generated-rw_urlbl.acl 
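
The grouping behaviour visible in the log can be sketched roughly as follows. This is a guess at the general technique, not a transcription of RegexData.cc: the function name `group_patterns` and the limit of 64 characters are hypothetical, chosen only for illustration, and the real buffer size used by Squid is not stated in this thread.

```python
def group_patterns(patterns, buffer_limit=64):
    """Join patterns as (p1)|(p2)|... and start a new group whenever the
    joined string would exceed buffer_limit characters."""
    groups = []
    current = ""
    for pattern in patterns:
        piece = "(" + pattern + ")"
        candidate = piece if not current else current + "|" + piece
        if current and len(candidate) > buffer_limit:
            # "buffer full": flush the current group and start a new one
            groups.append(current)
            current = piece
        else:
            current = candidate
    if current:
        # flush whatever is left after the last pattern
        groups.append(current)
    return groups

# The last three entries of the list, as shown in the log above.
tail = [
    "http://zytrade.cn/1324w",
    "http://zytrade.cn/aust7a6ik",
    "http://zzzort10xtest123.com/nin5k3bwo",
]
for group in group_patterns(tail):
    print(len(group), group)
```

Each resulting group would then be compiled as one RE, which is the step that appears to fail for the final group in the log.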

The last compilation seems to fail, and the next line in the log is:

2017/08/08 15:56:45.413| WARNING: optimisation of regular expressions failed; using fallback method without optimisation

whereupon each line becomes its own RE:

2017/08/08 15:56:45.430| 28,2| RegexData.cc(125) compileRE: compiled 'http://5ik.rcomhost.com/7fg3g' with flags 9
2017/08/08 15:56:45.431| 28,2| RegexData.cc(125) compileRE: compiled 'http://01ad681.netsolhost.com/7j0jlq3' with flags 9
2017/08/08 15:56:45.431| 28,2| RegexData.cc(125) compileRE: compiled 'http://023pc.cn/8hrnv3' with flags 9
2017/08/08 15:56:45.431| 28,2| RegexData.cc(125) compileRE: compiled 'http://027tzx.com/lscpv' with flags 9
...

But why is it failing?

Background:
===

Running squid with > 1 regular expressions causes all kinds of
strange behaviour - that's why I noticed the problem in the first place.
