On Mon, 17 Mar 2014, Ramy Darwish wrote:
Hi Piotr,
In my experience, you need to use a utf8toUnicode transformation to properly
map UTF8 strings to Unicode before the rule processes the input as Unicode,
as well as fine-tune the urlDecodeUni transformation to properly normalize
Central European Unicode characters.
Provide a code point declaration for the urlDecodeUni transformation,
in order to properly normalize Unicode strings (in modsecurity.conf)
---------------------------------------------------
# With the 1250 code point for Central Europe:
SecUnicodeMapFile /etc/modsecurity/unicode.mapping
SecUnicodeCodePage 1250
---------------------------------------------------
See these resources for more info:
http://blog.spiderlabs.com/2011/06/modsecurity-advanced-topic-of-the-week-unicode-mapping-support.html
Thanks for the response.
Unfortunately, both you and Ryan Barnett, in
http://blog.spiderlabs.com/2012/08/waf-normalization-and-i18n.html
are probably missing the point.
This is not a problem with URI normalization or Best-Fit Mapping. It is a
problem with multi-byte UTF-8 characters placed in a rule's regexp and
matched by PCRE code that is not UTF-8 aware.
These three multi-byte UTF-8 characters are widely used in
modsecurity_crs_41_sql_injection_attacks.conf:
´’‘
They appear in a regex character class, [´’‘], which to a byte-oriented
engine is in effect [\xc2\xb4\xe2\x80\x99\xe2\x80\x98]. What does that mean?
"Match any one of the 8 bytes in this list". The last byte of "ę"
(\xc4\x99) is one of them. The audit log entry shows this:
Message: Access denied with code 403 (phase 2).
Pattern match
"(^[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+|[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+$)"
at ARGS:removeins. [file
"/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf"]
[line "64"] [id "981318"] [rev "2"] [msg "SQL Injection Attack: Common Injection
Testing Detected"]
[data "Matched Data: \x99 found within ARGS:removeins: Odinstaluj
aplikacj\xc4\x99"]
[severity "CRITICAL"] [ver "OWASP_CRS/2.2.6"] [maturity "9"] [accuracy "8"]
[tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"]
[tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]
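The byte-level reading of the character class can be checked outside
ModSecurity. Below is a minimal sketch in Python, not PCRE: it assumes that
Python's re module with a bytes pattern behaves like a regex engine without
UTF-8 support, which should hold for a class this simple. The variable names
are mine, and the subject string is the ARGS:removeins value from the log.

```python
import re

# The three typographic quotes from the rule: U+00B4, U+2019, U+2018.
quotes = "\u00b4\u2019\u2018"

# What a byte-oriented engine sees: a class over the raw UTF-8 bytes.
byte_class = b"[" + quotes.encode("utf-8") + b"]"
print(byte_class)                     # b'[\xc2\xb4\xe2\x80\x99\xe2\x80\x98]'

# "Odinstaluj aplikacje" with e-ogonek (U+0119), which encodes as \xc4\x99.
subject = "Odinstaluj aplikacj\u0119"
subject_bytes = subject.encode("utf-8")

# Byte mode: the trailing \x99 is a member of the class, so the rule "hits".
byte_hit = re.search(byte_class + b"+$", subject_bytes)
print(byte_hit.group())               # b'\x99'

# Character mode: the class holds three code points, none of them U+0119.
char_hit = re.search("[" + quotes + "]+$", subject)
print(char_hit)                       # None

# Flip side: in byte mode the class matches exactly one byte, so it cannot
# match a whole 3-byte quote character on its own.
whole_hit = re.fullmatch(byte_class, "\u2019".encode("utf-8"))
print(whole_hit)                      # None
```

The first search reproduces the "Matched Data: \x99" from the audit log; the
second shows what a UTF-8-aware match would do with the same input.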
Setting SecUnicodeCodePage and SecUnicodeMapFile does not change anything.
Even with SecUnicodeCodePage 1250 set, the same rule fires with the same
"Pattern match".
I have no experience with PCRE in C, but man pcre says:
UTF-8 AND UNICODE PROPERTY SUPPORT
...
In order process UTF-8 strings, you must build PCRE to include UTF-8
support in the code, and, in addition, you must
call pcre_compile() with the PCRE_UTF8 option flag. When you do
this, both the pattern and any subject strings that
are matched against it are treated as UTF-8 strings instead of just
strings of bytes.
I see no sign of the PCRE_UTF8 option being used in the mod_security code. Bingo!
I wrote four Perl one-liners to demonstrate the problem with
Perl regular expressions that are not UTF-8 aware:
1. Incorrect match, as in the mod_security/OWASP rule 981318:
$ perl -e '$str="ę"; print(($str=~/[’]$/ ? "match" : "no match")."\n");'
match
2. The same code, with the regex made UTF-8 aware by the 'utf8' module:
$ perl -Mutf8 -e '$str="ę"; print(($str=~/[’]$/ ? "match" : "no match")."\n");'
no match
3. Almost self-explanatory:
$ perl -Mutf8 -e '$str="’"; print(($str=~/^[’]$/ ? "match" : "no match")."\n");'
match
4. But wait, what if we disable the 'utf8' module?
$ perl -e '$str="’"; print(($str=~/^[’]$/ ? "match" : "no match")."\n");'
no match
Conclusion: either enable PCRE_UTF8 in the mod_security code, or modify the
rules to account for PCRE not being UTF-8 aware.
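As a rough idea of what the second option could look like, here is a sketch,
again in Python with bytes patterns standing in for UTF-8-unaware PCRE. It is
not the actual CRS fix, and QUOTE and PATTERN are hypothetical names. The
single-byte characters stay in a class, while each multi-byte quote is spelled
out as a whole byte sequence in an alternation, so a lone \x99 can no longer
match.

```python
import re

# One "quote-like" unit: an ASCII character from the original class, or one
# complete UTF-8 byte sequence for U+00B4, U+2019 or U+2018.
QUOTE = rb"(?:[\"'`;]|\xc2\xb4|\xe2\x80\x99|\xe2\x80\x98)"

# Same shape as the pattern in the audit log: quotes at the start or the end.
PATTERN = re.compile(rb"(^" + QUOTE + rb"+|" + QUOTE + rb"+$)")

benign = "Odinstaluj aplikacj\u0119".encode("utf-8")   # ends with \xc4\x99
print(PATTERN.search(benign))                           # None

trailing_quote = "abc\u2019".encode("utf-8")
print(PATTERN.search(trailing_quote).group())           # b'\xe2\x80\x99'

leading_quote = b"' OR 1=1"
print(PATTERN.search(leading_quote).group())            # b"'"
```

The trade-off is that the pattern becomes longer and harder to read than
[´’‘], which is an argument for enabling PCRE_UTF8 instead.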
Regards,
--
Piotr Gackiewicz
Intertele S.A. - operator of the ITL.PL and DOMENY.ITL.PL systems
al. T. Rejtana 10, 35-310 Rzeszów
TEL: +48 17 8507580, FAX: +48 17 8520275
http://www.itl.pl - reliable hosting services
http://domeny.itl.pl - cheap internet domains
http://www.intertele.pl