On 08-06-2009 at 16:17:58 Michał Wysoczański <mic...@wysoczanski.net> wrote:

I've tried to use PHPTAL with templates containing some long text nodes
(e.g. <p>long text here...</p>), and noticed that PHPTAL generates an
internal server error (500) (tested on 1.2.0b3 and svn HEAD).
The error is triggered by the preg_match() call in the checkEncoding()
method of the PHPTAL_Dom_SaxXmlParser class (line 373). It seems that PCRE
fails on long strings. The exact length depends on the machine I tested
on, but 9.5k should be enough to trigger the error.
I know that this is a PCRE issue, not a PHPTAL one, but maybe PHPTAL should
check the encoding some other way?

Probably it would help to "invert" the expression, i.e. instead of matching an entire valid UTF-8 string, use an expression that matches only invalid UTF-8 sequences.

Try it if you like:

            // http://www.w3.org/International/questions/qa-forms-utf-8
            $match = '[\x09\x0A\x0D\x20-\x7F]'        // ASCII
               . '|[\xC2-\xDF][\x80-\xBF]'            // non-overlong 2-byte
               . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // excluding overlongs
               . '|[\xE1-\xEC\xEE\xEE][\x80-\xBF]{2}' // straight 3-byte (exclude FFFE and FFFF)
               . '|\xEF[\x80-\xBE][\x80-\xBF]'        // straight 3-byte
               . '|\xEF\xBF[\x80-\xBD]'               // straight 3-byte
               . '|\xED[\x80-\x9F][\x80-\xBF]'        // excluding surrogates
               . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // planes 1-3
               . '|[\xF1-\xF3][\x80-\xBF]{3}'         // planes 4-15
               . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';    // plane 16
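
For illustration only, here is one way an alternation like $match could be applied without wrapping it in a repeated group, which is the construct that tends to exhaust PCRE's stack on long input. The idea is to strip every valid character and check whether any bytes are left over. This is a rough sketch, not what PHPTAL does; the function name is made up, and it assumes the pattern contains no "/" and is run without the /u modifier so PCRE works on raw bytes.

```php
<?php
// Hypothetical helper, not part of PHPTAL.
// $match is an alternation in which each branch matches exactly one valid
// UTF-8 character, like the expression above.
function utf8_valid_by_stripping($str, $match)
{
    // Each match consumes a single character, so there is no repeated group
    // for PCRE to recurse into. Bytes that fit no branch are left in place.
    $leftover = preg_replace('/' . $match . '/', '', $str);

    // preg_replace() returns null on error; that is treated as invalid here.
    return $leftover === '';
}
```

Because UTF-8 is prefix-free, a valid string gets consumed completely, and any stray or truncated sequence leaves at least one byte behind. The cost is a temporary copy of the string, so this trades memory for safety.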

In the meantime I've added a not-so-pretty workaround that splits the string into small chunks and checks each one individually.
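
To give a rough idea of what chunked checking can look like, here is a minimal sketch. It is not the code that went into PHPTAL; the function name, the fixed-size splitting strategy and the default chunk size of 1024 bytes (picked only to stay well below the ~9.5k reported above) are all assumptions. The one subtle part is that a cut must not land in the middle of a multi-byte character.

```php
<?php
// Hypothetical helper, not PHPTAL's actual workaround.
// $match is a one-character-per-branch alternation like the one above.
function utf8_valid_in_chunks($str, $match, $chunkSize = 1024)
{
    // A UTF-8 character is at most 4 bytes, so smaller chunks make no sense.
    $chunkSize = max(4, (int) $chunkSize);
    $len = strlen($str);
    $pos = 0;

    while ($pos < $len) {
        $end = min($pos + $chunkSize, $len);

        // If the byte at the cut point is a continuation byte (10xxxxxx),
        // the cut is inside a character: move it back, at most 3 bytes.
        $back = 0;
        while ($end < $len && $back < 3 && (ord($str[$end]) & 0xC0) === 0x80) {
            $end--;
            $back++;
        }

        // The repeated group only ever sees a short chunk here.
        $chunk = substr($str, $pos, $end - $pos);
        if (!preg_match('/\A(?:' . $match . ')+\z/', $chunk)) {
            return false;
        }
        $pos = $end;
    }

    return true;
}
```

Called as utf8_valid_in_chunks($str, $match), it returns false as soon as one chunk fails, so invalid input is usually rejected without scanning the whole string.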

See if that helps:


regards, Kornel

PHPTAL mailing list
