Hi,

I think the cost of an early check of the input is still lower than doing 
this during pcre_exec().
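
As one possible reading of the "early check" idea, here is a minimal C sketch. It assumes the application validates the whole subject once before matching. The names utf8_seq_len() and utf8_first_invalid() are hypothetical and not part of PCRE's API, and the rules follow RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). If the check passes, the subject could then be passed to pcre_exec() with the PCRE_NO_UTF8_CHECK option so that it is not validated a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1-4) of the well-formed UTF-8 sequence at s, or 0 if the bytes
   at s do not form one under RFC 3629 rules. avail = bytes left (> 0). */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char b0, lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */
    size_t need;

    assert(avail > 0);
    b0 = s[0];
    if (b0 < 0x80) return 1;                 /* ASCII */
    if (b0 >= 0xC2 && b0 <= 0xDF) need = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;           /* no overlong forms */
        if (b0 == 0xED) hi = 0x9F;           /* no UTF-16 surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;           /* no overlong forms */
        if (b0 == 0xF4) hi = 0x8F;           /* nothing above U+10FFFF */
    } else {
        return 0;                            /* 0x80-0xC1 or 0xF5-0xFF */
    }
    if (avail < need) return 0;              /* truncated sequence */
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < need; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;  /* not a continuation byte */
    return need;
}

/* Byte offset of the first invalid sequence, or -1 if all of s is valid. */
static long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        if (n == 0) return (long)i;
        i += n;
    }
    return -1;
}
```

On the bytes "caf\xC3\xA9" this returns -1; on "ab\xE2\x82" "c", where a three-byte sequence is cut short by an ASCII byte, it returns 2.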

Anyway, PCRE has macros for reading characters in pcre_internal.h:

  GETCHAR(c, eptr)
  GETCHARTEST(c, eptr)
  GETCHARINC(c, eptr)
  GETCHARINCTEST(c, eptr)
  GETCHARLEN(c, eptr, len)
  GETCHARLENTEST(c, eptr, len)
  BACKCHAR(eptr)

You can redefine them to suit your purposes, including handling illegal 
characters.
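
To illustrate the kind of redefinition meant here, below is a minimal sketch of a hypothetical GETCHARINC-style macro. It is not PCRE's own definition: SAFE_GETCHARINC, INVALID_CHAR, utf8_seq_len() and utf8_decode() are made-up names, and unlike the real macros it takes an extra end pointer so it never reads past the subject. Any byte that does not start a well-formed UTF-8 sequence becomes one INVALID_CHAR and advances the pointer by a single byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "invalid" marker, above U+10FFFF so it cannot be a real
   code point. */
#define INVALID_CHAR 0x110000u

/* Length (1-4) of the well-formed UTF-8 sequence at s, or 0 (RFC 3629). */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char b0, lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */
    size_t need;

    assert(avail > 0);
    b0 = s[0];
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) need = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;           /* no overlong forms */
        if (b0 == 0xED) hi = 0x9F;           /* no surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;           /* no overlong forms */
        if (b0 == 0xF4) hi = 0x8F;           /* nothing above U+10FFFF */
    } else {
        return 0;
    }
    if (avail < need) return 0;
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < need; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return need;
}

/* Decode a sequence already known to be well formed and n bytes long. */
static unsigned int utf8_decode(const unsigned char *p, size_t n)
{
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    unsigned int c = p[0] & lead_mask[n];
    for (size_t i = 1; i < n; i++)
        c = (c << 6) | (p[i] & 0x3F);        /* append 6 payload bits */
    return c;
}

/* Read one character at eptr into c and advance eptr; call only while
   eptr < end. Invalid bytes yield INVALID_CHAR, one byte at a time. */
#define SAFE_GETCHARINC(c, eptr, end)                                  \
    do {                                                               \
        size_t n_ = utf8_seq_len((eptr), (size_t)((end) - (eptr)));    \
        if (n_ == 0) { (c) = INVALID_CHAR; (eptr)++; }                 \
        else { (c) = utf8_decode((eptr), n_); (eptr) += n_; }          \
    } while (0)
```

Whether something like this could be dropped into PCRE's matcher without other changes is not something this sketch shows; it only demonstrates the decoding policy.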

Regards,
Zoltan

ND <[email protected]> wrote:
>> In the default case, PCRE does not crash: it returns PCRE_ERROR_BADUTF8.
> This result is not useful when the main application needs to analyze the
> input stream no matter what. To do this, the main application is now
> forced to:
>    have its own built-in UTF-8 parser;
>    reparse the input stream with this parser to find invalid UTF-8
>    characters;
>    make them valid and remember the changes so they can be restored
>    later;
>    re-execute pcre_exec() on the valid UTF-8 stream;
>    rebuild the output stream, restoring the replaced invalid UTF-8
>    characters.
> The cost of this work is very high.
>
> Situations in which the analysis must succeed whether or not the input
> UTF-8 stream is erroneous are widespread. In some cases the errors are
> unwitting, in others wilful. At present PCRE cannot offer an effective
> solution.
>
>> I think it would penalize the normal running of PCRE too much.
> I wrote that this behaviour may be OPTIONAL.
>
>> I also think one could argue about how to interpret a sequence of
>> invalid bytes whose values are greater than 127. How many characters does
>> such a string encode? For example, suppose the first byte indicates that
>> there are three more bytes in a UTF-8 character, two of them are OK, but
>> the third one has an invalid value (less than 128, say). Is that a mangled
>> UTF-8 character followed by an ASCII byte, or is it four single-byte
>> characters?
> IMHO a sequence of invalid bytes may be interpreted as one character of
> type "invalid" per byte.
>
> -- 
> ## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
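
The workaround ND describes (find the invalid bytes, make them valid, remember them, restore them afterwards) could look roughly like the C sketch below. It assumes each invalid byte is swapped for a single ASCII placeholder, so the byte offsets pcre_exec() reports still line up with the original buffer. sanitize_utf8(), restore_utf8(), utf8_seq_len() and patch_t are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Length (1-4) of the well-formed UTF-8 sequence at s, or 0 (RFC 3629). */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char b0, lo = 0x80, hi = 0xBF;
    size_t need;

    assert(avail > 0);
    b0 = s[0];
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) need = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;
        if (b0 == 0xED) hi = 0x9F;
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;
        if (b0 == 0xF4) hi = 0x8F;
    } else {
        return 0;
    }
    if (avail < need) return 0;
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < need; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return need;
}

/* One replaced byte: where it was and what it was. */
typedef struct { size_t offset; unsigned char byte; } patch_t;

/* Replace every byte that does not start a well-formed sequence with
   `placeholder`, recording the originals in *patches (caller frees).
   Returns the number of patched bytes, or (size_t)-1 if malloc fails. */
static size_t sanitize_utf8(unsigned char *buf, size_t len,
                            unsigned char placeholder, patch_t **patches)
{
    patch_t *list = malloc((len ? len : 1) * sizeof *list);
    size_t count = 0, i = 0;

    if (list == NULL) return (size_t)-1;
    while (i < len) {
        size_t n = utf8_seq_len(buf + i, len - i);
        if (n == 0) {                        /* invalid: patch one byte */
            list[count].offset = i;
            list[count].byte = buf[i];
            count++;
            buf[i] = placeholder;
            n = 1;
        }
        i += n;
    }
    *patches = list;
    return count;
}

/* Put the original bytes back after matching. */
static void restore_utf8(unsigned char *buf, const patch_t *patches,
                         size_t count)
{
    for (size_t k = 0; k < count; k++)
        buf[patches[k].offset] = patches[k].byte;
}
```

The placeholder choice matters: a pattern can match it like any other character, so an application would have to pick one its patterns cannot confuse with genuine data. The extra pass over the input is the cost ND is objecting to.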

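Under ND's proposed rule (one "invalid" character per bad byte), the question quoted above has a definite answer. The hypothetical count_chars() below applies that rule to a lead byte 0xF1 (announcing three more bytes), two valid continuation bytes, and then 'A'. Once the lead byte is rejected, the two continuation bytes no longer belong to any sequence, so each also counts as invalid: the result is four characters, three of them invalid, i.e. the "four single-byte characters" reading.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1-4) of the well-formed UTF-8 sequence at s, or 0 (RFC 3629). */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char b0, lo = 0x80, hi = 0xBF;
    size_t need;

    assert(avail > 0);
    b0 = s[0];
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) need = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;
        if (b0 == 0xED) hi = 0x9F;
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;
        if (b0 == 0xF4) hi = 0x8F;
    } else {
        return 0;
    }
    if (avail < need) return 0;
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < need; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return need;
}

/* Count characters: each well-formed sequence is one character, and each
   byte that does not start one is a separate "invalid" character. */
static size_t count_chars(const unsigned char *s, size_t len,
                          size_t *invalid)
{
    size_t chars = 0, i = 0;

    *invalid = 0;
    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        if (n == 0) {                        /* bad byte: consume just one */
            (*invalid)++;
            n = 1;
        }
        chars++;
        i += n;
    }
    return chars;
}
```

Note the C string literal has to be split as "\xF1\x80\x80" "A", because "\x80A" would be read as a single hex escape.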
