[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-06-01 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 Philip Hazel changed: Resolution: removed "---", added "FIXED"

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-17 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #19 from Zoltan Herczeg --- Yes, the standard is quite strictly defined: https://www.ecma-international.org/ecma-262/6.0/#sec-patterns

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-17 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #18 from Philip Hazel --- I have just committed a documented patch that implements additional options within the compile context, the first of which is PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. This is available in UTF-8

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-17 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #17 from Rob --- I appreciate the conceptual difficulty of trying to reconcile two incompatible paradigms. From my perspective, the only change I need is that it doesn't throw an error. I rewrote your escape handler so

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-11 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #15 from Rob --- Thanks Philip, I look forward to that 'permissive' option

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-11 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #14 from Philip Hazel --- From many years of experience, I've learned that it's better to start off strict rather than permissive. It bites you less and you can always relax later. However, in the case of \o, Perl

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-11 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #13 from Zoltan Herczeg --- Philip is working on a pattern conversion utility for PCRE2 at the moment. It can convert POSIX or glob patterns to PCRE2 syntax. Perhaps JS could be a valid target as well.

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-11 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #12 from Rob --- Hi Zoltan, and thanks for the amazing work on SLJIT. PCRE passes JavaScript's test262 unit tests well enough to be suitable for my browser app, and I can easily fix the problems I found so far by tweaking a

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-11 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #11 from Zoltan Herczeg --- I worked on JS regex before and unfortunately there are major differences between JS regex and Perl regex, especially on the parsing side. So achieving 100% compatibility is a very hard task.

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-10 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #10 from Rob --- I think the issue isn't "JavaScript compatibility" so much as "strict versus permissive". JavaScript is historically permissive, whereas PCRE seems strict by design. So if you add another option,

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-10 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 Zoltan Herczeg changed: CC: added hzmes...@freemail.hu

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-10 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #8 from Philip Hazel --- I wrote "If a subject string has been checked by pcre2_match() for UTF-8 correctness, the only occurrences of surrogates must be in valid pairs." Of course, I *meant* to write "for UTF-16
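The invariant quoted in Comment #8 (in checked UTF-16, surrogates occur only in valid pairs) can be illustrated with a small standalone check. This is a minimal sketch in plain C, not PCRE2's own validation code; the function name `utf16_surrogates_paired` is hypothetical and does not exist in the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): returns 1 if every surrogate
 * code unit in s[0..len) belongs to a high+low pair, 0 otherwise. */
int utf16_surrogates_paired(const uint16_t *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i + 1 >= len || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return 0;
            i++; /* skip the low half of the pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return 0;
        }
    }
    return 1;
}
```

A subject that passes a check like this never presents an isolated surrogate to the matcher, which is the property the quoted sentence relies on.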

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-10 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #7 from Philip Hazel --- You wrote "Checking the surrogate range seems too benign to be error-worthy." The problem is that, in UTF-16, it is impossible to encode a character value (code point) within that range because
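Why a code point in the surrogate range has no UTF-16 encoding can be seen from the encoding rule itself: the 16-bit values 0xD800 to 0xDFFF are reserved as the two halves of a pair that represents a code point above 0xFFFF. The following is a minimal sketch of that rule in plain C; `utf16_encode` is a hypothetical name and is not taken from PCRE2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if the value
 * cannot be encoded (a surrogate, or beyond 0x10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                   /* reserved for surrogate pairs */
    if (cp > 0x10FFFF)
        return 0;                   /* outside the Unicode range */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;      /* one unit is enough */
        return 1;
    }
    cp -= 0x10000;                  /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    return 2;
}
```

For example, `utf16_encode(0x1F600, buf)` writes the pair 0xD83D, 0xDE00, while `utf16_encode(0xD800, buf)` returns 0 because a lone 0xD800 unit would be read as half of a pair.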

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-09 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #5 from Rob --- Thanks for replying to my issue. I'll try to clarify ... I'm using PCRE2 in a JavaScript interpreter for a web browser. Viewing some pages on the New York Times website caused the JavaScript interpreter to

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-09 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #4 from Philip Hazel --- Thanks for the comments. I was thinking about this overnight, and had second thoughts about it, along the lines of what Christian says. I think we need to know exactly what is the problem here.

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-08 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 Giuseppe D'Angelo changed: CC: added dange...@gmail.com

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-08 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #2 from Christian Persch (GNOME) --- From my POV as a PCRE user, I think this would be rather surprising. NO_UTF_CHECK is used when you've previously ensured that strings you pass to PCRE are valid UTF, so the extra check

[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

2017-05-08 Thread admin
https://bugs.exim.org/show_bug.cgi?id=2120 --- Comment #1 from Philip Hazel --- I've just looked at pcre2_valid_utf.c, and it contains only 399 lines. So I'm afraid I'm confused. Aha! You mean line 1473 in pcre2_compile.c. OK, I see what you mean. NO_UTF_CHECK was