Re: svn commit: r301461 - in head/lib/libc: gen locale regex

2016-06-06 Thread Pedro Giffuni



On 06/05/16 14:49, Andrey Chernov wrote:

On 05.06.2016 22:12, Pedro F. Giffuni wrote:

--- head/lib/libc/regex/regcomp.c   Sun Jun  5 18:16:33 2016
(r301460)
+++ head/lib/libc/regex/regcomp.c   Sun Jun  5 19:12:52 2016
(r301461)
@@ -821,10 +821,10 @@ p_b_term(struct parse *p, cset *cs)
(void)REQUIRE((uch)start <= (uch)finish, 
REG_ERANGE);
CHaddrange(p, cs, start, finish);
} else {
-   (void)REQUIRE(__collate_range_cmp(table, start, 
finish) <= 0, REG_ERANGE);
+   (void)REQUIRE(__wcollate_range_cmp(table, start, 
finish) <= 0, REG_ERANGE);
for (i = 0; i <= UCHAR_MAX; i++) {
-   if (   __collate_range_cmp(table, start, 
i) <= 0
-   && __collate_range_cmp(table, i, 
finish) <= 0
+   if (   __wcollate_range_cmp(table, start, 
i) <= 0
+   && __wcollate_range_cmp(table, i, 
finish) <= 0
   )
CHadd(p, cs, i);
}



As I already mention in PR, we have broken regcomp after someone adds
wchar_t support there. Now regcomp ranges works only for the first 256
wchars of the current locale, notice that loop upper limit:
for (i = 0; i <= UCHAR_MAX; i++) {
In general, ranges are either broken in regcomp now or are memory
eating. We have bitmask only for the first 256 wchars, all other added
to the range literally. Imagine what happens if someone specify full
Unicode range in regexp.

Proper fix will be adding bitmask for the whole Unicode range, and even
in that case regcomp attempting to use collation in ranges will be
_very_slow_ since needs to check all Unicode chars in its
for (i = 0; i <= Max_Unicode_wchar; i++) {
loop.

Better stop pretending that we are able to do collation support in the
ranges, since POSIX cares about its own locale only here:
"In the POSIX locale, a range expression represents the set of collating
elements that fall between two elements in the collation sequence,
inclusive. In other locales, a range expression has unspecified
behavior: strictly conforming applications shall not rely on whether the
range expression is valid, or on the set of collating elements matched."

Until whole Unicode range bitmask will be implemented (if ever), better
stop pretending to honor collation order, we just can't do it with
wchars now and do what NetBSD/OpenBSD does (using wchar_t) instead. It
does not prevent memory eating on big ranges (bitmask is needed, see
above), but at least fix the thing that only first 256 wchars are
considered.



Sadly regex is one part of the system that could use a maintainer :(,
I have been forced to look at it more than I'd like to but I don't
really use the collation support at all.

Pedro.
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r301461 - in head/lib/libc: gen locale regex

2016-06-05 Thread Andrey Chernov
On 05.06.2016 22:12, Pedro F. Giffuni wrote:
>   When collation support was brought in, the second and third
>   arguments in __collate_range_cmp() were changed from int to
>   wchar_t, breaking the ABI. Change them to a "char" type which
>   makes more sense and keeps the ABI compatible.

Not only that breaks ABI, but changing strcoll_l() to wcscoll_l() in the
__collate_range_cmp() too (now fixed), while this function is visible
outside of libc.

___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r301461 - in head/lib/libc: gen locale regex

2016-06-05 Thread Andrey Chernov
On 05.06.2016 22:12, Pedro F. Giffuni wrote:
> --- head/lib/libc/regex/regcomp.c Sun Jun  5 18:16:33 2016
> (r301460)
> +++ head/lib/libc/regex/regcomp.c Sun Jun  5 19:12:52 2016
> (r301461)
> @@ -821,10 +821,10 @@ p_b_term(struct parse *p, cset *cs)
>   (void)REQUIRE((uch)start <= (uch)finish, 
> REG_ERANGE);
>   CHaddrange(p, cs, start, finish);
>   } else {
> - (void)REQUIRE(__collate_range_cmp(table, start, 
> finish) <= 0, REG_ERANGE);
> + (void)REQUIRE(__wcollate_range_cmp(table, 
> start, finish) <= 0, REG_ERANGE);
>   for (i = 0; i <= UCHAR_MAX; i++) {
> - if (   __collate_range_cmp(table, 
> start, i) <= 0
> - && __collate_range_cmp(table, i, 
> finish) <= 0
> + if (   __wcollate_range_cmp(table, 
> start, i) <= 0
> + && __wcollate_range_cmp(table, i, 
> finish) <= 0
>  )
>   CHadd(p, cs, i);
>   }
> 

As I already mention in PR, we have broken regcomp after someone adds
wchar_t support there. Now regcomp ranges works only for the first 256
wchars of the current locale, notice that loop upper limit:
for (i = 0; i <= UCHAR_MAX; i++) {
In general, ranges are either broken in regcomp now or are memory
eating. We have bitmask only for the first 256 wchars, all other added
to the range literally. Imagine what happens if someone specify full
Unicode range in regexp.

Proper fix will be adding bitmask for the whole Unicode range, and even
in that case regcomp attempting to use collation in ranges will be
_very_slow_ since needs to check all Unicode chars in its
for (i = 0; i <= Max_Unicode_wchar; i++) {
loop.

Better stop pretending that we are able to do collation support in the
ranges, since POSIX cares about its own locale only here:
"In the POSIX locale, a range expression represents the set of collating
elements that fall between two elements in the collation sequence,
inclusive. In other locales, a range expression has unspecified
behavior: strictly conforming applications shall not rely on whether the
range expression is valid, or on the set of collating elements matched."

Until whole Unicode range bitmask will be implemented (if ever), better
stop pretending to honor collation order, we just can't do it with
wchars now and do what NetBSD/OpenBSD does (using wchar_t) instead. It
does not prevent memory eating on big ranges (bitmask is needed, see
above), but at least fix the thing that only first 256 wchars are
considered.

___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"