Unicode Sets in 'Unicode Regular Expressions'

2014-05-27 Thread Richard Wordingham
UTS#18 'Unicode Regular Expressions' Version 17 Requirement RL1.3 'Subtraction and Intersection' talks of Unicode sets. What is the relevant definition of a 'Unicode set'? Is it a finite set of non-empty strings? Other possibilities that occur to me, depending on context, include sets of

RE: Unicode Sets in 'Unicode Regular Expressions'

2014-05-27 Thread Phillips, Addison
A Unicode set in this context means a set of code points. This is discussed in section 1.2: -- This is done by providing syntax for sets of characters based on the Unicode character properties, and allowing them to be mixed with lists and ranges of individual code points. -- More generally,
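
To illustrate the reading given in this reply, here is a minimal Python sketch that treats a Unicode set as a plain set of code points. Membership is built from a character property, a range and an explicit list, then combined by intersection and subtraction in the spirit of RL1.3. The helper names `property_set` and `code_point_range` are hypothetical and appear in neither UTS#18 nor any library. The sketch also leaves aside the thread's open question of whether a set may contain strings as well.

```python
import sys
import unicodedata


def property_set(category_prefix):
    """Return every code point whose General_Category starts with the prefix.

    Hypothetical helper: scans the whole code space, so it is slow but simple.
    """
    return {
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith(category_prefix)
    }


def code_point_range(first, last):
    """Return the inclusive range of code points from `first` to `last`."""
    return set(range(ord(first), ord(last) + 1))


# A property-based set, a range, and an explicit list of code points.
letters = property_set("L")
ascii_range = code_point_range("\u0000", "\u007f")
vowels = {ord(c) for c in "aeiouAEIOU"}

# Intersection: letters that are also in the ASCII range.
ascii_letters = letters & ascii_range

# Subtraction: ASCII letters with the listed vowels removed.
ascii_consonants = ascii_letters - vowels
```

Under this model, intersection and subtraction are ordinary set operations. The model does not address canonical equivalence (RL2.1).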

Re: Unicode Sets in 'Unicode Regular Expressions'

2014-05-27 Thread Charlie Ruland ☘
This is from the introduction http://www.unicode.org/reports/tr18/#Introduction to UTS#18: “Unicode is a large character set—[...]” So I take “Unicode set” to mean “set of Unicode characters” with their respective codepoints, whether decomposable or not. Charlie ☘ Richard Wordingham

Re: Unicode Sets in 'Unicode Regular Expressions'

2014-05-27 Thread Richard Wordingham
On Wed, 28 May 2014 00:56:40 +0200 Charlie Ruland ☘ rul...@luckymail.com wrote:
> So I take “Unicode set” to mean “set of Unicode characters” with their respective codepoints, whether decomposable or not.
The decomposability issue arises when trying to follow RL2.1 Canonical Equivalence. In a

Re: Unicode Sets in 'Unicode Regular Expressions'

2014-05-27 Thread Mark Davis ☕️
They are defined in http://unicode.org/reports/tr35/tr35.html#Unicode_Sets. We should add a pointer to that; could you please file a feedback report for #18 to that effect? Also, if you find any problems in the description in #35, you can file a ticket at http://unicode.org/cldr/trac/newticket to