SITION BRACKET
Cool idea.
But if you really want to use these characters, your source will be hard
to read without exotic fonts. You have been warned;-)
Helmut Wollmersdorfer
It's not explicitly specified whether something like
my $charname = 'SPACE';
my $string = "\c[$charname]";
should interpolate or not.
I assume 'not'. Right?
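For comparison only (this is Python, not Perl 6), Python answers the same question with "not". Its analogous `\N{...}` escape is resolved when the string literal is compiled. A name held in a variable has to go through an explicit run-time lookup, `unicodedata.lookup`. The sketch below assumes nothing about what Perl 6 should do; it just shows one language that keeps the two operations apart.

```python
import unicodedata

charname = "SPACE"

# A literal name inside \N{...} is resolved when the literal is compiled;
# an unknown name there is a SyntaxError, not a run-time error.
literal = "\N{SPACE}"

# A name held in a variable is never interpolated into \N{...};
# it has to be looked up explicitly at run time.
looked_up = unicodedata.lookup(charname)

print(repr(literal), repr(looked_up), literal == looked_up)
```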
Helmut Wollmersdorfer
asel
LATIN SMALL LETTER A, # some comment
COMBINING DOT BELOW, # thisandthat
]"
Helmut Wollmersdorfer
-time of
the process, but these names would need to be checked for uniqueness
(performance problem).
Helmut Wollmersdorfer
tinue=No)
rakudo: FAIL, std: FAIL
Wouldn't it be easier to reference the Unicode properties
1) ID_Start plus U+005F LOW LINE (=Underscore)
2) ID_Continue
for identifiers? That's what Unicode 'ID_x' is for.
With the nice 'side effect' that combining diacritics are in ID_Continue.
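The rule proposed above can be sketched as follows. Python is used only because `unicodedata` exposes general categories, and the sketch assumes an approximation: UAX #31 defines ID_Start as letters (L*) plus Nl, and ID_Continue as ID_Start plus Mn, Mc, Nd and Pc. The sketch derives both from those categories alone and omits the small Other_ID_Start/Other_ID_Continue and Pattern_Syntax/Pattern_White_Space adjustments. The function names are hypothetical.

```python
import unicodedata

# Approximate ID_Start / ID_Continue from general categories only (UAX #31).
# The real properties also add Other_ID_Start / Other_ID_Continue and remove
# Pattern_Syntax / Pattern_White_Space; those adjustments are omitted here.
ID_START_CATS = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATS = ID_START_CATS | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch: str) -> bool:
    # ID_Start plus U+005F LOW LINE, as proposed above.
    return ch == "_" or unicodedata.category(ch) in ID_START_CATS


def is_id_continue(ch: str) -> bool:
    # Combining marks (Mn, Mc) are included, so base + diacritic is allowed.
    return unicodedata.category(ch) in ID_CONTINUE_CATS


def is_identifier(s: str) -> bool:
    return bool(s) and is_id_start(s[0]) and all(is_id_continue(c) for c in s[1:])


print(is_identifier("a\u0323"))   # a + COMBINING DOT BELOW
print(is_identifier("\u0323a"))   # cannot start with a combining mark
```

The "side effect" mentioned above falls out directly: U+0323 COMBINING DOT BELOW is category Mn, so it may follow a letter but cannot start an identifier.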
Helmut Wollmersdorfer
AFAIR, 'CharLingua' appears in two Specs, maybe as a leftover from the
history of Perl 6.
Whatever the idea of 'CharLingua' was, a nice-to-have would be
support for locale-dependent processing in the sense of the Unicode CLDR
(Common Locale Data Repository, http://cldr.unicode.org/).
Helmut Wollmersdorfer
ould the definition of graphemes conform to Unicode Standard Annex
#29 'grapheme clusters'? Which level: legacy, extended, or tailored?
Helmut Wollmersdorfer
Darren Duncan wrote:
Since you seem eager, I recommend you start with porting the Parrot PDD
28 to a new Perl 6 Synopsis 15, and continue from there.
IMHO we need some people for a broad discussion on the details first.
Helmut Wollmersdorfer
Larry Wall wrote:
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
2) Can I use Unicode property matching safely with graphemes?
If yes, who or what maintains the necessary tables?
Good question. My assumption is that adding marks to a character
doesn't chang
file,
filters the lines, and writes them back if the result is in another
normalization form.
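A minimal sketch of such a filter, written in Python for illustration. It assumes UTF-8 files and uses NFC as the default target form; both `normalize_lines` and `normalize_file` are hypothetical names. Every line is normalized, and the file is rewritten only if some line actually changed.

```python
import unicodedata


def normalize_lines(lines, form="NFC"):
    """Return (changed, new_lines), with every line normalized to `form`.

    `changed` is True if at least one line was not already in that form,
    so the caller only needs to write the file back in that case."""
    new_lines = [unicodedata.normalize(form, line) for line in lines]
    return new_lines != list(lines), new_lines


def normalize_file(path, form="NFC"):
    # Hypothetical helper: read a UTF-8 file, normalize, rewrite only if needed.
    # newline="" keeps the original line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        lines = f.readlines()
    changed, new_lines = normalize_lines(lines, form)
    if changed:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.writelines(new_lines)
    return changed
```

For example, `normalize_lines(["a\u0323\n"])` reports a change because NFC composes LATIN SMALL LETTER A + COMBINING DOT BELOW into U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW.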
Helmut Wollmersdorfer
ly a bug in 'unicore').
2) Syntax of non-boolean properties:
In Perl 5, e.g.:
  \p{BidiClass:L}  # Left-to-Right
  \p{gc:L}         # General category = Letter
should be in Perl 6 (thanks to Moritz's suggestion on #perl6):
Helmut Wollmersdorfer
t
in the definition. And if a Unicode term is used, it should mean exactly
what the Unicode standard specifies. E.g., it would be a mistake to
define graphemes by '\pX' or '(?>\PM\pM*)', since Unicode provides the
properties 'Grapheme_Base' and 'Grapheme_Extend' (unfortunately, they are
not supported by Perl 5 or Perl 6).
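To see why the approximation falls short, here is a Python sketch of what '(?>\PM\pM*)' computes: one non-mark character followed by any run of marks. The function `naive_graphemes` is a hypothetical name, and unlike the regex it lets a leading mark form its own cluster. UAX #29 treats CR LF as one extended grapheme cluster, and likewise a conjoining Hangul jamo sequence (L V T). The naive rule splits both.

```python
import unicodedata


def naive_graphemes(s: str) -> list[str]:
    """Split `s` like the regex (?>\\PM\\pM*): a non-mark character
    followed by any run of characters whose general category is M*.
    (A leading mark simply starts its own cluster in this sketch.)"""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch     # attach the mark to the previous cluster
        else:
            clusters.append(ch)    # start a new cluster
    return clusters


print(naive_graphemes("a\u0323b"))             # mark attaches correctly
print(naive_graphemes("\r\n"))                 # UAX #29: one cluster
print(naive_graphemes("\u1100\u1161\u11a8"))   # UAX #29: one cluster (L V T)
```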
Helmut Wollmersdorfer
ical equivalence, both of which really
require locale knowledge outside the charset itself.
Sure. The Perl 6 specs still need a lot of work on the Unicode part.
Helmut Wollmersdorfer
ppropriate chapters of the Unicode
standard in the specification of Perl 6. This would make Unicode
test cases reusable. And an implementation should always declare which
Unicode features it implements (and which it does not), and for which
version of Unicode.
Helmut Wollmersdorfer
Larry Wall wrote:
On Sun, Feb 06, 2011 at 08:59:51PM +0100, Helmut Wollmersdorfer wrote:
: Tom Christiansen wrote:
: > I'm also curious whether there are active plans to address the
: > tr18 requirements in perl6 regexes. It would be a wonderful
: > feather in perl6'
thing that plagues us with full Unicode case-folding. This is the
  "\N{LATIN SMALL LIGATURE FFI}" =~ /(f)(f)/i
problem, amongst others. Seems that you are going to get into the
same dilemma if you allow matching partial graphemes in grapheme mode.
We can dream of :ignoreorthography or :ignoretypography, but they should
not be implemented in a regex engine.
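The ligature problem quoted above can be reproduced in Python, used here only as an illustration. Full case folding (status F in CaseFolding.txt) maps the single code point U+FB03 to the three code points "ffi". A match on the folded string therefore has capture offsets with no counterpart in the original. Python's `re` avoids the dilemma by doing only simple, one-to-one case matching, so the case-insensitive pattern does not match the ligature at all.

```python
import re

s = "\N{LATIN SMALL LIGATURE FFI}"  # U+FB03: one code point

# Full case folding expands it to three code points.
folded = s.casefold()
print(len(s), repr(folded), len(folded))

# On the folded string, (f)(f) captures offsets (0, 1) and (1, 2), but `s`
# has only one code point: there is no substring of the original to report
# for either capture group.
m = re.match("(f)(f)", folded)
print(m.span(1), m.span(2))

# Python's re uses only simple (one-to-one) case matching, so the
# case-insensitive pattern does not match the ligature at all.
print(re.search("(f)(f)", s, re.IGNORECASE))
```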
Helmut Wollmersdorfer