John M. Dlugosz wrote:
I was going over S02, and found it opens with: "By default Perl presents
Unicode in NFG formation, where each grapheme counts as one character."
I looked up NFG, and found it to be an invention of this group, but
didn't find any details when I tried to chase down the
Darren Duncan wrote:
Since you seem eager, I recommend you start with porting the Parrot PDD
28 to a new Perl 6 Synopsis 15, and continue from there.
IMHO we need some people for a broad discussion on the details first.
Helmut Wollmersdorfer
Do we really need to be able to map arbitrary graphemes to integers,
or is it enough to have an opaque value returned by ord() that, when
fed to chr(), returns the same grapheme? If the latter, a list of
code points (in one of the official Normalization Formats) would seem
to be sufficient.
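A minimal Python sketch of the "opaque value" idea, assuming NFC as the chosen Normalization Form. The names grapheme_ord and grapheme_chr are hypothetical and are not part of any Perl 6 or Parrot API. "The same grapheme" here means canonically equivalent text, not necessarily the identical code-point sequence that was passed in.

```python
import unicodedata


def grapheme_ord(grapheme: str) -> tuple[int, ...]:
    """Hypothetical opaque 'ord': the grapheme's code points after NFC normalization."""
    return tuple(ord(c) for c in unicodedata.normalize("NFC", grapheme))


def grapheme_chr(value: tuple[int, ...]) -> str:
    """Hypothetical inverse: rebuild the (NFC-normalized) grapheme from the tuple."""
    return "".join(chr(cp) for cp in value)


# a + dot above + dot below, typed in two canonically equivalent orders
g1 = "a\u0307\u0323"
g2 = "a\u0323\u0307"
assert grapheme_ord(g1) == grapheme_ord(g2)
assert grapheme_chr(grapheme_ord(g1)) == unicodedata.normalize("NFC", g1)
```

With this design the value is a variable-length tuple rather than a single integer. That is precisely the trade-off under discussion.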
If you haven't read the PDD, it's a good start.
To summarize, probably oversimplifying badly:
1. A grapheme is a character *as seen on the page.* That is, if
composing a + dot above + dot below produces an a with dots above
and below it, then THAT is the grapheme.
2. Unicode has a lot of
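To make point 1 concrete, here is a deliberately simplified Python sketch. It groups each non-combining character with the combining marks that follow it. This is an assumption-laden stand-in for the real extended grapheme cluster rules in Unicode UAX #29. It ignores Hangul syllables, ZWJ sequences, CR LF, and spacing marks whose combining class is 0. The function name simple_graphemes is hypothetical.

```python
import unicodedata


def simple_graphemes(text: str) -> list[str]:
    """Split text into rough graphemes: a base character plus any following
    combining marks (canonical combining class != 0). Simplified; not UAX #29."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding base character
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


text = "a\u0307\u0323b"  # a + dot above + dot below, then b
print(len(text), len(simple_graphemes(text)))  # 4 code points, 2 graphemes
```

Under NFG's "each grapheme counts as one character" view, the length of this string would be 2 rather than 4.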
On Mon, May 18, 2009 at 9:11 AM, Austin Hastings
austin_hasti...@yahoo.com wrote:
If you haven't read the PDD, it's a good start.
<snip useful summary>
I get all that, really. I still question the necessity of mapping
each grapheme to a single integer. A single *value*, sure.
On Mon, May 18, 2009 at 07:01:27AM +0200, pugs-comm...@feather.perl6.nl wrote:
: Author: jdlugosz
: Date: 2009-05-18 07:01:27 +0200 (Mon, 18 May 2009)
: New Revision: 26868
:
: Modified:
:    docs/Perl6/Spec/S03-operators.pod
: Log:
: Fix one typo, s/know/known/. Really just low-hanging fruit to
On Sun, May 17, 2009 at 09:35:50PM +0200, Moritz Lenz wrote:
: Hi,
:
: t/oo/value_types.t mentions the is value trait, which doesn't appear
: in the spec anywhere. According to the discussion in [1] there was
: speculation about 'is cow' and 'is value', but the former didn't seem to
: enter the
On May 18, 2009, at 09:21 , Mark J. Reed wrote:
If you're doing arithmetic with the code points or scalar values of
characters, then the specific numbers would seem to matter. I'm
I would argue that if you are working with a grapheme cluster
(grapheme), arithmetic on individual grapheme
On Mon, May 18, 2009 at 12:37:49PM -0400, Brandon S. Allbery KF8NH wrote:
I would argue that if you are working with a grapheme cluster
(grapheme), arithmetic on individual grapheme values is undefined.
Yup, that was exactly what I was arguing.
In short, I think the only remotely sane result
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
[1] Open questions:
1) Will graphemes have a unique charname?
e.g. GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE
Yes, presumably that comes with the normalization part of NFG.
We're not aiming for
On Mon, May 18, 2009 at 02:16:17PM -0400, Mark J. Reed wrote:
: Surrogates are just weird, since they have assigned code points even
: though they're purely an encoding mechanism. As such, they straddle
: the line between abstract characters and an encoding form. I assume
: that if text comes in
On Sun, May 17, 2009 at 07:41:45PM +0200, Moritz Lenz wrote:
: Hi,
:
: (sorry for yet another p6l email mentioning junctions; if they annoy you
: just ignore this mail :-)
:
: while reviewing some tests I found the each() comprehension in S02
: that evaded my attention so far.
:
: Do we really
On May 18, 2009, at 14:16 , Larry Wall wrote:
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
3) Details of 'life-time', round-trip.
Which is a very interesting topic, with connections to type theory,
scope/domain management, and security issues (such as the possibility
Larry Wall wrote:
Which is a very interesting topic, with connections to type theory,
scope/domain management, and security issues (such as the possibility
of a DoS attack on the translation tables).
I think that a DoS attack on Unicode would be called IBM/Windows Code
Pages. The rest of
Author: moritz
Date: 2009-05-18 23:08:54 +0200 (Mon, 18 May 2009)
New Revision: 26876
Modified:
docs/Perl6/Spec/S02-bits.pod
docs/Perl6/Spec/S09-data.pod
Log:
[S02] get rid of the each() comprehension
[S09] document speculative each() junction with grep semantics
Larry Wall larry-at-wall.org |Perl 6| wrote:
Sure, but this is a weak argument, since you can already write complete
ord/chr nonsense at the codepoint level (even in ASCII), and all we're
doing here is making graphemes work more like codepoints in terms of
storage and indexing. If people abuse
Larry Wall larry-at-wall.org |Perl 6| wrote:
into *uint16 as long as they don't synthesize codepoints. And we can
always resort to *uint32 and *int32 knowing that the Unicode consortium
isn't going to use the top bit any time in the foreseeable future.
(Unless, of course, they endorse something
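The storage idea can be illustrated with a hedged Python sketch of an NFG-style table. It assumes, in the spirit of Parrot PDD 28, that single-code-point graphemes keep their code point and that multi-code-point graphemes get synthetic negative IDs. Real code points lie in 0..0x10FFFF, so negative values in a signed 32-bit integer cannot collide with them. The class name GraphemeTable and the max_entries cap are hypothetical. The cap is only one possible answer to the translation-table DoS concern raised in this thread.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical NFG-style table mapping graphemes to single integers."""

    def __init__(self, max_entries: int = 1 << 20) -> None:
        self._by_cluster: dict[str, int] = {}
        self._by_id: dict[int, str] = {}
        self._max_entries = max_entries  # assumed cap against table flooding

    def encode(self, grapheme: str) -> int:
        nfc = unicodedata.normalize("NFC", grapheme)
        if len(nfc) == 1:
            return ord(nfc)  # ordinary code point, non-negative
        if nfc not in self._by_cluster:
            if len(self._by_cluster) >= self._max_entries:
                raise OverflowError("synthetic grapheme table is full")
            new_id = -(len(self._by_cluster) + 1)  # -1, -2, ... fits in int32
            self._by_cluster[nfc] = new_id
            self._by_id[new_id] = nfc
        return self._by_cluster[nfc]

    def decode(self, value: int) -> str:
        # Negative IDs are only meaningful for the table that issued them.
        return chr(value) if value >= 0 else self._by_id[value]


table = GraphemeTable()
gid = table.encode("a\u0307\u0323")
print(gid, table.encode("e\u0301"))  # a synthetic negative ID, then 233 for U+00E9
```

This also shows why arithmetic on such values is meaningless. A synthetic ID is just a key into one particular table, so its lifetime and round-trip behavior depend on how long that table lives.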
On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote:
No, a few million code points in the Unicode standard can produce an
arbitrary number of unique grapheme clusters, since you can apply as
many modifiers as you like to each different base character. If you
allow multiples,