Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Helmut Wollmersdorfer
John M. Dlugosz wrote: I was going over S02, and found it opens with, By default Perl presents Unicode in NFG formation, where each grapheme counts as one character. I looked up NFG, and found it to be an invention of this group, but didn't find any details when I tried to chase down the

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Helmut Wollmersdorfer
Darren Duncan wrote: Since you seem eager, I recommend you start with porting the Parrot PDD 28 to a new Perl 6 Synopsis 15, and continue from there. IMHO we need some people for a broad discussion on the details first. Helmut Wollmersdorfer

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Mark J. Reed
Do we really need to be able to map arbitrary graphemes to integers, or is it enough to have an opaque value returned by ord() that, when fed to chr(), returns the same grapheme? If the latter, a list of code points (in one of the official Normalization Formats) would seem to be sufficient. On
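
The question above (one integer per grapheme versus an opaque round-trippable value) can be illustrated with a small sketch. It is not taken from the thread or from PDD 28: the class name GraphemeTable, the methods ord_of and chr_of, and the choice of negative integers for multi-code-point graphemes are all assumptions made for illustration.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical intern table: one integer per grapheme, round-trippable."""

    def __init__(self):
        self._ids = {}   # NFC code point tuple -> synthetic (negative) id
        self._seqs = []  # position i holds the tuple for id -(i + 1)

    def ord_of(self, grapheme):
        # Normalize first so canonically equivalent spellings share one id.
        cps = tuple(ord(c) for c in unicodedata.normalize("NFC", grapheme))
        if len(cps) == 1:
            # A single code point is its own value; nothing to intern.
            return cps[0]
        if cps not in self._ids:
            # Assumption: synthetic ids are handed out on demand, as negatives.
            self._seqs.append(cps)
            self._ids[cps] = -len(self._seqs)
        return self._ids[cps]

    def chr_of(self, value):
        if value >= 0:
            return chr(value)
        # Negative values index back into the table of interned sequences.
        return "".join(chr(cp) for cp in self._seqs[-value - 1])


table = GraphemeTable()
dotted_a = "a\u0307\u0323"  # a + dot above + dot below
value = table.ord_of(dotted_a)
print(value, [hex(ord(c)) for c in table.chr_of(value)])
```

Either reading of the question is served by the same table: the integer is only meaningful relative to the table that issued it, which is what makes it opaque, while chr_of still returns the normalized list of code points.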

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Austin Hastings
If you haven't read the PDD, it's a good start. To summarize, probably oversimplifying badly: 1. A grapheme is a character *as seen on the page.* That is, if composing a + dot above + dot below produces an a with dots above and below it, then THAT is the grapheme. 2. Unicode has a lot of
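
Point 1 of the summary above can be checked with a short sketch. It approximates a grapheme as a base character followed by any combining marks, using Python's standard unicodedata module; that is a simplification of the Unicode grapheme cluster rules, and the helper name graphemes is hypothetical.

```python
import unicodedata


def graphemes(text):
    """Split text into approximate graphemes: a base plus trailing combining marks."""
    clusters = []
    for ch in unicodedata.normalize("NFC", text):
        # A nonzero canonical combining class marks a combining character,
        # which attaches to the cluster already in progress.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


dotted_a = "a\u0307\u0323"  # a + dot above + dot below
nfc = unicodedata.normalize("NFC", dotted_a)
print(len(nfc), len(graphemes(dotted_a)))
```

Even after NFC normalization the sequence remains more than one code point long, while the grapheme count is one; that gap is what a grapheme-level string representation is meant to hide.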

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Mark J. Reed
On Mon, May 18, 2009 at 9:11 AM, Austin Hastings austin_hasti...@yahoo.com wrote: If you haven't read the PDD, it's a good start. snip useful summary I get all that, really. I still question the necessity of mapping each grapheme to a single integer. A single *value*, sure.

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Austin Hastings
Mark J. Reed wrote: On Mon, May 18, 2009 at 9:11 AM, Austin Hastings austin_hasti...@yahoo.com wrote: If you haven't read the PDD, it's a good start. snip useful summary I get all that, really. I still question the necessity of mapping each grapheme to a single integer. A single

Re: r26868 - docs/Perl6/Spec

2009-05-18 Thread Larry Wall
On Mon, May 18, 2009 at 07:01:27AM +0200, pugs-comm...@feather.perl6.nl wrote: : Author: jdlugosz : Date: 2009-05-18 07:01:27 +0200 (Mon, 18 May 2009) : New Revision: 26868 : : Modified: : docs/Perl6/Spec/S03-operators.pod : Log: : Fix one typo, s/know/known/. Really just low-hanging fruit to

Re: is value trait

2009-05-18 Thread Larry Wall
On Sun, May 17, 2009 at 09:35:50PM +0200, Moritz Lenz wrote: : Hi, : : t/oo/value_types.t mentions the is value trait, which doesn't appear : in the spec anywhere. According to the discussion in [1] there was : speculation about 'is cow' and 'is value', but the former didn't seem to : enter the

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Brandon S. Allbery KF8NH
On May 18, 2009, at 09:21 , Mark J. Reed wrote: If you're doing arithmetic with the code points or scalar values of characters, then the specific numbers would seem to matter. I'm I would argue that if you are working with a grapheme cluster (grapheme), arithmetic on individual grapheme

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Larry Wall
On Mon, May 18, 2009 at 12:37:49PM -0400, Brandon S. Allbery KF8NH wrote: On May 18, 2009, at 09:21 , Mark J. Reed wrote: If you're doing arithmetic with the code points or scalar values of characters, then the specific numbers would seem to matter. I'm I would argue that if you are working

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Mark J. Reed
On Mon, May 18, 2009 at 12:37:49PM -0400, Brandon S. Allbery KF8NH wrote: I would argue that if you are working with a grapheme cluster (grapheme), arithmetic on individual grapheme values is undefined. Yup, that was exactly what I was arguing. In short, I think the only remotely sane result

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Larry Wall
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote: [1] Open questions: 1) Will graphemes have a unique charname? e.g. GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE Yes, presumably that comes with the normalization part of NFG. We're not aiming for

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Larry Wall
On Mon, May 18, 2009 at 02:16:17PM -0400, Mark J. Reed wrote: : Surrogates are just weird, since they have assigned code points even : though they're purely an encoding mechanism. As such, they straddle : the line between abstract characters and an encoding form. I assume : that if text comes in

Re: each() comprehension

2009-05-18 Thread Larry Wall
On Sun, May 17, 2009 at 07:41:45PM +0200, Moritz Lenz wrote: : Hi, : : (sorry for yet another p6l email mentioning junctions; if they annoy you : just ignore this mail :-) : : while reviewing some tests I found the each() comprehension in S02 : that evaded my attention so far. : : Do we really

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Brandon S. Allbery KF8NH
On May 18, 2009, at 14:16 , Larry Wall wrote: On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote: 3) Details of 'life-time', round-trip. Which is a very interesting topic, with connections to type theory, scope/domain management, and security issues (such as the possibility

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Austin Hastings
Brandon S. Allbery KF8NH wrote: On May 18, 2009, at 14:16 , Larry Wall wrote: On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote: 3) Details of 'life-time', round-trip. Which is a very interesting topic, with connections to type theory, scope/domain management, and

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Austin Hastings
Larry Wall wrote: Which is a very interesting topic, with connections to type theory, scope/domain management, and security issues (such as the possibility of a DoS attack on the translation tables). I think that a DoS attack on Unicode would be called IBM/Windows Code Pages. The rest of

r26876 - docs/Perl6/Spec

2009-05-18 Thread pugs-commits
Author: moritz Date: 2009-05-18 23:08:54 +0200 (Mon, 18 May 2009) New Revision: 26876 Modified: docs/Perl6/Spec/S02-bits.pod docs/Perl6/Spec/S09-data.pod Log: [S02] get rid of the each() comprehension [S09] document speculative each() junction with grep semantics Modified:

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread John M. Dlugosz
Mark J. Reed markjreed-at-gmail.com |Perl 6| wrote: On Mon, May 18, 2009 at 9:11 AM, Austin Hastings austin_hasti...@yahoo.com wrote: If you haven't read the PDD, it's a good start. snip useful summary I get all that, really. I still question the necessity of mapping each grapheme

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread John M. Dlugosz
Larry Wall larry-at-wall.org |Perl 6| wrote: Sure, but this is a weak argument, since you can already write complete ord/chr nonsense at the codepoint level (even in ASCII), and all we're doing here is making graphemes work more like codepoints in terms of storage and indexing. If people abuse

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread John M. Dlugosz
Larry Wall larry-at-wall.org |Perl 6| wrote: into *uint16 as long as they don't synthesize codepoints. And we can always resort to *uint32 and *int32 knowing that the Unicode consortium isn't going to use the top bit any time in the foreseeable future. (Unless, of course, they endorse something

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Larry Wall
On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote: No, a few million code points in the Unicode standard can produce an arbitrary number of unique grapheme clusters, since you can apply as many modifiers as you like to each different base character. If you allow multiples,

Re: Unicode in 'NFG' formation ?

2009-05-18 Thread Brandon S. Allbery KF8NH
On May 18, 2009, at 21:54 , Larry Wall wrote: On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote: No, a few million code points in the Unicode standard can produce an arbitrary number of unique grapheme clusters, since you can apply as many modifiers as you like to each different