
chromatic Tue, 01 Apr 2008 19:01:26 -0700

Author: chromatic
Date: Tue Apr  1 19:01:13 2008
New Revision: 26698

Modified:
   trunk/docs/pdds/draft/pdd28_character_sets.pod


Log:
[PDD] Typo fixes and minor formatting nits.

Modified: trunk/docs/pdds/draft/pdd28_character_sets.pod
==============================================================================
--- trunk/docs/pdds/draft/pdd28_character_sets.pod      (original)
+++ trunk/docs/pdds/draft/pdd28_character_sets.pod      Tue Apr  1 19:01:13 2008
@@ -29,7 +29,7 @@
 The Unicode Standard prefers the concepts of I<character repertoire> (a
 collection of characters) and I<character code> (a mapping which tells you what
 number represents which character in the repertoire). Character set is commonly
-used to mean the standard which defines both a repertoire and a code. 
+used to mean the standard which defines both a repertoire and a code.
 
 =head2 Codepoint
 
@@ -38,7 +38,7 @@
 
 =head2 Encoding
 
-An encoding determines how a codepoint is represented inside a computer. 
+An encoding determines how a codepoint is represented inside a computer.
 Simple encodings like ASCII define that the codepoints 0-127 simply
 live as their numeric equivalents inside an eight-bit byte. Other
 fixed-width encodings like UTF-16 use more bytes to encode more
@@ -65,9 +65,9 @@
 etc), including any modifiers (diacritics, etc).
 
 The Unicode Standard defines a I<grapheme cluster> (commonly simplified to just
-I<graheme>) as one or more characters forming a visible whole when displayed,
+I<grapheme>) as one or more characters forming a visible whole when displayed,
 in other words, a bundle of a character and all of its combining characters.
-Since graphemes are the highest-level abstract idea of a "character", they're
+Because graphemes are the highest-level abstract idea of a "character", they're
 useful for converting between character sets.
 
 =head2 Normalization Form
@@ -98,7 +98,7 @@
 
 =item *
 
-Parrot provides an interface for interacting with strings and converting 
+Parrot provides an interface for interacting with strings and converting
 between character sets and encodings.
 
 =item *
@@ -130,7 +130,7 @@
 string encodings inside Parrot. (Producers of Parrot strings can do whatever is
 most efficient for them.) To put it in simple terms: if you find yourself
 writing C<*s++> or any other C string idioms, you need to stop and think if
-that's what you really mean. Not everything is byte-based any more.
+that's what you really mean. Not everything is byte-based anymore.
 
 =head2 Grapheme Normalization Form
 
@@ -147,7 +147,7 @@
 
 String operations on this kind of variable-byte encoding can be complex and
 expensive. Operations like comparison and traversal require a series of
-computations and lookaheads, since any given grapheme may be a sequence of
+computations and lookaheads, because any given grapheme may be a sequence of
 combining characters. The Unicode Standard defines several "normalization
 forms" that help with this problem. Normalization Form C (NFC), for example,
 decomposes everything, then re-composes as much as possible. So if you see the
@@ -161,8 +161,8 @@
 means that even in the most normalized Unicode form, string manipulation code
 must always assume a variable-byte encoding, and use expensive lookaheads. The
 cost is incurred on every operation, though the particular string operated on
-might not contain combining characters. It's particularly noticable in parsing
-and regular expression matches, where backtracking operations may retraverse
+might not contain combining characters. It's particularly noticeable in parsing
+and regular expression matches, where backtracking operations may re-traverse
 the characters of a simple string hundreds of times.
 
 In order to reduce the cost of variable-byte operations and simplify some
@@ -243,22 +243,22 @@
                    push @grapheme_table, "\x{438}\x{30F}";
                    ~ $#grapheme_table;
                 });
-   push @string, $codepoint; 
+   push @string, $codepoint;
 
 =head2 String API
 
 Strings have the following structure:
 
   struct parrot_string_t {
-      UnionVal                cache;
-      Parrot_UInt             flags;
-      char                   *strstart;
-      UINTVAL                 bufused;
-      UINTVAL                 strlen;
-      const struct _encoding *encoding;
-      const struct _charset  *charset;
+      UnionVal                      cache;
+      Parrot_UInt                   flags;
+      UINTVAL                       bufused;
+      UINTVAL                       hashval;
+      UINTVAL                       strlen;
+      char                         *strstart;
+      const struct _encoding       *encoding;
+      const struct _charset        *charset;
       const struct _normalization  *normalization;
-      UINTVAL                 hashval;
   };
 
 Deprecation note: the enum C<parrot_string_representation_t> will be removed.
@@ -270,7 +270,7 @@
 
 Conversion will be done with a function called C<string_grapheme_copy>:
 
-    INTVAL string_grapheme_copy(STRING* src, STRING* dst)
+    INTVAL string_grapheme_copy(STRING *src, STRING *dst)
 
 Converting a string from one format to another involves creating a new empty
 string with the required attributes, and passing the source string and the new

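The PDD's warning about C<*s++> can be made concrete with UTF-8, a variable-byte encoding. In UTF-8 one grapheme may span several codepoints, and one codepoint may span several bytes. The sketch below is plain C and is not part of Parrot's string API. utf8_codepoint_count is a hypothetical helper that assumes well-formed UTF-8 and does no validation. It counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>

/* Count codepoints in a well-formed UTF-8 buffer of `len` bytes.
 * Continuation bytes look like 10xxxxxx; every other byte starts a
 * new codepoint, so counting non-continuation bytes counts codepoints.
 * Hypothetical helper: assumes valid UTF-8 and performs no validation. */
size_t utf8_codepoint_count(const unsigned char *s, size_t len)
{
    size_t count = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* not a continuation byte */
            count++;
    }
    return count;
}
```

Take the grapheme from the PDD's grapheme-table example, U+0438 followed by the combining U+030F. In UTF-8 this is the four bytes D0 B8 CC 8F, which hold two codepoints but only one grapheme. Byte-stepping with *s++ therefore gives neither the codepoint count nor the grapheme count.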
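The Encoding section of the PDD mentions UTF-16. Codepoints up to U+FFFF fit in a single 16-bit code unit. Codepoints above U+FFFF need a surrogate pair of two units, so UTF-16 is also variable-width in practice. The following is a minimal C sketch of encoding one codepoint. utf16_encode is a hypothetical name, and the sketch makes no claim about how Parrot's encoding layer is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode codepoint as UTF-16 code units.
 * Writes 1 or 2 units into `out` and returns how many were written.
 * Codepoints above U+FFFF are split into a high/low surrogate pair.
 * Hypothetical helper; rejects out-of-range values and lone surrogates. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {              /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));       /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));     /* low surrogate */
    return 2;
}
```

For example, U+0438 encodes as the single unit 0438, while U+1F600 encodes as the pair D83D DE00.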
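The grapheme-table pseudocode in the diff stores each multi-codepoint grapheme in @grapheme_table. It then puts the bitwise complement of that grapheme's index into the string in place of a codepoint. A rough C sketch of the interning step follows and assumes a fixed-size table. grapheme_table, grapheme_intern, MAX_GRAPHEMES and MAX_CLUSTER are hypothetical names for illustration only, not Parrot code. Because the sketch uses a signed 32-bit value, ~index is always negative (slot 0 becomes -1, slot 1 becomes -2). A grapheme entry therefore cannot collide with a real codepoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GRAPHEMES 64   /* hypothetical fixed table capacity */
#define MAX_CLUSTER    8   /* hypothetical max codepoints per grapheme */

typedef struct {
    uint32_t cps[MAX_CLUSTER];  /* codepoints making up the cluster */
    size_t   len;               /* number of codepoints used */
} grapheme_entry;

typedef struct {
    grapheme_entry entries[MAX_GRAPHEMES];
    size_t         count;       /* number of slots in use */
} grapheme_table;

/* Return one negative "codepoint" standing for the cluster cps[0..n).
 * A cluster already in the table reuses its slot; a new one is appended.
 * The caller recovers the slot with ~value. */
int32_t grapheme_intern(grapheme_table *t, const uint32_t *cps, size_t n)
{
    assert(n > 0 && n <= MAX_CLUSTER);

    for (size_t i = 0; i < t->count; i++) {      /* linear lookup */
        if (t->entries[i].len == n &&
            memcmp(t->entries[i].cps, cps, n * sizeof *cps) == 0)
            return ~(int32_t)i;
    }

    assert(t->count < MAX_GRAPHEMES);
    size_t slot = t->count++;
    memcpy(t->entries[slot].cps, cps, n * sizeof *cps);
    t->entries[slot].len = n;
    return ~(int32_t)slot;
}
```

The linear scan keeps the sketch short. A table that is consulted on every string operation would more plausibly use a hash lookup keyed on the codepoint sequence.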
[svn:parrot-pdd] r26698 - trunk/docs/pdds/draft
