[perl.git] branch blead, updated. v5.21.9-190-g54bdcd8

Karl Williamson Mon, 09 Mar 2015 10:52:19 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/54bdcd8ec4c7b2111381943f4fdd4a07d3fe1bf9?hp=8f226aeeda55a51eee04feb4b605d30997d9b592>


- Log -----------------------------------------------------------------
commit 54bdcd8ec4c7b2111381943f4fdd4a07d3fe1bf9
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 9 11:45:13 2015 -0600

    perlrebackslash: Add, correct \b{} text
    
    This fleshes out documentation about this new feature

M       pod/perlrebackslash.pod

commit febd1aee81db64f8e0eaa947896dada407bb7142
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 9 11:43:35 2015 -0600

    perlrebackslash: Nit

M       pod/perlrebackslash.pod

commit b3a7a0153e8fcae963c59b681613d503f5f4a266
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 9 11:41:31 2015 -0600

    perluniprops: Add text about using with older Unicode releases
    
    This pod is generated by mktables.

M       lib/unicore/mktables
-----------------------------------------------------------------------

Summary of changes:
 lib/unicore/mktables    |  4 ++++
 pod/perlrebackslash.pod | 48 ++++++++++++++++++++++++++++++++++--------------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index e796649..722bc06 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -16185,6 +16185,10 @@ controlling lists contained in the program
 C<\$Config{privlib}>/F<unicore/mktables> and then re-compiling and installing.
 (C<\%Config> is available from the Config module).
 
+Also, perl can be recompiled to operate on an earlier version of the Unicode
+standard.  Further information is at
+C<\$Config{privlib}>/F<unicore/README.perl>.
+
 =head1 Other information in the Unicode data base
 
 The Unicode data base is delivered in two different formats.  The XML version
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index 425299d..b99d803 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -298,7 +298,7 @@ beginning with a "0".
 =head3 Hexadecimal escapes
 
 Like octal escapes, there are two forms of hexadecimal escapes, but both start
-with the same thing, C<\x>.  This is followed by either exactly two hexadecimal
+with the sequence C<\x>.  This is followed by either exactly two hexadecimal
 digits forming a number, or a hexadecimal number of arbitrary length surrounded
 by curly braces. The hexadecimal number is the code point of the character you
 want to express.
@@ -558,8 +558,10 @@ non-word characters nor for string ends.  It may help to understand how
     \b really means    (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))
     \B really means    (?:(?<=\w)(?=\w)|(?<!\w)(?!\w))
 
-In contrast, C<\b{...}> may or may not match at the beginning and end of
-the line depending on the boundary type (and C<\B{...}> never does).
+In contrast, C<\b{...}> and C<\B{...}> may or may not match at the
+beginning and end of the line, depending on the boundary type.  These
+implement the Unicode default boundaries, specified in
+L<http://www.unicode.org/reports/tr29/>.
 The boundary types currently available are:
 
 =over
@@ -579,25 +581,41 @@ natural language sentences.  It gives good, but imperfect results.  For
 example, it thinks that "Mr. Smith" is two sentences.  More details are
 at L<http://www.unicode.org/reports/tr29/>.  Note also that it thinks
 that anything matching L</\R> (except form feed and vertical tab) is a
-sentence boundary.  This works with word-processor text which line wraps
+sentence boundary.  C<\b{sb}> works with text designed for
+word-processors which wrap lines
 automatically for display, but hard-coded line boundaries are considered
 to be essentially the ends of text blocks (paragraphs really), and hence
-the ends of sententces.  It doesn't well with text containing embedded
-newlines, like the source text of the document you are reading.  Such
-text needs to be preprocessed to get rid of the line separators before
-looking for sentence boundaries.  Some people view this as a bug in the
-Unicode standard.
+the ends of sententces.  C<\b{sb}> doesn't do well with text containing
+embedded newlines, like the source text of the document you are reading.
+Such text needs to be preprocessed to get rid of the line separators
+before looking for sentence boundaries.  Some people view this as a bug
+in the Unicode standard.
 
 =item C<\b{wb}>
 
 This matches a Unicode "Word Boundary".  This gives better (though not
 perfect) results for natural language processing than plain C<\b>
 (without braces) does.  For example, it understands that apostrophes can
-be in the middle of words and that parentheses aren't.   More details
-are at L<http://www.unicode.org/reports/tr29/>.
+be in the middle of words and that parentheses aren't (see the examples
+below).   More details are at L<http://www.unicode.org/reports/tr29/>.
 
 =back
 
+It is important to realize that these are default boundary definitions,
+and that implementations may wish to tailor the results for particular
+purposes and locales.  Also note that Perl gives you the definitions
+valid for the version of the Unicode Standard compiled into Perl.  These
+rules are not considered stable and have been somewhat more subject to
+change than the rest of the Standard, and hence changing to a later Perl
+version may give you a different Unicode version whose changes may not
+be compatibile with what you coded for.  If, necessary, you can
+recompile Perl with an earlier version of the Unicode standard.  More
+information about that is in L<perluniprops/Unicode character properties
+that are NOT accepted by Perl>
+
+Unicode defines a fourth boundary type, accessible through the
+L<Unicode::LineBreak> module.
+
 Mnemonic: I<b>oundary.
 
 =back
@@ -621,10 +639,12 @@ Mnemonic: I<b>oundary.
       print $1;           # Prints 'cat'
   }
 
-  print join "|", "He said, \"Do you care? (I don't).\""
-                                               =~ m/ ( .+? \b{wb} ) /xg;
+  my $s = "He said, \"Is pi 3.14? (I'm not sure).\"";
+  print join("|", $s =~ m/ ( .+? \b     ) /xg), "\n";
+  print join("|", $s =~ m/ ( .+? \b{wb} ) /xg), "\n";
  prints
-  He| |said|,| |"|Do| |you| |care|?| |(|I| |don't|)|.|"
+  He| |said|, "|Is| |pi| |3|.|14|? (|I|'|m| |not| |sure
+  He| |said|,| |"|Is| |pi| |3.14|?| |(|I'm| |not| |sure|)|.|"
 
 =head2 Misc
 

--
Perl5 Master Repository
