Change 33192: Integrate:

Nicholas Clark Sat, 02 Feb 2008 09:00:26 -0800

Change 33192 by [EMAIL PROTECTED] on 2008/02/02 16:53:33

        Integrate:
        [ 32461]
        Subject: [patch] :utf8 updates
        From: Juerd Waalboer <[EMAIL PROTECTED]>
        Date: Sat, 17 Nov 2007 20:03:00 +0100
        Message-ID: <[EMAIL PROTECTED]>
        
        [ 32462]
        Bump $open::VERSION
        
        [ 32493]
        Subject: [PATCH] pod/perlrebackslash.pod: small Unicode additions
        From: Jarkko Hietaniemi <[EMAIL PROTECTED]>
        Date: Mon, 26 Nov 2007 04:55:03 +0200 (EET)
        Message-Id: <[EMAIL PROTECTED]>
        
        [ 32575]
        Document some environment variables that might affect tests,
        by Robin Barker.
        
        [ 32581]
        Note that Larry has clarified the reasons for the Perl 6 design on
        state assignments.
        
        [ 32584]
        Subject: [PATCH] perltodo.pod: add -D_FORTIFY_SOURCE and -fstack-protector
        From: Jarkko Hietaniemi <[EMAIL PROTECTED]>
        Date: Thu, 6 Dec 2007 05:07:26 +0200 (EET)
        Message-Id: <[EMAIL PROTECTED]>
        
        [ 32587]
        Documentation patch on filetests, the filetest pragma, and the
        special _ filehandle, largely based on:
        
        Subject: Re: [perl #46463] filetests sometimes do not set _
        From: Mark Overmeer <[EMAIL PROTECTED]>
        Date: Fri, 30 Nov 2007 11:38:20 +0100
        Message-ID: <[EMAIL PROTECTED]>
        
        [ 32588]
        Revert change 32171 per Jarkko's request
        
        [ 32591]
        Fix a typo found by Matt Kraai, and a reference to Herakles.
        
        [ 32592]
        Subject: Change /etc/passed to /etc/passwd in filetest.pm
        From: Matt Kraai <[EMAIL PROTECTED]>
        Date: Fri, 7 Dec 2007 01:09:22 -0800
        Message-ID: <[EMAIL PROTECTED]>
        
        [ 32593]
        Fix some typos, some found by Matt Kraai
        
        [ 32606]
        Subject: Re: Remove extra space from perltodo.pod
        From: Matt Kraai <[EMAIL PROTECTED]>
        Date: Fri, 7 Dec 2007 07:28:06 -0800
        Message-ID: <[EMAIL PROTECTED]>
        
        [ 32609]
        Some things a pumpking should not forget when releasing a new perl
        
        [ 32612]
        Subject: [perl #48214] documentation enhancement to perlthrtut 
        From: [EMAIL PROTECTED] (via RT) <[EMAIL PROTECTED]>
        Date: Wed, 05 Dec 2007 09:45:40 -0800
        Message-ID: <[EMAIL PROTECTED]>
        
        [ 32614]
        FAQ sync
        
        [ 32617]
        Shuffle sections (no text changes)
        
        [ 32618]
        Mention NO_MATHOMS in INSTALL
        
        [ 32622]
        Subject: [PATCH] 2 year old email tweak
        From: Richard Foley <[EMAIL PROTECTED]>
        Date: Sun, 16 Dec 2007 12:31:04 +0100
        Message-Id: <[EMAIL PROTECTED]>
        
        [ 32624]
        PerlFAQ sync
        
        [ 32626]
        Subject: pod-patch:  reword O.pm
        From: jimc <[EMAIL PROTECTED]>
        Date: Thu, 13 Dec 2007 15:55:07 -0700
        Message-ID: <[EMAIL PROTECTED]>
        
        [ 32627]
        Update AUTHORS
        
        [ 32636]
        Notes on 5.12 Unicode revamping planned.
        Complete the "reporting bug" section of perldelta.
        
        [ 32638]
        5.10.0 is planned for today.
        
        [ 32640]
        Two more people in AUTHORS


Affected files ...

... //depot/maint-5.8/perl/AUTHORS#49 integrate
... //depot/maint-5.8/perl/INSTALL#50 integrate
... //depot/maint-5.8/perl/Porting/pumpkin.pod#17 integrate
... //depot/maint-5.8/perl/README#13 integrate
... //depot/maint-5.8/perl/ext/B/O.pm#2 integrate
... //depot/maint-5.8/perl/lib/PerlIO.pm#15 integrate
... //depot/maint-5.8/perl/lib/filetest.pm#6 integrate
... //depot/maint-5.8/perl/lib/open.pm#12 integrate
... //depot/maint-5.8/perl/pod/perlcheat.pod#4 integrate
... //depot/maint-5.8/perl/pod/perlfaq1.pod#17 integrate
... //depot/maint-5.8/perl/pod/perlfaq4.pod#29 integrate
... //depot/maint-5.8/perl/pod/perlfunc.pod#108 integrate
... //depot/maint-5.8/perl/pod/perlhack.pod#29 integrate
... //depot/maint-5.8/perl/pod/perlhist.pod#40 integrate
... //depot/maint-5.8/perl/pod/perlopentut.pod#10 integrate
... //depot/maint-5.8/perl/pod/perlrebackslash.pod#3 integrate
... //depot/maint-5.8/perl/pod/perlrun.pod#66 integrate
... //depot/maint-5.8/perl/pod/perlthrtut.pod#11 integrate
... //depot/maint-5.8/perl/pod/perltodo.pod#39 integrate
... //depot/maint-5.8/perl/pod/perlunicode.pod#31 integrate
... //depot/maint-5.8/perl/pod/perlunifaq.pod#2 integrate
... //depot/maint-5.8/perl/pod/perluniintro.pod#19 integrate
... //depot/maint-5.8/perl/pod/perlunitut.pod#3 integrate

Differences ...

==== //depot/maint-5.8/perl/AUTHORS#49 (text) ====
Index: perl/AUTHORS
--- perl/AUTHORS#48~32381~      2007-11-17 12:50:15.000000000 -0800
+++ perl/AUTHORS        2008-02-02 08:53:33.000000000 -0800
@@ -87,6 +87,7 @@
 Bas van Sisseren               <[EMAIL PROTECTED]>
 Beau Cox
 Ben Carter                      <[EMAIL PROTECTED]>
+Ben Morrow                     <[EMAIL PROTECTED]>
 Ben Tilly                      <[EMAIL PROTECTED]>
 Benjamin Goldberg              <[EMAIL PROTECTED]>
 Benjamin Holzman               <[EMAIL PROTECTED]>
@@ -477,7 +478,7 @@
 Joost van Baal                 <[EMAIL PROTECTED]>
 JT McDuffie                    <[EMAIL PROTECTED]>
 Juan Gallego                   <[EMAIL PROTECTED]>
-Juerd Waalboer                 <[EMAIL PROTECTED]>
+Juerd Waalboer                 <[EMAIL PROTECTED]>
 Juha Laiho                     <[EMAIL PROTECTED]>
 Julian Yip                     <[EMAIL PROTECTED]>
 juna                            <[EMAIL PROTECTED]>
@@ -564,6 +565,7 @@
 Mark Leighton Fisher           <[EMAIL PROTECTED]>
 Mark Mielke                    <[EMAIL PROTECTED]>
 Mark Murray                    <[EMAIL PROTECTED]>
+Mark Overmeer                  <[EMAIL PROTECTED]>
 Mark P. Lutz                   <[EMAIL PROTECTED]>
 Mark Pease                     <[EMAIL PROTECTED]>
 Mark Pizzolato                 <[EMAIL PROTECTED]>
@@ -589,6 +591,7 @@
 Mathieu Arnold                 <[EMAIL PROTECTED]>
 Mats Peterson                  <[EMAIL PROTECTED]>
 Matt Kimball
+Matt Kraai                     <[EMAIL PROTECTED]>
 Matt Sergeant                  <[EMAIL PROTECTED]>
 Matt Taggart                    <[EMAIL PROTECTED]>
 Matthew Black                  <[EMAIL PROTECTED]>
@@ -721,13 +724,14 @@
 Raymund Will                   <[EMAIL PROTECTED]>
 Redvers Davies                 <[EMAIL PROTECTED]>
 Reini Urban                    <[EMAIL PROTECTED]>
+Renee Baecker                  <[EMAIL PROTECTED]>
 Rex Dieter                     <[EMAIL PROTECTED]>
 Ricardo SIGNES                 <[EMAIL PROTECTED]>
 Rich Morin                     <[EMAIL PROTECTED]>
 Rich Salz                      <[EMAIL PROTECTED]>
 Richard A. Wells               <[EMAIL PROTECTED]>
 Richard Clamp                  <[EMAIL PROTECTED]>
-Richard Foley                  <[EMAIL PROTECTED]>
+Richard Foley                  <[EMAIL PROTECTED]>
 Richard Hatch                  <[EMAIL PROTECTED]>
 Richard Hitt                   <[EMAIL PROTECTED]>
 Richard Kandarian              <[EMAIL PROTECTED]>
@@ -781,6 +785,7 @@
 Sébastien Aperghis-Tramoni     <[EMAIL PROTECTED]>
 Sebastien Barre                        <[EMAIL PROTECTED]>
 Sebastian Steinlechner          <[EMAIL PROTECTED]>
+Shawn                          <[EMAIL PROTECTED]>
 Sherm Pendley                  <[EMAIL PROTECTED]>
 Shigeya Suzuki                 <[EMAIL PROTECTED]>
 Shimpei Yamashita              <[EMAIL PROTECTED]>
@@ -840,6 +845,7 @@
 Thomas Dorner                  <[EMAIL PROTECTED]>
 Thomas Kofler
 Thomas König
+Thomas Pfau                     <[EMAIL PROTECTED]>
 Thomas Wegner                  <[EMAIL PROTECTED]>
 Thorsten Glaser
 Tim Adye                       <[EMAIL PROTECTED]>

==== //depot/maint-5.8/perl/INSTALL#50 (text) ====
Index: perl/INSTALL
--- perl/INSTALL#49~32568~      2007-12-04 05:46:34.000000000 -0800
+++ perl/INSTALL        2008-02-02 08:53:33.000000000 -0800
@@ -2079,6 +2079,83 @@
 See also L<"Maintaining completely separate versions"> for another
 approach.
 
+=head1 cd /usr/include; h2ph *.h sys/*.h
+
+Some perl scripts need to be able to obtain information from the
+system header files.  This command will convert the most commonly used
+header files in /usr/include into files that can be easily interpreted
+by perl.  These files will be placed in the architecture-dependent
+library ($archlib) directory you specified to Configure.
+
+Note:  Due to differences in the C and perl languages, the conversion
+of the header files is not perfect.  You will probably have to
+hand-edit some of the converted files to get them to parse correctly.
+For example, h2ph breaks spectacularly on type casting and certain
+structures.
+
+=head1 installhtml --help
+
+Some sites may wish to make perl documentation available in HTML
+format.  The installhtml utility can be used to convert pod
+documentation into linked HTML files and install them.
+
+Currently, the supplied ./installhtml script does not make use of the
+html Configure variables.  This should be fixed in a future release.
+
+The following command-line is an example of one used to convert
+perl documentation:
+
+  ./installhtml                   \
+      --podroot=.                 \
+      --podpath=lib:ext:pod:vms   \
+      --recurse                   \
+      --htmldir=/perl/nmanual     \
+      --htmlroot=/perl/nmanual    \
+      --splithead=pod/perlipc     \
+      --splititem=pod/perlfunc    \
+      --libpods=perlfunc:perlguts:perlvar:perlrun:perlop \
+      --verbose
+
+See the documentation in installhtml for more details.  It can take
+many minutes to execute a large installation and you should expect to
+see warnings like "no title", "unexpected directive" and "cannot
+resolve" as the files are processed. We are aware of these problems
+(and would welcome patches for them).
+
+You may find it helpful to run installhtml twice. That should reduce
+the number of "cannot resolve" warnings.
+
+=head1 cd pod && make tex && (process the latex files)
+
+Some sites may also wish to make the documentation in the pod/ directory
+available in TeX format.  Type
+
+       (cd pod && make tex && <process the latex files>)
+
+=head1 Starting all over again
+
+If you wish to re-build perl from the same build directory, you should
+clean it out with the command
+
+       make distclean
+
+or
+
+       make realclean
+
+The only difference between the two is that make distclean also removes
+your old config.sh and Policy.sh files.
+
+If you are upgrading from a previous version of perl, or if you
+change systems or compilers or make other significant changes, or if
+you are experiencing difficulties building perl, you should not re-use
+your old config.sh.
+
+If your reason to reuse your old config.sh is to save your particular
+installation choices, then you can probably achieve the same effect by
+using the Policy.sh file.  See the section on L<"Site-wide Policy
+settings"> above.
+
 =head1 Reporting Problems
 
 Wherever possible please use the perlbug tool supplied with this Perl
@@ -2232,83 +2309,6 @@
 incomplete) list of locally installed modules.  Note that you want
 perllocal.pod, not perllocale.pod, for installed module information.
 
-=head1 cd /usr/include; h2ph *.h sys/*.h
-
-Some perl scripts need to be able to obtain information from the
-system header files.  This command will convert the most commonly used
-header files in /usr/include into files that can be easily interpreted
-by perl.  These files will be placed in the architecture-dependent
-library ($archlib) directory you specified to Configure.
-
-Note:  Due to differences in the C and perl languages, the conversion
-of the header files is not perfect.  You will probably have to
-hand-edit some of the converted files to get them to parse correctly.
-For example, h2ph breaks spectacularly on type casting and certain
-structures.
-
-=head1 installhtml --help
-
-Some sites may wish to make perl documentation available in HTML
-format.  The installhtml utility can be used to convert pod
-documentation into linked HTML files and install them.
-
-Currently, the supplied ./installhtml script does not make use of the
-html Configure variables.  This should be fixed in a future release.
-
-The following command-line is an example of one used to convert
-perl documentation:
-
-  ./installhtml                   \
-      --podroot=.                 \
-      --podpath=lib:ext:pod:vms   \
-      --recurse                   \
-      --htmldir=/perl/nmanual     \
-      --htmlroot=/perl/nmanual    \
-      --splithead=pod/perlipc     \
-      --splititem=pod/perlfunc    \
-      --libpods=perlfunc:perlguts:perlvar:perlrun:perlop \
-      --verbose
-
-See the documentation in installhtml for more details.  It can take
-many minutes to execute a large installation and you should expect to
-see warnings like "no title", "unexpected directive" and "cannot
-resolve" as the files are processed. We are aware of these problems
-(and would welcome patches for them).
-
-You may find it helpful to run installhtml twice. That should reduce
-the number of "cannot resolve" warnings.
-
-=head1 cd pod && make tex && (process the latex files)
-
-Some sites may also wish to make the documentation in the pod/ directory
-available in TeX format.  Type
-
-       (cd pod && make tex && <process the latex files>)
-
-=head1 Starting all over again
-
-If you wish to re-build perl from the same build directory, you should
-clean it out with the command
-
-       make distclean
-
-or
-
-       make realclean
-
-The only difference between the two is that make distclean also removes
-your old config.sh and Policy.sh files.
-
-If you are upgrading from a previous version of perl, or if you
-change systems or compilers or make other significant changes, or if
-you are experiencing difficulties building perl, you should not re-use
-your old config.sh.
-
-If your reason to reuse your old config.sh is to save your particular
-installation choices, then you can probably achieve the same effect by
-using the Policy.sh file.  See the section on L<"Site-wide Policy
-settings"> above.
-
 =head1 Minimizing the Perl installation
 
 The following section is meant for people worrying about squeezing the
@@ -2450,6 +2450,13 @@
 (The 'strace' is Linux-specific, other similar utilities include 'truss'
 and 'ktrace'.)
 
+=head2 C<-DNO_MATHOMS>
+
+If you configure perl with C<-Accflags=-DNO_MATHOMS>, the functions from
+F<mathoms.c> will not be compiled in. Those functions are no longer used
+by perl itself; for source compatibility reasons, though, they weren't
+completely removed.
+
 =head1 DOCUMENTATION
 
 Read the manual entries before running perl.  The main documentation

==== //depot/maint-5.8/perl/Porting/pumpkin.pod#17 (text) ====
Index: perl/Porting/pumpkin.pod
--- perl/Porting/pumpkin.pod#16~31143~  2007-05-04 12:37:09.000000000 -0700
+++ perl/Porting/pumpkin.pod    2008-02-02 08:53:33.000000000 -0800
@@ -609,6 +609,15 @@
 If you update the subversion number in F<patchlevel.h>, you may need
 to change the version number near the top of the F<Changes> file.
 
+=head2 Bumping perl's version
+
+If you bump perl's version, you will need to update a few things:
+the L<perlhist> manpage for the date of release, the version number and
+perldelta reference in the top level F<README> (and maybe the copyright
+year too), the F<META.yml> file (generated via F<Porting/makemeta>, be
+sure to run it with the current bleadperl), and the meta-info about
+dual-lived modules in Module::Corelist (F<Porting/corelist.pl> does that).
+
 =head2 Todo
 
 The F<pod/perltodo.pod> file contains a roughly-categorized unordered

==== //depot/maint-5.8/perl/README#13 (text) ====
Index: perl/README
--- perl/README#12~32437~       2007-11-21 09:50:11.000000000 -0800
+++ perl/README 2008-02-02 08:53:33.000000000 -0800
@@ -21,6 +21,9 @@
 
 For an introduction to the language's features, see pod/perlintro.pod.
 
+For a discussion of the important changes in this release, see
+pod/perl588delta.pod.  (This will also be installed as perldelta.pod).
+
 There are also many Perl books available, covering a wide variety of topics,
 from various publishers.  See pod/perlbook.pod for more information.
 

==== //depot/maint-5.8/perl/ext/B/O.pm#2 (text) ====
Index: perl/ext/B/O.pm
--- perl/ext/B/O.pm#1~17645~    2002-07-19 12:29:57.000000000 -0700
+++ perl/ext/B/O.pm     2008-02-02 08:53:33.000000000 -0800
@@ -107,16 +107,15 @@
 
     use O ("Backend", OPTIONS);
 
-The C<import> function which that calls loads in the appropriate
-C<B::Backend> module and calls the C<compile> function in that
-package, passing it OPTIONS. That function is expected to return
-a sub reference which we'll call CALLBACK. Next, the "compile-only"
-flag is switched on (equivalent to the command-line option C<-c>)
-and a CHECK block is registered which calls CALLBACK. Thus the main
-Perl program mentioned on the command-line is read in, parsed and
-compiled into internal syntax tree form. Since the C<-c> flag is
-set, the program does not start running (excepting BEGIN blocks of
-course) but the CALLBACK function registered by the compiler
+The C<O::import> function loads the appropriate C<B::Backend> module
+and calls its C<compile> function, passing it OPTIONS. That function
+is expected to return a sub reference which we'll call CALLBACK. Next,
+the "compile-only" flag is switched on (equivalent to the command-line
+option C<-c>) and a CHECK block is registered which calls
+CALLBACK. Thus the main Perl program mentioned on the command-line is
+read in, parsed and compiled into internal syntax tree form. Since the
+C<-c> flag is set, the program does not start running (excepting BEGIN
+blocks of course) but the CALLBACK function registered by the compiler
 backend is called.
 
 In summary, a compiler backend module should be called "B::Foo"

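To make the CALLBACK protocol described above concrete, here is a minimal sketch of a compiler backend. It is not part of this change: the backend name B::Hello, its "greeting" option and the scratch directory are hypothetical, and the sketch assumes a Unix-like perl where the list form of a piped open() is available. It writes the backend into a temporary @INC directory, runs a one-line program under -MO=Hello,greeting, and captures what the backend printed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write the hypothetical backend B::Hello into a scratch @INC directory.
my $lib = tempdir(CLEANUP => 1);
mkdir(File::Spec->catdir($lib, 'B')) or die "mkdir: $!";
my $pm = File::Spec->catfile($lib, 'B', 'Hello.pm');
open(my $fh, '>', $pm) or die "write $pm: $!";
print {$fh} <<'END_BACKEND';
package B::Hello;
use strict;
use warnings;

# O calls compile() with the options given after the backend name
# (e.g. -MO=Hello,greeting) and expects a code reference back.
sub compile {
    my @options = @_;
    # This sub is the CALLBACK: O runs it from a CHECK block, after the
    # main program has been compiled but (because -c is set) not run.
    return sub { print "B::Hello callback ran with options: @options\n" };
}

1;
END_BACKEND
close($fh) or die "close $pm: $!";

# Run a tiny program through the backend and collect its STDOUT.
open(my $run, '-|', $^X, "-I$lib", '-MO=Hello,greeting',
     '-e', 'print "main program ran\n"')
    or die "cannot run $^X: $!";
my $output = do { local $/; <$run> };
close($run);
print $output;
```

Only the backend's line should appear on STDOUT; the "main program ran" statement is compiled but never executed.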
==== //depot/maint-5.8/perl/lib/PerlIO.pm#15 (text) ====
Index: perl/lib/PerlIO.pm
--- perl/lib/PerlIO.pm#14~30764~        2007-03-26 10:20:04.000000000 -0700
+++ perl/lib/PerlIO.pm  2008-02-02 08:53:33.000000000 -0800
@@ -139,6 +139,10 @@
        $in = <F>;
        close(F);
 
+Note that this layer does not validate byte sequences. For reading
+input, using C<:encoding(utf8)> instead of bare C<:utf8> is strongly
+recommended.
+
 =item :bytes
 
 This is the inverse of C<:utf8> layer. It turns off the flag

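The validation note above can be illustrated with the Encode module, on which the :encoding() layer is built. This is a sketch, not part of the patch, and the byte strings are made-up sample data. It reads UTF-8 bytes from an in-memory filehandle through :encoding(UTF-8) (the strict spelling of the encoding name), then uses Encode::decode with the FB_CROAK check to show that a malformed byte sequence is rejected instead of being passed through, which is the kind of checking a bare :utf8 layer does not do.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# "cafe" with an acute e, as UTF-8 bytes: c3 a9 encodes U+00E9.
my $bytes = "caf\xc3\xa9";

# Reading through :encoding(UTF-8) decodes the bytes into characters.
open(my $in, '<:encoding(UTF-8)', \$bytes) or die "open in-memory: $!";
my $text = <$in>;
close($in);
printf "%d bytes decoded to %d characters\n", length($bytes), length($text);

# A strict decode refuses malformed input; the byte \xff never occurs in UTF-8.
my $bad = "caf\xff";
my $ok  = eval { decode('UTF-8', $bad, FB_CROAK); 1 };
print $ok ? "accepted\n" : "rejected malformed input\n";
```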
==== //depot/maint-5.8/perl/lib/filetest.pm#6 (text) ====
Index: perl/lib/filetest.pm
--- perl/lib/filetest.pm#5~20693~       2003-08-13 22:35:13.000000000 -0700
+++ perl/lib/filetest.pm        2008-02-02 08:53:33.000000000 -0800
@@ -1,6 +1,6 @@
 package filetest;
 
-our $VERSION = '1.01';
+our $VERSION = '1.02';
 
 =head1 NAME
 
@@ -21,32 +21,78 @@
 permission operators, C<-r> C<-w> C<-x> C<-R> C<-W> C<-X>
 (see L<perlfunc>).
 
-The default behaviour is to use the mode bits as returned by the stat()
-family of calls.  This, however, may not be the right thing to do if
-for example various ACL (access control lists) schemes are in use.
+The default behaviour of file test operators is to use the simple
+mode bits as returned by the stat() family of system calls.  However,
+many operating systems have additional features to define more complex
+access rights, for example ACLs (Access Control Lists).
 For such environments, C<use filetest> may help the permission
 operators to return results more consistent with other tools.
 
-Each "use filetest" or "no filetest" affects statements to the end of
-the enclosing block.
+The C<use filetest> or C<no filetest> statements affect file tests defined in
+their block, up to the end of the closest enclosing block (they are lexically
+block-scoped).
+
+Currently, only the C<access> sub-pragma is implemented.  It enables (or
+disables) the use of access() when available, that is, on most UNIX systems and
+other POSIX environments.  See details below.
+
+=head2 Consider this carefully
+
+The stat() mode bits are probably right for most of the files and
+directories found on your system, because few people want to use the
+additional features offered by access(). But you may encounter surprises
+if your program runs on a system that uses ACLs, since the stat()
+information won't reflect the actual permissions.
+
+There may be a slight performance decrease in the filetest operations
+when the filetest pragma is in effect, because checking bits is very
+cheap.
 
-There may be a slight performance decrease in the filetests
-when C<use filetest> is in effect, because in some systems
-the extended functionality needs to be emulated.
-
-B<NOTE>: using the file tests for security purposes is a lost cause
+Also, note that using the file tests for security purposes is a lost cause
 from the start: there is a window open for race conditions (who is to
 say that the permissions will not change between the test and the real
 operation?).  Therefore if you are serious about security, just try
 the real operation and test for its success - think in terms of atomic
-operations.
+operations.  Filetests are more useful for filesystem administrative
+tasks, when you have no need for the content of the elements on disk.
+
+=head2 The "access" sub-pragma
+
+UNIX and POSIX systems provide an abstract access() operating system call,
+which should be used to query the read, write, and execute rights. This
+function hides various distinct approaches in additional operating system
+specific security features, like Access Control Lists (ACLs)
+
+The extended filetest functionality is used by Perl only when the argument
+of the operators is a filename, not when it is a filehandle.
+
+=head2 Limitation with regard to C<_>
+
+Because access() does not invoke stat() (at least not in a way visible
+to Perl), B<the stat result cache "_" is not set>.  This means that the
+outcome of the following two tests is different.  The first has the stat
+bits of C</etc/passwd> in C<_>, and in the second case this still
+contains the bits of C</etc>.
+
+ { -d '/etc';
+   -w '/etc/passwd';
+   print -f _ ? 'Yes' : 'No';   # Yes
+ }
+
+ { use filetest 'access';
+   -d '/etc';
+   -w '/etc/passwd';
+   print -f _ ? 'Yes' : 'No';   # No
+ }
+
+Of course, unless your OS does not implement access(), in which case the
+pragma is simply ignored.  Best not to use C<_> at all in a file where
+the filetest pragma is active!
 
-=head2 subpragma access
+As a side effect, as C<_> doesn't work, stacked filetest operators
+(C<-f -w $file>) won't work either.
 
-Currently only one subpragma, C<access> is implemented.  It enables
-(or disables) the use of access() or similar system calls.  This
-extended filetest functionality is used only when the argument of the
-operators is a filename, not when it is a filehandle.
+This limitation might be removed in a future version of perl.
 
 =cut
 

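As a sketch of two points made in the new filetest documentation (not part of the patch), the following uses a throwaway directory and file created with File::Temp in place of /etc and /etc/passwd. Without the pragma, the last file test leaves that file's stat buffer in the special "_" filehandle. The second half shows the race-free alternative the text recommends: attempt the real operation and check whether it succeeded, instead of testing -r first. The behaviour under "use filetest 'access'" is deliberately not exercised here, since it depends on whether the OS provides access().

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Throwaway stand-ins for /etc and /etc/passwd.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'example.txt');
open(my $out, '>', $file) or die "create $file: $!";
print {$out} "hello\n";
close($out) or die "close $file: $!";

# Default behaviour (no "use filetest"): each test stat()s its argument,
# so "_" ends up holding the stat buffer of the last file tested.
my $is_dir   = -d $dir;
my $writable = -w $file;
my $is_plain = -f _;
print "last stat was a plain file: ", ($is_plain ? 'Yes' : 'No'), "\n";

# Race-free alternative to "-r $file and open ...": just try to open
# the file and report why it failed, if it did.
my $content;
if (open(my $in, '<', $file)) {
    local $/;              # slurp mode for this block only
    $content = <$in>;
    close($in);
}
else {
    warn "cannot read $file: $!\n";
}
```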
==== //depot/maint-5.8/perl/lib/open.pm#12 (text) ====
Index: perl/lib/open.pm
--- perl/lib/open.pm#11~26775~  2006-01-10 10:19:43.000000000 -0800
+++ perl/lib/open.pm    2008-02-02 08:53:33.000000000 -0800
@@ -3,7 +3,7 @@
 use Carp;
 $open::hint_bits = 0x20000; # HINT_LOCALIZE_HH
 
-our $VERSION = '1.05';
+our $VERSION = '1.06';
 
 require 5.008001; # for PerlIO::get_layers()
 
@@ -80,11 +80,7 @@
                    unless defined $locale_encoding;
               (warnings::warnif("layer", "Cannot figure out an encoding to use"), last)
                    unless defined $locale_encoding;
-               if ($locale_encoding =~ /^utf-?8$/i) {
-                   $layer = "utf8";
-               } else {
-                   $layer = "encoding($locale_encoding)";
-               }
+                $layer = "encoding($locale_encoding)";
                $std = 1;
            } else {
                my $target = $layer;            # the layer name itself
@@ -152,7 +148,7 @@
 
     use open IO  => ':locale';
 
-    use open ':utf8';
+    use open ':encoding(utf8)';
     use open ':locale';
     use open ':encoding(iso-8859-7)';
 
@@ -194,8 +190,8 @@
 
 These are equivalent
 
-    use open ':utf8';
-    use open IO => ':utf8';
+    use open ':encoding(utf8)';
+    use open IO => ':encoding(utf8)';
 
 as are these
 
@@ -211,9 +207,6 @@
 many encodings have several aliases.  See L<Encode::Supported> for
 details and the list of supported locales.
 
-Note that C<:utf8> PerlIO layer must always be specified exactly like
-that, it is not subject to the loose matching of encoding names.
-
 When open() is given an explicit list of layers (with the three-arg
 syntax), they override the list declared using this pragma.
 
@@ -221,10 +214,10 @@
 the C<:utf8> or C<:encoding> subpragmas, it converts the standard
 filehandles (STDIN, STDOUT, STDERR) to comply with encoding selected
 for input/output handles.  For example, if both input and output are
-chosen to be C<:utf8>, a C<:std> will mean that STDIN, STDOUT, and
-STDERR are also in C<:utf8>.  On the other hand, if only output is
-chosen to be in C<< :encoding(koi8r) >>, a C<:std> will cause only the
-STDOUT and STDERR to be in C<koi8r>.  The C<:locale> subpragma
+chosen to be C<:encoding(utf8)>, a C<:std> will mean that STDIN, STDOUT,
+and STDERR are also in C<:encoding(utf8)>.  On the other hand, if only
+output is chosen to be in C<< :encoding(koi8r) >>, a C<:std> will cause
+only the STDOUT and STDERR to be in C<koi8r>.  The C<:locale> subpragma
 implicitly turns on C<:std>.
 
 The logic of C<:locale> is described in full in L<encoding>,

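A small sketch of the updated recommendation, assuming a scratch file created with File::Temp (the file name is illustrative). Inside the block, use open IO => ':encoding(UTF-8)' makes a plain three-argument open() encode its output as UTF-8; afterwards an explicit :raw layer, which overrides any layers declared by the pragma, shows the bytes that actually reached the disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $path = File::Spec->catfile(tempdir(CLEANUP => 1), 'greeting.txt');
my $text = "caf\x{e9}";    # 4 characters; U+00E9 is e with acute accent

{
    # Lexically scoped: open() calls in this block default to UTF-8 encoding.
    use open IO => ':encoding(UTF-8)';
    open(my $out, '>', $path) or die "write $path: $!";
    print {$out} $text;
    close($out) or die "close $path: $!";
}

# An explicit layer list overrides the pragma, so ':raw' shows the bytes on disk.
open(my $in, '<:raw', $path) or die "read $path: $!";
my $bytes = do { local $/; <$in> };
close($in);
print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";   # prints: 63 61 66 c3 a9
```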
==== //depot/maint-5.8/perl/pod/perlcheat.pod#4 (text) ====
Index: perl/pod/perlcheat.pod
--- perl/pod/perlcheat.pod#3~32260~     2007-11-09 14:47:44.000000000 -0800
+++ perl/pod/perlcheat.pod      2008-02-02 08:53:33.000000000 -0800
@@ -84,7 +84,7 @@
 
 =head1 AUTHOR
 
-Juerd Waalboer <[EMAIL PROTECTED]>, with the help of many Perl Monks.
+Juerd Waalboer <[EMAIL PROTECTED]>, with the help of many Perl Monks.
 
 =head1 SEE ALSO
 

==== //depot/maint-5.8/perl/pod/perlfaq1.pod#17 (text) ====
Index: perl/pod/perlfaq1.pod
--- perl/pod/perlfaq1.pod#16~32479~     2007-11-24 04:06:56.000000000 -0800
+++ perl/pod/perlfaq1.pod       2008-02-02 08:53:33.000000000 -0800
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq1 - General Questions About Perl ($Revision: 10127 $)
+perlfaq1 - General Questions About Perl ($Revision: 10427 $)
 
 =head1 DESCRIPTION
 
@@ -61,7 +61,7 @@
 There is often a matter of opinion and taste, and there isn't any one
 answer that fits anyone.  In general, you want to use either the current
 stable release, or the stable release immediately prior to that one.
-Currently, those are perl5.8.x and perl5.6.x, respectively.
+Currently, those are perl5.10.x and perl5.8.x, respectively.
 
 Beyond that, you have to consider several things and decide which is best
 for you.
@@ -96,7 +96,7 @@
 
 =item *
 
-The immediate, previous releases (i.e. perl5.6.x ) are usually maintained
+The immediate, previous releases (i.e. perl5.8.x ) are usually maintained
 for a while, although not at the same level as the current releases.
 
 =item *
@@ -107,15 +107,15 @@
 
 =item *
 
-There is no Perl 6 for the next couple of years.  Stay tuned, but don't
-worry that you'll have to change major versions of Perl soon (i.e. before
-2008).
+There is no Perl 6 release scheduled, but it will be available when 
+it's ready.  Stay tuned, but don't worry that you'll have to change 
+major versions of Perl; no one is going to take Perl 5 away from you.
 
 =item *
 
 There are really two tracks of perl development: a maintenance version
 and an experimental version.  The maintenance versions are stable, and
-have an even number as the minor release (i.e. perl5.8.x, where 8 is the
+have an even number as the minor release (i.e. perl5.10.x, where 10 is the
 minor release).  The experimental versions may include features that
 don't make it into the stable versions, and have an odd number as the
 minor release (i.e. perl5.9.x, where 9 is the minor release).
@@ -400,9 +400,9 @@
 
 =head1 REVISION
 
-Revision: $Revision: 10127 $
+Revision: $Revision: 10427 $
 
-Date: $Date: 2007-10-27 21:40:20 +0200 (Sat, 27 Oct 2007) $
+Date: $Date: 2007-12-14 00:39:01 +0100 (Fri, 14 Dec 2007) $
 
 See L<perlfaq> for source control details and availability.
 

==== //depot/maint-5.8/perl/pod/perlfaq4.pod#29 (text) ====
Index: perl/pod/perlfaq4.pod
--- perl/pod/perlfaq4.pod#28~32479~     2007-11-24 04:06:56.000000000 -0800
+++ perl/pod/perlfaq4.pod       2008-02-02 08:53:33.000000000 -0800
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq4 - Data Manipulation ($Revision: 10126 $)
+perlfaq4 - Data Manipulation ($Revision: 10394 $)
 
 =head1 DESCRIPTION
 
@@ -2071,10 +2071,16 @@
 
 =head2 How do I reset an each() operation part-way through?
 
-Using C<keys %hash> in scalar context returns the number of keys in
-the hash I<and> resets the iterator associated with the hash.  You may
-need to do this if you use C<last> to exit a loop early so that when
-you re-enter it, the hash iterator has been reset.
+(contributed by brian d foy)
+
+You can use the C<keys> or C<values> functions to reset C<each>. To
+simply reset the iterator used by C<each> without doing anything else,
+use one of them in void context:
+
+       keys %hash; # resets iterator, nothing else.
+       values %hash; # resets iterator, nothing else.
+
+See the documentation for C<each> in L<perlfunc>.
 
 =head2 How can I get the unique keys from two hashes?
 
@@ -2288,9 +2294,9 @@
 
 =head1 REVISION
 
-Revision: $Revision: 10126 $
+Revision: $Revision: 10394 $
 
-Date: $Date: 2007-10-27 21:29:20 +0200 (Sat, 27 Oct 2007) $
+Date: $Date: 2007-12-09 18:47:15 +0100 (Sun, 09 Dec 2007) $
 
 See L<perlfaq> for source control details and availability.
 

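A short sketch of the new FAQ answer. The hash contents are made up, and the check relies on the documented guarantee that an unmodified hash yields its keys in the same order each time it is traversed, so after the reset each() starts again from the same first key.

```perl
use strict;
use warnings;

my %hash = (apple => 1, banana => 2, cherry => 3);

# Start walking the hash with each(), then stop early.
my ($first_key) = each %hash;

# keys() in void context resets the iterator and does nothing else.
keys %hash;

# After the reset, each() begins again from the first key.
my ($again) = each %hash;
print "restarted at the same key: ", ($again eq $first_key ? 'yes' : 'no'), "\n";
```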
==== //depot/maint-5.8/perl/pod/perlfunc.pod#108 (text) ====
Index: perl/pod/perlfunc.pod
--- perl/pod/perlfunc.pod#107~32568~    2007-12-04 05:46:34.000000000 -0800
+++ perl/pod/perlfunc.pod       2008-02-02 08:53:33.000000000 -0800
@@ -332,10 +332,12 @@
 The interpretation of the file permission operators C<-r>, C<-R>,
 C<-w>, C<-W>, C<-x>, and C<-X> is by default based solely on the mode
 of the file and the uids and gids of the user.  There may be other
-reasons you can't actually read, write, or execute the file.  Such
-reasons may be for example network filesystem access controls, ACLs
-(access control lists), read-only filesystems, and unrecognized
-executable formats.
+reasons you can't actually read, write, or execute the file: for
+example network filesystem access controls, ACLs (access control lists),
+read-only filesystems, and unrecognized executable formats.  Note
+that the use of these six specific operators to verify if some operation
+is possible is usually a mistake, because it may be open to race
+conditions.
 
 Also note that, for the superuser on the local filesystems, the C<-r>,
 C<-R>, C<-w>, and C<-W> tests always return 1, and C<-x> and C<-X> return 1
@@ -350,8 +352,11 @@
 access() family of system calls.  Also note that the C<-x> and C<-X> may
 under this pragma return true even if there are no execute permission
 bits set (nor any extra execute permission ACLs).  This strangeness is
-due to the underlying system calls' definitions.  Read the
-documentation for the C<filetest> pragma for more information.
+due to the underlying system calls' definitions. Note also that, due to
+the implementation of C<use filetest 'access'>, the C<_> special
+filehandle won't cache the results of the file tests when this pragma is
+in effect.  Read the documentation for the C<filetest> pragma for more
+information.
 
 Note that C<-s/a/b/> does not do a negated substitution.  Saying
 C<-exp($foo)> still works as expected, however--only single letters
@@ -4220,10 +4225,10 @@
 Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are received.  By default all sockets
 operate on bytes, but for example if the socket has been changed using
-binmode() to operate with the C<:utf8> I/O layer (see the C<open>
-pragma, L<open>), the I/O will operate on UTF-8 encoded Unicode
-characters, not bytes.  Similarly for the C<:encoding> pragma:
-in that case pretty much any characters can be read.
+binmode() to operate with the C<:encoding(utf8)> I/O layer (see the
+C<open> pragma, L<open>), the I/O will operate on UTF-8 encoded Unicode
+characters, not bytes.  Similarly for the C<:encoding> pragma: in that
+case pretty much any characters can be read.
 
 =item redo LABEL
 X<redo>
@@ -4646,7 +4651,7 @@
 otherwise.
 
 Note the I<in bytes>: even if the filehandle has been set to
-operate on characters (for example by using the C<:utf8> open
+operate on characters (for example by using the C<:encoding(utf8)> open
 layer), tell() will return byte offsets, not character offsets
 (because implementing that would render seek() and tell() rather slow).
 
@@ -4836,10 +4841,10 @@
 Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are sent.  By default all sockets operate
 on bytes, but for example if the socket has been changed using
-binmode() to operate with the C<:utf8> I/O layer (see L</open>, or the
-C<open> pragma, L<open>), the I/O will operate on UTF-8 encoded
-Unicode characters, not bytes.  Similarly for the C<:encoding> pragma:
-in that case pretty much any characters can be sent.
+binmode() to operate with the C<:encoding(utf8)> I/O layer (see
+L</open>, or the C<open> pragma, L<open>), the I/O will operate on UTF-8
+encoded Unicode characters, not bytes.  Similarly for the C<:encoding>
+pragma: in that case pretty much any characters can be sent.
 
 =item setpgrp PID,PGRP
 X<setpgrp> X<group>
@@ -6156,9 +6161,9 @@
 negative).
 
 Note the I<in bytes>: even if the filehandle has been set to operate
-on characters (for example by using the C<:utf8> I/O layer), tell()
-will return byte offsets, not character offsets (because implementing
-that would render sysseek() very slow).
+on characters (for example by using the C<:encoding(utf8)> I/O layer),
+tell() will return byte offsets, not character offsets (because
+implementing that would render sysseek() very slow).
 
 sysseek() bypasses normal buffered IO, so mixing this with reads (other
 than C<sysread>, for example C<< <> >> or read()) C<print>, C<write>,
@@ -6283,9 +6288,9 @@
 last read.
 
 Note the I<in bytes>: even if the filehandle has been set to
-operate on characters (for example by using the C<:utf8> open
-layer), tell() will return byte offsets, not character offsets
-(because that would render seek() and tell() rather slow).
+operate on characters (for example by using the C<:encoding(utf8)> open
+layer), tell() will return byte offsets, not character offsets (because
+that would render seek() and tell() rather slow).
 
 The return value of tell() for the standard streams like the STDIN
 depends on the operating system: it may return -1 or something else.

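The perlfunc text above warns that using -r, -w, -x and friends to decide whether an operation will succeed is open to race conditions. The following is a minimal sketch of the usual alternative, assuming ordinary local files: attempt the operation and inspect $! if it fails. The helper name read_first_line is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Racy pattern: the file may be removed, or its permissions changed,
# between the -r test and the open():
#
#   if (-r $path) { open(my $fh, '<', $path) or die "open $path: $!" }
#
# Alternative: just try the open() and report $! on failure.
# read_first_line() is a hypothetical helper, not part of Perl.
sub read_first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or return (undef, "$!");
    my $line = <$fh>;
    close $fh;
    return ($line, undef);
}

my ($line, $err) = read_first_line('no/such/file');
print defined $line ? "first line: $line" : "cannot read: $err\n";
```

The open() call is the only check that is guaranteed to reflect the file's state at the moment of access, so its failure is the one worth reporting.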
==== //depot/maint-5.8/perl/pod/perlhack.pod#29 (text) ====
Index: perl/pod/perlhack.pod
--- perl/pod/perlhack.pod#28~32259~     2007-11-09 14:28:10.000000000 -0800
+++ perl/pod/perlhack.pod       2008-02-02 08:53:33.000000000 -0800
@@ -888,7 +888,7 @@
 
 =item Exception handing
 
-Perl's exception handing (i.e. C<die> etc) is built on top of the low-level
+Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level
 C<setjmp()>/C<longjmp()> C-library functions. These basically provide a
 way to capture the current PC and SP registers and later restore them; i.e.
 a C<longjmp()> continues at the point in code where a previous C<setjmp()>
@@ -1556,7 +1556,7 @@
 =back
 
 The following flags would be nice to have but they would first need
-their own Stygian stablemaster:
+their own Augean stablemaster:
 
 =over 4
 
@@ -2362,6 +2362,29 @@
 
 =back
 
+=head3 Other environment variables that may influence tests
+
+=over 4
+
+=item PERL_TEST_Net_Ping
+
+Setting this variable runs all of the Net::Ping module's tests;
+otherwise some tests that interact with the outside world are skipped.
+See L<perl58delta>.
+
+=item PERL_TEST_NOVREXX
+
+Setting this variable skips the vrexx.t tests for OS2::REXX.
+
+=item PERL_TEST_NUMCONVERTS
+
+This sets a variable in op/numconvert.t.
+
+=back
+
+See also the documentation for the Test and Test::Harness modules,
+for more environment variables that affect testing.
+
 =head2 Common problems when patching Perl source code
 
 Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.  In

==== //depot/maint-5.8/perl/pod/perlhist.pod#40 (text) ====
Index: perl/pod/perlhist.pod
--- perl/pod/perlhist.pod#39~32568~     2007-12-04 05:46:34.000000000 -0800
+++ perl/pod/perlhist.pod       2008-02-02 08:53:33.000000000 -0800
@@ -395,6 +395,7 @@
           5.9.5         2007-Jul-07
           5.10.0-RC1    2007-Nov-17
           5.10.0-RC2    2007-Nov-25
+          5.10.0        2007-Dec-18
 
 =head2 SELECTED RELEASE SIZES
 

==== //depot/maint-5.8/perl/pod/perlopentut.pod#10 (text) ====
Index: perl/pod/perlopentut.pod
--- perl/pod/perlopentut.pod#9~32256~   2007-11-09 13:58:43.000000000 -0800
+++ perl/pod/perlopentut.pod    2008-02-02 08:53:33.000000000 -0800
@@ -917,7 +917,7 @@
 C<< '<' >>, C<< '>' >>, C<< '>>' >>, C<< '|' >> and their variants,
 for example:
 
-    open(my $fh, "<:utf8", $fn);
+    open(my $fh, "<:crlf", $fn);
 
 =item *
 

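The perlopentut example now shows the :crlf layer. As a small sketch (the file contents are made up for illustration, and perl 5.8 or later with PerlIO is assumed), reading through :crlf turns CRLF line endings into a plain "\n", while :raw leaves them untouched. The helper read_lines is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a file with DOS-style line endings, bypassing any translation.
my ($out, $fn) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "line one\r\nline two\r\n";
close $out or die "close: $!";

# read_lines() is a hypothetical helper: read all lines through one layer.
sub read_lines {
    my ($layer, $path) = @_;
    open(my $fh, "<$layer", $path) or die "open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

my @crlf = read_lines(':crlf', $fn);   # CRLF translated to "\n"
my @raw  = read_lines(':raw',  $fn);   # "\r\n" kept as-is
printf "%d lines; CR left by :crlf? %s; CR left by :raw? %s\n",
    scalar @crlf,
    ($crlf[0] =~ /\r/ ? 'yes' : 'no'),
    ($raw[0]  =~ /\r/ ? 'yes' : 'no');
```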
==== //depot/maint-5.8/perl/pod/perlrebackslash.pod#3 (text) ====
Index: perl/pod/perlrebackslash.pod
--- perl/pod/perlrebackslash.pod#2~32378~       2007-11-17 11:36:09.000000000 -0800
+++ perl/pod/perlrebackslash.pod        2008-02-02 08:53:33.000000000 -0800
@@ -430,6 +430,11 @@
 include (but are not restricted to) I<combining characters> and
 I<vowel signs>.
 
+C<\X> matches quite well what normal (non-Unicode-programmer) usage
+would consider a single character: for example a base character
+(the C<\PM> above), such as a letter, followed by zero or more
+diacritics, which are I<combining characters> (the C<\pM*> above).
+
 Mnemonic: eI<X>tended Unicode character.
 
 =back

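To illustrate the new perlrebackslash paragraph on \X, here is a small sketch that counts \X matches and compares the result with length(). The helper count_graphemes is a hypothetical name.

```perl
use strict;
use warnings;

# count_graphemes() is a hypothetical helper: count the matches of \X,
# i.e. what a reader would usually see as separate characters.
sub count_graphemes {
    my ($s) = @_;
    my $n = () = $s =~ /\X/g;
    return $n;
}

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points,
# but a single \X match.
my $e_acute = "e\x{301}";
printf "length=%d graphemes=%d\n", length($e_acute), count_graphemes($e_acute);
# prints "length=2 graphemes=1"
```

length() counts code points, while \X groups a base character with the combining marks that follow it.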
==== //depot/maint-5.8/perl/pod/perlrun.pod#66 (text) ====
Index: perl/pod/perlrun.pod
--- perl/pod/perlrun.pod#65~32259~      2007-11-09 14:28:10.000000000 -0800
+++ perl/pod/perlrun.pod        2008-02-02 08:53:33.000000000 -0800
@@ -1116,9 +1116,9 @@
 
 A pseudolayer that turns on a flag on the layer below to tell perl
 that output should be in utf8 and that input should be regarded as
-already in utf8 form.  May be useful in PERLIO environment
-variable to make UTF-8 the default. (To turn off that behaviour
-use C<:bytes> layer.)
+already in valid utf8 form. It does not check for validity, so it
+should be used with caution for input. Generally C<:encoding(utf8)> is
+the best option when reading UTF-8 encoded data.
 
 =item :win32
 X<:win32>

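Because the :utf8 layer does not validate its input, one cautious approach is to read raw bytes and decode them strictly. The following sketch uses Encode's decode() with the FB_CROAK check so that malformed UTF-8 is reported instead of trusted. The helper name strict_decode_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# strict_decode_utf8() is a hypothetical helper: decode bytes as strict
# UTF-8 and return (text, undef) on success or (undef, error) on failure.
sub strict_decode_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;   # decode() with a CHECK argument may modify its input
    my $text = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $text ? ($text, undef) : (undef, $@);
}

my ($ok)         = strict_decode_utf8("caf\xC3\xA9");  # valid: 4 characters
my (undef, $err) = strict_decode_utf8("caf\xFF");      # 0xFF is never valid UTF-8
print length($ok), "\n";
print "rejected: $err" if $err;
```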
==== //depot/maint-5.8/perl/pod/perlthrtut.pod#11 (text) ====
Index: perl/pod/perlthrtut.pod
--- perl/pod/perlthrtut.pod#10~32258~   2007-11-09 14:23:13.000000000 -0800
+++ perl/pod/perlthrtut.pod     2008-02-02 08:53:33.000000000 -0800
@@ -323,6 +323,36 @@
         # Do more work
     }
 
+=head2 Process and Thread Termination
+
+With threads one must be careful to make sure they all have a chance to
+run to completion, assuming that is what you want.
+
+An action that terminates a process will terminate I<all> running
+threads.  die() and exit() have this property,
+and perl does an exit when the main thread exits,
+perhaps implicitly by falling off the end of your code,
+even if that's not what you want.
+
+As an example of this case, this code prints the message
+"Perl exited with active threads: 2 running and unjoined":
+
+    use threads;
+    my $thr1 = threads->new(\&thrsub, "test1");
+    my $thr2 = threads->new(\&thrsub, "test2");
+    sub thrsub {
+       my ($message) = @_;
+       sleep 1;
+       print "thread $message\n";
+    }
+
+But when the following lines are added at the end:
+
+    $thr1->join;
+    $thr2->join;
+
+it prints two lines of output, a perhaps more useful outcome.
+
 =head1 Threads And Data
 
 Now that we've covered the basics of threads, it's time for our next

==== //depot/maint-5.8/perl/pod/perltodo.pod#39 (text) ====
Index: perl/pod/perltodo.pod
--- perl/pod/perltodo.pod#38~32568~     2007-12-04 05:46:34.000000000 -0800
+++ perl/pod/perltodo.pod       2008-02-02 08:53:33.000000000 -0800
@@ -468,13 +468,6 @@
 might be nice to do as Microsoft suggest here too, although, unlike the secure
 functions issue, there is presumably little or no benefit in this case.
 
-=head2 __FUNCTION__ for MSVC-pre-7.0
-
-Jarkko notes that one can things morally equivalent to C<__FUNCTION__>
-(or C<__func__>) even in MSVC-pre-7.0, contrary to popular belief.
-See L<http://www.codeproject.com/debug/extendedtrace.asp> if you feel like
-making C<PERL_MEM_LOG> more useful on Win32.
-
 =head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()
 
 Maybe create a utility that checks after each libperl.a creation that
@@ -486,6 +479,14 @@
 Note, of course, that this will only tell whether B<your> platform
 is using those naughty interfaces.
 
+=head2 -D_FORTIFY_SOURCE=2, -fstack-protector
+
+Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
+(4.1 onwards?) supports C<-fstack-protector>, both of which give
+protection against various kinds of buffer overflow problems.
+These should probably be used for compiling Perl whenever available;
+Configure and/or hints files should be adjusted to probe for the
+availability of these features and enable them as appropriate.
 
 =head1 Tasks that need a knowledge of XS
 
@@ -638,7 +639,7 @@
 =head2 Organize error messages
 
 Perl's diagnostics (error messages, see L<perldiag>) could use
-reorganizing so that each error message has its
+reorganizing and formalizing so that each error message has its
 stable-for-all-eternity unique id, categorized by severity, type, and
 subsystem.  (The error messages would be listed in a datafile outside
 of the Perl source code, and the source code would only refer to the
@@ -656,7 +657,7 @@
 This kind of functionality is known as I<message catalogs>.  Look for
 inspiration for example in the catgets() system, possibly even use it
 if available-- but B<only> if available, all platforms will B<not>
-catgets().
+have catgets().
 
 For the really pure at heart, consider extending this item to cover
 also the warning messages (see L<perllexwarn>, C<warnings.pl>).
@@ -666,18 +667,34 @@
 These tasks would need C knowledge, and knowledge of how the interpreter works,
 or a willingness to learn.
 
+=head2 UTF-8 revamp
+
+The handling of Unicode is unclean in many places. For example, the regexp
+engine matches in Unicode semantics whenever the string or the pattern is
+flagged as UTF-8, but that should not be dependent on an internal storage
+detail of the string. Likewise, case folding behaviour is dependent on the
+UTF8 internal flag being on or off.
+
+=head2 Properly Unicode safe tokeniser and pads.
+
+The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
+variable names are stored in stashes as raw bytes, without the utf-8 flag
+set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
+tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
+source filters.  All this could be fixed.
+
 =head2 state variable initialization in list context
 
 Currently this is illegal:
 
     state ($a, $b) = foo(); 
 
-The current Perl 6 design is that C<state ($a) = foo();> and
-C<(state $a) = foo();> have different semantics, which is tricky to implement
-in Perl 5 as currently the produce the same opcode trees. It would be useful
-to clarify that the Perl 6 design is firm, and then implement the necessary
-code in Perl 5. There are comments in C<Perl_newASSIGNOP()> that show the
-code paths taken by various assignment constructions involving state variables.
+In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
+semantics, which is tricky to implement in Perl 5 as currently they produce
+the same opcode trees. The Perl 6 design is firm, so it would be good to
+implement the necessary code in Perl 5. There are comments in
+C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
+constructions involving state variables.
 
 =head2 Implement $value ~~ 0 .. $range
 
@@ -765,24 +782,16 @@
 perl and XS at run time, so investigate using 2 ops to enter subs (one for
 XS, one for perl) and swap between if a sub is redefined.
 
-=head2 Self ties
+=head2 Self-ties
 
-self ties are currently illegal because they caused too many segfaults. Maybe
-the causes of these could be tracked down and self-ties on all types re-
-instated.
+Self-ties are currently illegal because they caused too many segfaults. Maybe
+the causes of these could be tracked down and self-ties on all types
+reinstated.
 
 =head2 Optimize away @_
 
 The old perltodo notes "Look at the "reification" code in C<av.c>".
 
-=head2 Properly Unicode safe tokeniser and pads.
-
-The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
-variable names are stored in stashes as raw bytes, without the utf-8 flag
-set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
-tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
-source filters.  All this could be fixed.
-
 =head2 The yada yada yada operators
 
 Perl 6's Synopsis 3 says:

==== //depot/maint-5.8/perl/pod/perlunicode.pod#31 (text) ====
Index: perl/pod/perlunicode.pod
--- perl/pod/perlunicode.pod#30~32333~  2007-11-16 04:46:24.000000000 -0800
+++ perl/pod/perlunicode.pod    2008-02-02 08:53:33.000000000 -0800
@@ -1524,7 +1524,7 @@
 A filehandle that should read or write UTF-8
 
   if ($] > 5.007) {
-    binmode $fh, ":utf8";
+    binmode $fh, ":encoding(utf8)";
   }
 
 =item *

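The perlunicode example now uses :encoding(utf8) for a filehandle that reads or writes UTF-8. This sketch writes one character above U+00FF through such a layer and then reads the file back raw to show the encoded bytes. It assumes perl 5.8 or later with PerlIO, as the original $] check suggests.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A filehandle with an explicit :encoding(UTF-8) layer encodes characters
# on output, so no "Wide character" warning is emitted.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "\x{263A}\n";          # U+263A WHITE SMILING FACE
close $fh or die "close: $!";

# Read the file back without any decoding to inspect the bytes.
open(my $raw, '<:raw', $path) or die "open $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;
printf "%vX\n", $bytes;            # the bytes, dot-separated in hex
```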
==== //depot/maint-5.8/perl/pod/perlunifaq.pod#2 (text) ====
Index: perl/pod/perlunifaq.pod
--- perl/pod/perlunifaq.pod#1~30764~    2007-03-26 10:20:04.000000000 -0700
+++ perl/pod/perlunifaq.pod     2008-02-02 08:53:33.000000000 -0800
@@ -2,7 +2,7 @@
 
 perlunifaq - Perl Unicode FAQ
 
-=head1 DESCRIPTION
+=head1 Q and A
 
 This is a list of questions and answers about Unicode in Perl, intended to be
 read after L<perlunitut>.
@@ -16,6 +16,21 @@
 think that Unicode is special and magical, and I didn't want to disappoint
 them, so I decided to call the document a Unicode tutorial.
 
+=head2 What character encodings does Perl support?
+
+To find out which character encodings your Perl supports, run:
+
+    perl -MEncode -le "print for Encode->encodings(':all')"
+
+=head2 Which version of perl should I use?
+
+Well, if you can, upgrade to the most recent, but certainly C<5.8.1> or newer.
+The tutorial and FAQ are based on the status quo as of C<5.8.8>.
+
+You should also check your modules, and upgrade them if necessary. For example,
+HTML::Entities requires version >= 1.32 to function correctly, even though the
+changelog is silent about this.
+
 =head2 What about binary data, like images?
 
 Well, apart from a bare C<binmode $fh>, you shouldn't treat them specially.
@@ -27,20 +42,9 @@
 appropriate encoding, then join them with binary strings. See also: "What if I
 don't encode?".
 
-=head2 What about the UTF8 flag?
-
-Please, unless you're hacking the internals, or debugging weirdness, don't
-think about the UTF8 flag at all. That means that you very probably shouldn't
-use C<is_utf8>, C<_utf8_on> or C<_utf8_off> at all.
-
-Perl's internal format happens to be UTF-8. Unfortunately, Perl can't keep a
-secret, so everyone knows about this.  That is the source of much confusion.
-It's better to pretend that the internal format is some unknown encoding,
-and that you always have to encode and decode explicitly.
-
 =head2 When should I decode or encode?
 
-Whenever you're communicating with anything that is external to your perl
+Whenever you're communicating text with anything that is external to your perl
 process, like a database, a text file, a socket, or another program. Even if
 the thing you're communicating with is also written in Perl.
 
@@ -88,23 +92,7 @@
     binmode $fh, ':encoding(UTF-8)';
 
 Some database drivers for DBI can also automatically encode and decode, but
-that is typically limited to the UTF-8 encoding, because they cheat.
-
-=head2 Cheat?! Tell me, how can I cheat?
-
-Well, because Perl's internal format is UTF-8, you can just skip the encoding
-or decoding step, and manipulate the UTF8 flag directly.
-
-Instead of C<:encoding(UTF-8)>, you can simply use C<:utf8>. This is widely
-accepted as good behavior when you're writing, but it can be dangerous when
-reading, because it causes internal inconsistency when you have invalid byte
-sequences.
-
-Instead of C<decode> and C<encode>, you could use C<_utf8_on> and C<_utf8_off>,
-but this is considered bad style. Especially C<_utf8_on> can be dangerous, for
-the same reason that C<:utf8> can.
-
-There are some shortcuts for oneliners; see C<-C> in L<perlrun>.
+that is sometimes limited to the UTF-8 encoding.
 
 =head2 What if I don't know which encoding was used?
 
@@ -146,6 +134,25 @@
 If you properly encode your strings for output, none of this is of your
 concern, and you can just C<eval> dumped data as always.
 
+=head2 Why do regex character classes sometimes match only in the ASCII range?
+
+=head2 Why do some characters not uppercase or lowercase correctly?
+
+It seemed like a good idea at the time, to keep the semantics the same for
+standard strings, when Perl got Unicode support. While it might be repaired
+in the future, we now have to deal with the fact that Perl treats equal
+strings differently, depending on the internal state.
+
+Affected are C<uc>, C<lc>, C<ucfirst>, C<lcfirst>, C<\U>, C<\L>, C<\u>, C<\l>,
+C<\d>, C<\s>, C<\w>, C<\D>, C<\S>, C<\W>, C</.../i>, C<(?i:...)>,
+C</[[:posix:]]/>.
+
+To force Unicode semantics, you can upgrade the internal representation
+by doing C<utf8::upgrade($string)>. This does not change strings that were
+already upgraded.
+
+For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
+
 =head2 How can I determine if a string is a text string or a binary string?
 
 You can't. Some use the UTF8 flag for this, but that's misuse, and makes well
@@ -176,6 +183,45 @@
     open my $barfh, '>:encoding(BAR)', 'example.bar.txt';
     print { $barfh } $_ while <$foofh>;
 
+=head2 What are C<decode_utf8> and C<encode_utf8>?
+
+These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
+...)>.
+
+=head2 What is a "wide character"?
+
+This is a term used for characters with an ordinal value greater than 127,
+characters with an ordinal value greater than 255, or any character occupying
+more than one byte, depending on the context.
+
+The Perl warning "Wide character in ..." is caused by a character with an
+ordinal value greater than 255. With no specified encoding layer, Perl tries to
+fit things in ISO-8859-1 for backward compatibility reasons. When it can't, it
+emits this warning (if warnings are enabled), and outputs UTF-8 encoded data
+instead.
+
+To avoid this warning and to avoid having different output encodings in a single
+stream, always specify an encoding explicitly, for example with a PerlIO layer:
+
+    binmode STDOUT, ":encoding(UTF-8)";
+
+=head1 INTERNALS
+
+=head2 What is "the UTF8 flag"?
+
+Please, unless you're hacking the internals, or debugging weirdness, don't
+think about the UTF8 flag at all. That means that you very probably shouldn't
+use C<is_utf8>, C<_utf8_on> or C<_utf8_off> at all.
+
+The UTF8 flag, also called SvUTF8, is an internal flag that indicates that the
+current internal representation is UTF-8. Without the flag, it is assumed to be
+ISO-8859-1. Perl converts between these automatically.
+
+One of Perl's internal formats happens to be UTF-8. Unfortunately, Perl can't
+keep a secret, so everyone knows about this. That is the source of much
+confusion. It's better to pretend that the internal format is some unknown
+encoding, and that you always have to encode and decode explicitly.
+
 =head2 What about the C<use bytes> pragma?
 
 Don't use it. It makes no sense to deal with bytes in a text string, and it
@@ -186,10 +232,36 @@
 C<use bytes> is usually a failed attempt to do something useful. Just forget
 about it.
 
-=head2 What are C<decode_utf8> and C<encode_utf8>?
+=head2 What about the C<use encoding> pragma?
 
-These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
-...)>.
+Don't use it. Unfortunately, it assumes that the programmer's environment and
+that of the user will use the same encoding. It will use the same encoding for
+the source code and for STDIN and STDOUT. When a program is copied to another
+machine, the source code does not change, but the STDIO environment might.
+
+If you need non-ASCII characters in your source code, make it a UTF-8 encoded
+file and C<use utf8>.
+
+If you need to set the encoding for STDIN, STDOUT, and STDERR, for example
+based on the user's locale, C<use open>.
+
+=head2 What is the difference between C<:encoding> and C<:utf8>?
+
+Because UTF-8 is one of Perl's internal formats, you can often just skip the
+encoding or decoding step, and manipulate the UTF8 flag directly.
+
+Instead of C<:encoding(UTF-8)>, you can simply use C<:utf8>, which skips the
+encoding step if the data was already represented as UTF8 internally. This is
+widely accepted as good behavior when you're writing, but it can be dangerous
+when reading, because it causes internal inconsistency when you have invalid
+byte sequences. Using C<:utf8> for input can sometimes result in security
+breaches, so please use C<:encoding(UTF-8)> instead.
+
+Instead of C<decode> and C<encode>, you could use C<_utf8_on> and C<_utf8_off>,
+but this is considered bad style. Especially C<_utf8_on> can be dangerous, for
+the same reason that C<:utf8> can.
+
+There are some shortcuts for oneliners; see C<-C> in L<perlrun>.
 
 =head2 What's the difference between C<UTF-8> and C<utf8>?
 
@@ -223,24 +295,9 @@
 encoding for a certain string is, but instead just encode it into the encoding
 that you want.
 
-=head2 What character encodings does Perl support?
-
-To find out which character encodings your Perl supports, run:
-
-    perl -MEncode -le "print for Encode->encodings(':all')"
-
-=head2 Which version of perl should I use?
-
-Well, if you can, upgrade to the most recent, but certainly C<5.8.1> or newer.
-The tutorial and FAQ are based on the status quo as of C<5.8.8>.
-
-You should also check your modules, and upgrade them if necessary. For example,
-HTML::Entities requires version >= 1.32 to function correctly, even though the
-changelog is silent about this.
-
 =head1 AUTHOR
 
-Juerd Waalboer <[EMAIL PROTECTED]>
+Juerd Waalboer <[EMAIL PROTECTED]>
 
 =head1 SEE ALSO
 

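Two of the new perlunifaq answers can be checked directly. In the sketch below, the first part shows that equal strings can case-fold differently depending on their internal form, and that utf8::upgrade() switches to Unicode rules. This assumes neither "use feature 'unicode_strings'" nor "use locale" is in effect. The second part shows encode_utf8/decode_utf8 as shorthand for encode('utf8', ...) and decode('utf8', ...).

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Under the assumptions above, lc() on a string without the UTF8 flag
# only changes ASCII letters. Once the same string is upgraded,
# Unicode rules apply.
my $native   = "\xC9";              # LATIN CAPITAL LETTER E WITH ACUTE
my $upgraded = $native;
utf8::upgrade($upgraded);           # equal string, different internal form

my $lc_native   = lc $native;       # stays "\xC9"
my $lc_upgraded = lc $upgraded;     # becomes "\xE9"
printf "%vX %vX\n", $lc_native, $lc_upgraded;

# encode_utf8/decode_utf8 convert between characters and UTF-8 bytes.
my $bytes = encode_utf8("\x{263A}");
my $chars = decode_utf8($bytes);
printf "%d bytes -> %d character(s)\n", length $bytes, length $chars;
```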
==== //depot/maint-5.8/perl/pod/perluniintro.pod#19 (text) ====
Index: perl/pod/perluniintro.pod
--- perl/pod/perluniintro.pod#18~32260~ 2007-11-09 14:47:44.000000000 -0800
+++ perl/pod/perluniintro.pod   2008-02-02 08:53:33.000000000 -0800
@@ -167,7 +167,7 @@
 
      Wide character in print at ...
 
-To output UTF-8, use the C<:utf8> output layer.  Prepending
+To output UTF-8, use the C<:encoding> or C<:utf8> output layer.  Prepending
 
       binmode(STDOUT, ":utf8");
 
@@ -317,7 +317,9 @@
 The matching of encoding names is loose: case does not matter, and
 many encodings have several aliases.  Note that the C<:utf8> layer
 must always be specified exactly like that; it is I<not> subject to
-the loose matching of encoding names.
+the loose matching of encoding names. Also note that C<:utf8> is unsafe for
+input, because it accepts the data without validating that it is indeed valid
+UTF8.
 
 See L<PerlIO> for the C<:utf8> layer, L<PerlIO::encoding> and
 L<Encode::PerlIO> for the C<:encoding()> layer, and
@@ -329,7 +331,7 @@
 Unicode in Perl's eyes.  To do that, specify the appropriate
 layer when opening files
 
-    open(my $fh,'<:utf8', 'anything');
+    open(my $fh,'<:encoding(utf8)', 'anything');
     my $line_of_unicode = <$fh>;
 
     open(my $fh,'<:encoding(Big5)', 'anything');
@@ -338,7 +340,7 @@
 The I/O layers can also be specified more flexibly with
 the C<open> pragma.  See L<open>, or look at the following example.
 
-    use open ':utf8'; # input and output default layer will be UTF-8
+    use open ':encoding(utf8)'; # input/output default encoding will be UTF-8
     open X, ">file";
     print X chr(0x100), "\n";
     close X;
@@ -358,11 +360,6 @@
     printf "%#x\n", ord(<I>), "\n"; # this should print 0xc1
     close I;
 
-or you can also use the C<':encoding(...)'> layer
-
-    open(my $epic,'<:encoding(iso-8859-7)','iliad.greek');
-    my $line_of_unicode = <$epic>;
-
 These methods install a transparent filter on the I/O stream that
 converts data from the specified encoding when it is read in from the
 stream.  The result is always Unicode.
@@ -411,13 +408,13 @@
     local $/; ## read in the whole file of 8-bit characters
     $t = <F>;
     close F;
-    open F, ">:utf8", "file";
+    open F, ">:encoding(utf8)", "file";
     print F $t; ## convert to UTF-8 on output
     close F;
 
 If you run this code twice, the contents of the F<file> will be twice
-UTF-8 encoded.  A C<use open ':utf8'> would have avoided the bug, or
-explicitly opening also the F<file> for input as UTF-8.
+UTF-8 encoded.  A C<use open ':encoding(utf8)'> would have avoided the
+bug, as would explicitly opening the F<file> for input as UTF-8.
 
 B<NOTE>: the C<:utf8> and C<:encoding> features work only if your
 Perl has been built with the new PerlIO feature (which is the default

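The perluniintro text describes a file that ends up "twice UTF-8 encoded" when it is read without decoding but written through an encoding layer. The following sketch applies the fix the text suggests, decoding on input as well as encoding on output, so that re-saving is harmless. The helper names resave_utf8 and slurp_raw are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# resave_utf8() is a hypothetical helper: decode on input and encode on
# output, so running it repeatedly does not re-encode the data.
sub resave_utf8 {
    my ($path) = @_;
    open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
    my $text = do { local $/; <$in> };
    close $in;
    open(my $out, '>:encoding(UTF-8)', $path) or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";
}

# slurp_raw() is a hypothetical helper: return the file's bytes undecoded.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    return $data;
}

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print {$fh} "caf\xC3\xA9\n";        # "cafe" with e-acute, already UTF-8 encoded
close $fh or die "close: $!";

resave_utf8($path) for 1 .. 2;      # running it twice is harmless
print slurp_raw($path) eq "caf\xC3\xA9\n" ? "unchanged\n" : "double-encoded!\n";
```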
==== //depot/maint-5.8/perl/pod/perlunitut.pod#3 (text) ====
Index: perl/pod/perlunitut.pod
--- perl/pod/perlunitut.pod#2~30764~    2007-03-26 10:20:04.000000000 -0700
+++ perl/pod/perlunitut.pod     2008-02-02 08:53:33.000000000 -0800
@@ -201,7 +201,7 @@
 
 =head1 AUTHOR
 
-Juerd Waalboer <[EMAIL PROTECTED]>
+Juerd Waalboer <[EMAIL PROTECTED]>
 
 =head1 SEE ALSO
 
End of Patch.
