Change 20939: Integrate:

Jarkko Hietaniemi Fri, 29 Aug 2003 21:55:55 +0000

Change 20939 by [EMAIL PROTECTED] on 2003/08/29 20:28:49

        Integrate:
        [ 20935]
        Tiny doc tweak from Shannon -jj Behrens.
        
        [ 20936]
        Some perluniintro tweaks.
        
        [ 20937]
        Subject: [PATCH] Re: all 9007199254740992s are equal, but some are more equal than others
        From: Nicholas Clark <[EMAIL PROTECTED]>
        Date: Wed, 27 Aug 2003 22:59:55 +0100
        Message-ID: <[EMAIL PROTECTED]>


Affected files ...

... //depot/maint-5.8/perl/pod/perluniintro.pod#10 integrate
... //depot/maint-5.8/perl/pp.c#28 integrate

Differences ...

==== //depot/maint-5.8/perl/pod/perluniintro.pod#10 (text) ====
Index: perl/pod/perluniintro.pod
--- perl/pod/perluniintro.pod#9~20818~  Thu Aug 21 22:54:24 2003
+++ perl/pod/perluniintro.pod   Fri Aug 29 13:28:49 2003
@@ -19,6 +19,7 @@
 in the largest Chinese, Japanese, and Korean dictionaries are also
 encoded. The standards will eventually cover almost all characters in
 more than 250 writing systems and thousands of languages.
+Unicode 1.0 was released in October 1991, and 4.0 in April 2003.
 
 A Unicode I<character> is an abstract entity.  It is not bound to any
 particular integer width, especially not to the C language C<char>.
@@ -33,11 +34,10 @@
 I<code points>.
 
 The Unicode standard prefers using hexadecimal notation for the code
-points.  If numbers like C<0x0041> are unfamiliar to
-you, take a peek at a later section, L</"Hexadecimal Notation">.
-The Unicode standard uses the notation C<U+0041 LATIN CAPITAL LETTER A>,
-to give the hexadecimal code point and the normative name of
-the character.
+points.  If numbers like C<0x0041> are unfamiliar to you, take a peek
+at a later section, L</"Hexadecimal Notation">.  The Unicode standard
+uses the notation C<U+0041 LATIN CAPITAL LETTER A>, to give the
+hexadecimal code point and the normative name of the character.
 
 Unicode also defines various I<properties> for the characters, like
 "uppercase" or "lowercase", "decimal digit", or "punctuation";
@@ -86,12 +86,13 @@
 
 A common myth about Unicode is that it would be "16-bit", that is,
 Unicode is only represented as C<0x10000> (or 65536) characters from
-C<0x0000> to C<0xFFFF>.  B<This is untrue.> Since Unicode 2.0, Unicode
-has been defined all the way up to 21 bits (C<0x10FFFF>), and since
-Unicode 3.1, characters have been defined beyond C<0xFFFF>.  The first
-C<0x10000> characters are called the I<Plane 0>, or the I<Basic
-Multilingual Plane> (BMP).  With Unicode 3.1, 17 planes in all are
-defined--but nowhere near full of defined characters, yet.
+C<0x0000> to C<0xFFFF>.  B<This is untrue.>  Since Unicode 2.0 (July
+1996), Unicode has been defined all the way up to 21 bits (C<0x10FFFF>),
+and since Unicode 3.1 (March 2001), characters have been defined
+beyond C<0xFFFF>.  The first C<0x10000> characters are called the
+I<Plane 0>, or the I<Basic Multilingual Plane> (BMP).  With Unicode
+3.1, 17 (yes, seventeen) planes in all were defined--but they are
+nowhere near full of defined characters, yet.
 
 Another myth is that the 256-character blocks have something to
 do with languages--that each block would define the characters used
@@ -104,13 +105,14 @@
 For further information see L<Unicode::UCD>.
 
 The Unicode code points are just abstract numbers.  To input and
-output these abstract numbers, the numbers must be I<encoded> somehow.
-Unicode defines several I<character encoding forms>, of which I<UTF-8>
-is perhaps the most popular.  UTF-8 is a variable length encoding that
-encodes Unicode characters as 1 to 6 bytes (only 4 with the currently
-defined characters).  Other encodings include UTF-16 and UTF-32 and their
-big- and little-endian variants (UTF-8 is byte-order independent)
-The ISO/IEC 10646 defines the UCS-2 and UCS-4 encoding forms.
+output these abstract numbers, the numbers must be I<encoded> or
+I<serialised> somehow.  Unicode defines several I<character encoding
+forms>, of which I<UTF-8> is perhaps the most popular.  UTF-8 is a
+variable length encoding that encodes Unicode characters as 1 to 6
+bytes (only 4 with the currently defined characters).  Other encodings
+include UTF-16 and UTF-32 and their big- and little-endian variants
+(UTF-8 is byte-order independent) The ISO/IEC 10646 defines the UCS-2
+and UCS-4 encoding forms.
 
 For more information about encodings--for instance, to learn what
 I<surrogates> and I<byte order marks> (BOMs) are--see L<perlunicode>.
@@ -645,8 +647,8 @@
     $b = "\x{100}";
     print "$a = $b\n";
 
-the output string will be UTF-8-encoded C<ab\x80c\x{100}\n>, but note
-that C<$a> will stay byte-encoded.
+the output string will be UTF-8-encoded C<ab\x80c = \x{100}\n>, but
+C<$a> will stay byte-encoded.
 
 Sometimes you might really need to know the byte length of a string
 instead of the character length. For that use either the
@@ -752,7 +754,10 @@
 How Does Unicode Work With Traditional Locales?
 
 In Perl, not very well.  Avoid using locales through the C<locale>
-pragma.  Use only one or the other.
+pragma.  Use only one or the other.  But see L<perlrun> for the
+description of the C<-C> switch and its environment counterpart,
+C<$ENV{PERL_UNICODE}> to see how to enable various Unicode features,
+for example by using locale settings.
 
 =back
 
@@ -876,7 +881,8 @@
 =head1 SEE ALSO
 
 L<perlunicode>, L<Encode>, L<encoding>, L<open>, L<utf8>, L<bytes>,
-L<perlretut>, L<Unicode::Collate>, L<Unicode::Normalize>, L<Unicode::UCD>
+L<perlretut>, L<perlrun>, L<Unicode::Collate>, L<Unicode::Normalize>,
+L<Unicode::UCD>
 
 =head1 ACKNOWLEDGMENTS
 

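The perluniintro.pod hunk above says that UTF-8 needs only 1 to 4 bytes for the currently defined code points (up to 0x10FFFF), and that this range spans 17 planes, Plane 0 being the BMP. A minimal C sketch of that arithmetic follows. The names utf8_len, utf8_encode and plane_of are hypothetical and this is not Perl's own encoder. The sketch stops at 0x10FFFF, although the original UTF-8 design allowed forms of up to 6 bytes, and it does not reject surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes UTF-8 needs for a code point, or 0 if the
   value lies outside U+0000..U+10FFFF. Surrogates are not rejected. */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)      return 1;
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Hypothetical helper: encode cp into buf (room for at least 4 bytes).
   Returns the number of bytes written, or 0 if cp is out of range. */
static size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    size_t n = utf8_len(cp);
    switch (n) {
    case 1:                             /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        break;
    case 2:                             /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:                             /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 4:                             /* 11110xxx + three continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:                            /* out of range: nothing written */
        break;
    }
    return n;
}

/* Hypothetical helper: plane number, 0 (the BMP) through 16. */
static unsigned plane_of(uint32_t cp)
{
    return (unsigned)(cp >> 16);
}
```

For example, utf8_encode(0x100, buf) writes the two bytes 0xC4 0x80, which is why the "\x{100}" in the pod example above takes two bytes once the output string is UTF-8-encoded, and plane_of(0x10FFFF) is 16, the last of the 17 planes.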
==== //depot/maint-5.8/perl/pp.c#28 (text) ====
Index: perl/pp.c
--- perl/pp.c#27~20468~ Sun Aug  3 22:16:18 2003
+++ perl/pp.c   Fri Aug 29 13:28:49 2003
@@ -2766,28 +2766,6 @@
     }
 }
 
-/*
- * There are strange code-generation bugs caused on sparc64 by gcc-2.95.2.
- * These need to be revisited when a newer toolchain becomes available.
- */
-#if defined(__sparc64__) && defined(__GNUC__)
-#   if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 96)
-#       undef  SPARC64_MODF_WORKAROUND
-#       define SPARC64_MODF_WORKAROUND 1
-#   endif
-#endif
-
-#if defined(SPARC64_MODF_WORKAROUND)
-static NV
-sparc64_workaround_modf(NV theVal, NV *theIntRes)
-{
-    NV res, ret;
-    ret = Perl_modf(theVal, &res);
-    *theIntRes = res;
-    return ret;
-}
-#endif
-
 PP(pp_int)
 {
     dSP; dTARGET; tryAMAGICun(int);
@@ -2811,34 +2789,15 @@
              if (value < (NV)UV_MAX + 0.5) {
                  SETu(U_V(value));
              } else {
-#if defined(SPARC64_MODF_WORKAROUND)
-                  (void)sparc64_workaround_modf(value, &value);
-#elif defined(HAS_MODFL_POW32_BUG)
-/* some versions of glibc split (i + d) into (i-1, d+1) for 2^32 <= i < 2^64 */
-                  NV offset = Perl_modf(value, &value);
-                  (void)Perl_modf(offset, &offset);
-                  value += offset;
-#else
-                  (void)Perl_modf(value, &value);
-#endif
-                 SETn(value);
+                 SETn(Perl_floor(value));
              }
          }
          else {
              if (value > (NV)IV_MIN - 0.5) {
                  SETi(I_V(value));
              } else {
-#if defined(SPARC64_MODF_WORKAROUND)
-                  (void)sparc64_workaround_modf(-value, &value);
-#elif defined(HAS_MODFL_POW32_BUG)
-/* some versions of glibc split (i + d) into (i-1, d+1) for 2^32 <= i < 2^64 */
-                  NV offset = Perl_modf(-value, &value);
-                  (void)Perl_modf(offset, &offset);
-                  value += offset;
-#else
-                 (void)Perl_modf(-value, &value);
-#endif
-                 SETn(-value);
+               /* This is maint, and we don't have Perl_ceil in perl.h  */
+                 SETn(-Perl_floor(-value));
              }
          }
       }
End of Patch.
