derek Mon Apr 3 21:39:59 2006 UTC
Modified files:
/phpdoc/en/reference/mbstring reference.xml
Log:
Added a few grammatical fixes and provided a more in-depth explanation of why
we need mbstring because of the limitations of a byte.
http://cvs.php.net/viewcvs.cgi/phpdoc/en/reference/mbstring/reference.xml?r1=1.22&r2=1.23&diff_format=u
Index: phpdoc/en/reference/mbstring/reference.xml
diff -u phpdoc/en/reference/mbstring/reference.xml:1.22 phpdoc/en/reference/mbstring/reference.xml:1.23
--- phpdoc/en/reference/mbstring/reference.xml:1.22 Sun Sep 4 19:39:18 2005
+++ phpdoc/en/reference/mbstring/reference.xml Mon Apr 3 21:39:59 2006
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.22 $ -->
+<!-- $Revision: 1.23 $ -->
<!-- Purpose: international -->
<!-- Membership: bundled -->
@@ -12,12 +12,14 @@
&reftitle.intro;
<para>
While there are many languages in which every necessary character can
- be represented by a one-to-one mapping to a 8-bit value, there are also
+ be represented by a one-to-one mapping to an 8-bit value, there are also
several languages which require so many characters for written
- communication that cannot be contained within the range a mere byte can
- code. Multibyte character encoding schemes were developed to express
- that many (more than 256) characters in the regular bytewise coding
- system.
+ communication that they cannot be contained within the range a mere byte
+ can encode. A byte is made up of eight bits, and each bit can hold only
+ two distinct values, zero or one, so a byte can represent only 256
+ unique values (two to the power of eight). Multibyte character
+ encoding schemes were developed to express more than 256 characters
+ in the regular bytewise coding system.
</para>
<para>
When you manipulate (trim, split, splice, etc.) strings encoded in a
@@ -29,17 +31,12 @@
most likely loses its original meaning.
</para>
<para>
- <literal>mbstring</literal> provides these multibyte specific
- string functions that help you deal with multibyte encodings in PHP,
- which is basically supposed to be used with single byte encodings.
- In addition to that, <literal>mbstring</literal> handles character
- encoding conversion between the possible encoding pairs.
- </para>
- <para>
- <literal>mbstring</literal> is also designed to handle Unicode-based
- encodings such as UTF-8 and UCS-2 and many single-byte encodings
- for convenience (listed below), whereas <literal>mbstring</literal> was
- originally developed for use in Japanese web pages.
+ <literal>mbstring</literal> provides multibyte specific string functions
+ that help you deal with multibyte encodings in PHP. In addition to that,
+ <literal>mbstring</literal> handles character encoding conversion between
+ the possible encoding pairs. <literal>mbstring</literal> is designed to
+ handle Unicode-based encodings such as UTF-8 and UCS-2 and many
+ single-byte encodings for convenience (listed below).
</para>
<section id="mbstring.php4.req">
@@ -115,14 +112,14 @@
</note>
<note>
<para>
- If you have some database connected with PHP, it is recommended that
- you use the same character encoding for both database and the
+ If you are connecting to a database with PHP, it is recommended that
+ you use the same character encoding for both the database and the
<literal>internal encoding</literal> for ease of use and better
performance.
</para>
<para>
If you are using PostgreSQL, the character encoding used in the
- database and the one used in the PHP may differ as it supports
+ database and the one used in PHP may differ as it supports
automatic character set conversion between the backend and the frontend.
</para>
</note>
@@ -175,7 +172,7 @@
</simpara>
<para>
There is no way to control HTTP input character
- conversion from PHP script. To disable HTTP input character
+ conversion from a PHP script. To disable HTTP input character
conversion, it has to be done in &php.ini;.
<example>
<title>
@@ -207,14 +204,14 @@
There are several ways to enable output character encoding
conversion. One is using &php.ini;, another
is using <function>ob_start</function> with
- <function>mb_output_handler</function> as
+ <function>mb_output_handler</function> as the
<literal>ob_start</literal> callback function.
</para>
<note>
<para>
PHP3-i18n users should note that <literal>mbstring</literal>'s output
conversion differs from PHP3-i18n. Character encoding is
- converted using output buffer.
+ converted using an output buffer.
</para>
</note>
</listitem>
@@ -268,7 +265,7 @@
<literal>mbstring</literal> functions.
</simpara>
<para>
- The following character encoding is supported in this PHP
+ The following character encodings are supported in this PHP
extension:
</para>
<itemizedlist>
@@ -330,11 +327,11 @@
<listitem><simpara>KOI8-R</simpara></listitem>
</itemizedlist>
<para>
- &php.ini; entry, which accepts encoding name,
- accepts "<literal>auto</literal>" and
- "<literal>pass</literal>" also.
- <literal>mbstring</literal> functions, which accepts encoding
- name, and accepts "<literal>auto</literal>".
+ Any &php.ini; entry which accepts an encoding name
+ can also use the values "<literal>auto</literal>" and
+ "<literal>pass</literal>".
+ <literal>mbstring</literal> functions which accept an encoding
+ name can also use the value "<literal>auto</literal>".
</para>
<para>
If "<literal>pass</literal>" is set, no character
@@ -358,13 +355,13 @@
</title>
<para>
You might often find it difficult to get an existing PHP application
- work in a given multibyte environment. That's mostly because lots of
- PHP applications out there are written with the standard
- string functions such as <function>substr</function>, which are
- known to not properly handle multibyte-encoded strings.
+ to work in a given multibyte environment. This happens because most
+ PHP applications out there are written with the standard string
+ functions such as <function>substr</function>, which are known to
+ not properly handle multibyte-encoded strings.
</para>
<para>
- mbstring supports 'function overloading' feature which enables
+ mbstring supports a 'function overloading' feature which enables
you to add multibyte awareness to such an application without
code modification by overloading multibyte counterparts on
the standard string functions. For example,
@@ -374,13 +371,13 @@
single-byte encodings to a multibyte environment in many cases.
</para>
<para>
- To use the function overloading, set
+ To use function overloading, set
<literal>mbstring.func_overload</literal> in &php.ini; to a
positive value that represents a combination of bitmasks specifying
the categories of functions to be overloaded. It should be set
to 1 to overload the <function>mail</function> function, 2 for string
functions, and 4 for regular expression functions. For example,
- if is set for 7, mail, strings and regular expression functions should
+ if it is set to 7, mail, string and regular expression functions will
be overloaded. The list of overloaded functions is shown below.
<table>
<title>Functions to be overloaded</title>
@@ -475,18 +472,13 @@
<section id="mbstring.ja-basic">
<title>Basics of Japanese multi-byte encodings</title>
<para>
- It is often said quite hard to figure out how Japanese texts are
- handled in the computer. This is not only because Japanese characters
- can only be represented by multibyte encodings, but because different
- encoding standards are adopted for different purposes / platforms.
- Moreover, not a few character set standards are used there, which
- are slightly different from one another. Those facts have often led
- developers to inevitable mess-up.
- </para>
- <para>
- To create a working web application that would be put in the Japanese
- environment, it is important to use the proper character encoding and
- character set for the task in hand.
+ Japanese characters can only be represented by multibyte encodings,
+ and multiple encoding standards are used depending on platform and
+ text purpose. To make matters worse, these encoding standards
+ differ slightly from one another. In order to create a web
+ application which would be usable in a Japanese environment, a
+ developer has to keep these complexities in mind to ensure that the
+ proper character encodings are used.
</para>
<para>
<itemizedlist>
@@ -495,18 +487,19 @@
</listitem>
<listitem>
<simpara>
- Most of multibyte characters often appear twice as wide as
- a single-byte character on display. Those characters are called
- "zen-kaku" in Japanese which means "full width", and the other
- (narrower) characters are called "han-kaku" - means half width.
- However the graphical properties of the characters depend on
- the glyphs of the type faces used to display them or print them out.
+ Most Japanese multibyte characters appear twice as wide as
+ single-byte characters. These characters are called
+ "zen-kaku" in Japanese, which means "full width".
+ Other, narrower characters are called "han-kaku",
+ which means "half width". The graphical properties
+ of the characters, however, depend upon the typefaces used
+ to display them.
</simpara>
</listitem>
<listitem>
<simpara>
Some character encodings use shift(escape) sequences defined
- in ISO2022 to switch the code map of the specific code area
+ in ISO-2022 to switch the code map of the specific code area
(<literal>00h</literal> to <literal>7fh</literal>).
</simpara>
</listitem>
@@ -533,10 +526,10 @@
<section id="mbstring.ref">
<title>References</title>
<para>
- Multibyte character encoding schemes and the related issues are very
- complicated. There should be too few space to cover in sufficient details.
- Please refer to the following URLs and other resources for
- further readings.
+ Multibyte character encoding schemes and their related issues are
+ fairly complicated, and are beyond the scope of this documentation.
+ Please refer to the following URLs and other resources for further
+ information regarding these topics.
<itemizedlist>
<listitem>
<para>