hirokawa Thu Jun 28 23:20:29 2001 EDT Modified files: /phpdoc/en/functions mbstring.xml Log: fixed some typos.
Index: phpdoc/en/functions/mbstring.xml diff -u phpdoc/en/functions/mbstring.xml:1.2 phpdoc/en/functions/mbstring.xml:1.3 --- phpdoc/en/functions/mbstring.xml:1.2 Sun Jun 24 11:27:21 2001 +++ phpdoc/en/functions/mbstring.xml Thu Jun 28 23:20:28 2001 @@ -1,117 +1,305 @@ <reference id="ref.mbstring"> <title>Multi-Byte String Functions</title> - <titleabbrev>Multi-Byte String</titleabbrev> + <titleabbrev> + Multi-Byte String + </titleabbrev> <partintro> &warn.experimental; <sect1 id="mb-intro"> <title>Introduction</title> <warning> <simpara> - This module is EXPERIMENTAL. Function name/API is subject to be - changed. Current conversion filter supports Japanese only. + This module is EXPERIMENTAL. Function name/API is subject to + change. Current conversion filter supports Japanese only. </simpara> </warning> <para> - There are many languages that all characters cannot be expressed + There are many languages in which not all characters can be expressed by single byte. Multi-byte character codes are used to express many characters for many languages. <literal>mbstring</literal> is developed to handle Japanese characters. However, many <literal>mbstring</literal> functions are able to handle - character codes other than Japanese. + character encoding other than Japanese. </para> <para> - Multi-byte character encoding represents single character with + A multi-byte character encoding represents single character with consecutive bytes. Some character encoding has shift(escape) - sequences to start/end multi-byte character string. Therefore, + sequences to start/end multi-byte character strings. Therefore, a multi-byte character string may be destroyed when it is divided - and/or counted, unless multi-byte character encoding safe method - is used. <literal>mbstring</literal> functions support multi-byte - character safe string functions and other utility functions such - as conversion functions. + and/or counted unless multi-byte character encoding safe method + is used. 
This module provides multi-byte character safe string + functions and other utility functions such as conversion + functions. </para> + <para> + Since PHP is basically designed for ISO-8859-1, some multi-byte + character encoding does not work well with PHP. Therefore, it is + important to set <literal>mbstring.internal_encoding</literal> to + a character encoding that works with PHP. + </para> + <para> + PHP4 Character Encoding Requirements + </para> + <para> + <itemizedlist> + <listitem> + <simpara> + Per byte encoding + </simpara> + </listitem> + <listitem> + <simpara> + Single byte characters in range of <literal>00h-7fh</literal> + which is compatible with <literal>ASCII</literal> + </simpara> + </listitem> + <listitem> + <simpara> + Multi-byte characters without <literal>00h-7fh</literal> + </simpara> + </listitem> + </itemizedlist> + </para> + <para> + These are examples of internal character encoding that works with + PHP and does NOT work with PHP. + <informalexample> + <programlisting> - <sect2 id="mb-ja-basic"> - <title>Basics for Japanese multi-byte character</title> +Character encodings work with PHP: +ISO-8859-*, EUC-JP, UTF-8 + + +Character encodings do NOT work with PHP: +JIS, SJIS + </programlisting> + </informalexample> + </para> + <para> + Character encoding, that does not work with PHP, may be converted + with <literal>mbstring</literal>'s HTTP input/output conversion + feature/function. + </para> + <note> + <para> + SJIS should not be used for internal encoding unless the reader + is familiar with parser/compiler, character encoding and + character encoding issues. + </para> + </note> + <note> <para> - Most Japanese characters need more than 1 byte for a - character. In addition to this, several character encodings are - used under Japanese environment. There are EUC-JP, Shift_JIS and - ISO-2022-JP character encoding. As Unicode is getting popular, - UTF-8 is used also. 
To develop Web application for Japanese - environment, it is important to use these character codes depend - on its purpose, HTTP input/output, RDBMS and E-mail. + If you use database with PHP, it is recommended that you use the + same character encoding for both database and <literal>internal + encoding</literal> for ease of use and better performance. + </para> + <para> + If you are using PostgreSQL, it supports character + encoding that is different from backend character encoding. See + the PostgreSQL manual for details. </para> + </note> + + <sect2 id="mb-enable"> + <title>How to Enable mbstring</title> <para> + <literal>mbstring</literal> is an extended module. You must + enable module with <literal>configure</literal> script. Refer + to the <link linkend="installation">Install</link> section for + details. + </para> + <simpara> + The following configure options are related to + <literal>mbstring</literal> module. + </simpara> + <para> <itemizedlist> - <listitem> - <simpara> - Storage for a character can be upto four bytes - </simpara> - </listitem> <listitem> - <simpara> - A multi-byte character usually has twice of width compare to - single byte characters. Wider character is called "zen-kaku" - - meaning full width, narrower character called "han-kaku" - - meaning half width. "zen-kaku" characters are fixed width - usually. - </simpara> + <para> + <option role="configure">--enable-mbstring</option> : Enable + <literal>mbstring</literal> functions. This option is + required to use <literal>mbstring</literal> functions. + </para> </listitem> <listitem> - <simpara> - Some character encoding defines shift sequence for - entering/exiting multi-byte character strings. - </simpara> + <para> + <option role="configure">--enable-mbstr-enc-trans</option> : + Enable HTTP input character encoding conversion using + <literal>mbstring</literal> conversion engine. 
If this + feature is enabled, HTTP input character encoding may be + converted to <literal>mbstring.internal_encoding</literal> + automatically. + </para> </listitem> + </itemizedlist> + </para> + </sect2> + + <sect2 id="mb-conv"> + <title>HTTP Input and Output</title> + <para> + HTTP input/output character encoding conversion may convert + binary data also. Users are supposed to control character + encoding conversion if binary data is used for HTTP + input/output. + </para> + <para> + If <literal>enctype</literal> for HTML form is set to + <literal>multipart/form-data</literal>, + <literal>mbstring</literal> does not convert character encoding + in POST data. If it is the case, strings are needed to be + converted to internal character encoding. + </para> + <para> + <itemizedlist> <listitem> <simpara> - Database may allocate storage for characters that differs - from size used in PHP even if the same character encoding is - used. (For example, PostgreSQL) + HTTP Input </simpara> + <para> There is no way to control HTTP input character + conversion from PHP script. To disable HTTP input character + conversion, it has to be done in <literal>php.ini</literal>. + <example> + <title> + Disable HTTP input conversion in php.ini + </title> + <programlisting role="php"> + +;; Disable HTTP Input conversion +mbstring.http_input = pass + </programlisting> + </example> + </para> + <para> + When using PHP as an Apache module, it is possible to + override PHP ini setting per Virtual Host in + <literal>httpd.conf</literal> or per directory with + <literal>.htaccess</literal>. Refer to the <link + linkend="configuration">Configuration</link> section and + Apache Manual for details. + </para> </listitem> <listitem> <simpara> - E-mail is supposed to use ISO-2022-JP. + HTTP Output </simpara> - </listitem> - <listitem> <para> - "i-mode" web site is supposed to use Shift_JIS. + There are several ways to enable output character encoding + conversion. 
One is using <literal>php.ini</literal>, another + is using <function>ob_start</function> with + <function>mb_output_handler</function> as + <literal>ob_start</literal> callback function. </para> + <note> + <para> + For PHP3-i18n users, <literal>mbstring</literal>'s output + conversion differs from PHP3-i18n. Character encoding is + converted using output buffer. + </para> + </note> </listitem> </itemizedlist> </para> + <para> + <example> + <title><literal>php.ini</literal> setting example</title> + <programlisting role="php"> + +;; Enable output character encoding conversion for all PHP pages + +;; Enable Output Buffering +output_buffering = On + +;; Set mb_output_handler to enable output conversion +output_handler = mb_output_handler + </programlisting> + </example> + </para> + <para> + <example> + <title>Script example</title> + <programlisting role="php"> + +<?php + +// Enable output character encoding conversion only for this page + +// Set HTTP output character encoding to SJIS +mb_http_output('SJIS'); + +// Start buffering and specify "mb_output_handler" as +// callback function +ob_start('mb_output_handler'); + +?> + </programlisting> + </example> + </para> </sect2> <sect2 id="mb-code"> - <title>Supported character encodings</title> + <title>Supported Character Encoding</title> + <simpara> + Currently, the following character encoding is supported by + <literal>mbstring</literal> module. Character encoding may + be specified for <literal>mbstring</literal> functions' + <literal>encoding</literal> parameter. 
</simpara> + <para> + The following character encoding is supported in this PHP + extension : + </para> <para> - Following character encodings are supported in this PHP - extension : <literal>UCS-4</literal>, - <literal>UCS-4BE</literal>, <literal>UCS-4LE</literal>, - <literal>UCS-2</literal>, <literal>UCS-2BE</literal>, - <literal>UCS-2LE</literal>, <literal>UTF-32</literal>, - <literal>UTF-32BE</literal>, <literal>UTF-32LE</literal>, - <literal>UCS-2LE</literal>, <literal>UTF-16</literal>, - <literal>UTF-16BE</literal>, <literal>UTF-16LE</literal>, - <literal>UTF-8</literal>, <literal>UTF-7</literal>, - <literal>ASCII</literal>, <literal>EUC-JP</literal>, - <literal>SJIS</literal>, <literal>eucJP-win</literal>, - <literal>SJIS-win</literal>, - <literal>ISO-2022-JP</literal>(<literal>JIS</literal>), + <literal>UCS-4</literal>, <literal>UCS-4BE</literal>, + <literal>UCS-4LE</literal>, <literal>UCS-2</literal>, + <literal>UCS-2BE</literal>, <literal>UCS-2LE</literal>, + <literal>UTF-32</literal>, <literal>UTF-32BE</literal>, + <literal>UTF-32LE</literal>, <literal>UCS-2LE</literal>, + <literal>UTF-16</literal>, <literal>UTF-16BE</literal>, + <literal>UTF-16LE</literal>, <literal>UTF-8</literal>, + <literal>UTF-7</literal>, <literal>ASCII</literal>, + <literal>EUC-JP</literal>, <literal>SJIS</literal>, + <literal>eucJP-win</literal>, <literal>SJIS-win</literal>, + <literal>ISO-2022-JP</literal>, <literal>JIS</literal>, <literal>ISO-8859-1</literal>, <literal>ISO-8859-2</literal>, <literal>ISO-8859-3</literal>, <literal>ISO-8859-4</literal>, <literal>ISO-8859-5</literal>, <literal>ISO-8859-6</literal>, <literal>ISO-8859-7</literal>, <literal>ISO-8859-8</literal>, <literal>ISO-8859-9</literal>, <literal>ISO-8859-10</literal>, <literal>ISO-8859-13</literal>, <literal>ISO-8859-14</literal>, - <literal>ISO-8859-15</literal>. 
+ <literal>ISO-8859-15</literal>, <literal>byte2be</literal>, + <literal>byte2le</literal>, <literal>byte4be</literal>, + <literal>byte4le</literal>, <literal>BASE64</literal>, + <literal>7bit</literal>, <literal>8bit</literal> and + <literal>UTF7-IMAP</literal>. + </para> + <para> + Any <literal>php.ini</literal> entry that accepts an encoding name + also accepts "<literal>auto</literal>" and + "<literal>pass</literal>". + <literal>mbstring</literal> functions that accept an encoding + name also accept "<literal>auto</literal>". + </para> + <para> + If "<literal>pass</literal>" is set, no character + encoding conversion is performed. + </para> + <para> + If "<literal>auto</literal>" is set, it is expanded to + "<literal>ASCII,JIS,UTF-8,EUC-JP,SJIS</literal>". + </para> + <para> + See also <function>mb_detect_order</function>. </para> + <note> + <para> + "Supported character encoding" does not mean that it + works as internal character code. + </para> + </note> </sect2> <sect2 id="mb-ini"> - <title> php.ini settings </title> + <title>php.ini settings</title> <para> <itemizedlist> <listitem> @@ -122,63 +310,311 @@ </listitem> <listitem> <simpara> - <literal>mbstring.http_input</literal> defines default HTTP input - character encoding. + <literal>mbstring.http_input</literal> defines default HTTP + input character encoding. </simpara> </listitem> <listitem> <simpara> - <literal>mbstring.http_output</literal> defines default HTTP output - character encoding. + <literal>mbstring.http_output</literal> defines default HTTP + output character encoding. </simpara> </listitem> <listitem> <simpara> - <literal>mbstring.detect_order</literal> defines default character - encoding detection order. + <literal>mbstring.detect_order</literal> defines default + character code detection order. See also + <function>mb_detect_order</function>. 
</simpara> </listitem> <listitem> <simpara> - <literal>mbstring.substitute_character</literal> defines character - to substitute for invalid character codes. + <literal>mbstring.substitute_character</literal> defines + character to substitute for invalid character encoding. </simpara> </listitem> </itemizedlist> </para> <para> + Web Browsers are supposed to use the same character encoding + when submitting form. However, browsers may not use the same + character encoding. See <function>mb_http_input</function> to + detect character encoding used by browsers. + </para> + <para> + If <literal>enctype</literal> is set to + <literal>multipart/form-data</literal> in HTML forms, + <literal>mbstring</literal> does not convert character encoding + in POST data. The user must convert them in the script, if + conversion is needed. + </para> + <para> + Although, browsers are smart enough to detect character encoding + in HTML. <literal>charset</literal> is better to be set in HTTP + header. Change <literal>default_charset</literal> according to + character encoding. + </para> + <para> <example> <title><literal>php.ini</literal> setting example</title> - <programlisting role="php.ini"> + <programlisting role="php"> + ;; Set default internal encoding +;; Note: Make sure to use character encoding works with PHP mbstring.internal_encoding = UTF-8 ; Set internal encoding to UTF-8 -;; Set default HTTP input character code -mbstring.http_input = auto ; Set HTTP input to auto -; or -; mbstring.http_input = SJIS ; Set HTTP input to SJIS -; mbstring.http_input = eucjp-win, sjis-win, UTF-8 ; Specify order - -;; Set default HTTP output character code -mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8 - -;; Set default character code detection order -mbstring.detect_order = auto ; Set HTTP output to auto -; or -; mbstring.detect_order = eucjp-win, sjis-win, UTF-8 ; Specify order +;; Set default HTTP input character encoding +;; Note: Script cannot change http_input setting. 
+mbstring.http_input = pass ; No conversion. +mbstring.http_input = auto ; Set HTTP input to auto + ; "auto" is expanded to +"ASCII,JIS,UTF-8,EUC-JP,SJIS" +mbstring.http_input = SJIS ; Set HTTP input to SJIS +mbstring.http_input = UTF-8,SJIS,EUC-JP ; Specify order + +;; Set default HTTP output character encoding +mbstring.http_output = pass ; No conversion +mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8 + +;; Set default character encoding detection order +mbstring.detect_order = auto ; Set detect order to auto +mbstring.detect_order = ASCII,JIS,UTF-8,SJIS,EUC-JP ; Specify order ;; Set default substitute character -mbstring.substitute_character = 12307 ; Specify character code -; or -; mbstring.substitute_character = none ; Null character -; mbstring.substitute_character = long ; Long +mbstring.substitute_character = 12307 ; Specify Unicode value +mbstring.substitute_character = none ; Do not print character +mbstring.substitute_character = long ; Long Example: U+3000,JIS+7E7E </programlisting> </example> </para> + <para> + <example> + <title><literal>php.ini</literal> setting for <literal>EUC-JP</literal> +users</title> + <programlisting role="php"> + +;; Disable Output Buffering +output_buffering = Off + +;; Set HTTP header charset +default_charset = EUC-JP + +;; Set HTTP input encoding conversion to auto +mbstring.http_input = auto + +;; Convert HTTP output to EUC-JP +mbstring.http_output = EUC-JP + +;; Set internal encoding to EUC-JP +mbstring.internal_encoding = EUC-JP + +;; Do not print invalid characters +mbstring.substitute_character = none + </programlisting> + </example> + </para> + <para> + <example> + <title><literal>php.ini</literal> setting for <literal>SJIS</literal> +users</title> + <programlisting role="php"> + +;; Enable Output Buffering +output_buffering = On + +;; Set mb_output_handler to enable output conversion +output_handler = mb_output_handler + +;; Set HTTP header charset +default_charset = Shift_JIS + +;; Set http input 
encoding conversion to auto +mbstring.http_input = auto + +;; Convert to SJIS +mbstring.http_output = SJIS + +;; Set internal encoding to EUC-JP +mbstring.internal_encoding = EUC-JP + +;; Do not print invalid characters +mbstring.substitute_character = none + </programlisting> + </example> + </para> </sect2> + + <sect2 id="mb-ja-basic"> + <title>Basics for Japanese multi-byte character</title> + <para> + Most Japanese characters need more than 1 byte per character. In + addition, several character encoding schemas are used under a + Japanese environment. There are EUC-JP, Shift_JIS(SJIS) and + ISO-2022-JP(JIS) character encoding. As Unicode becomes popular, + UTF-8 is used also. To develop Web applications for a Japanese + environment, it is important to use the character set for the + task in hand, whether HTTP input/output, RDBMS and E-mail. + </para> + <para> + <itemizedlist> + <listitem> + <simpara>Storage for a character can be up to four + bytes</simpara> + </listitem> + <listitem> + <simpara> + A multi-byte character is usually twice of the width compared + to single-byte characters. Wider characters are called + "zen-kaku" - meaning full width, narrower characters are + called "han-kaku" - meaning half width. "zen-kaku" characters + are usually fixed width. + </simpara> + </listitem> + <listitem> + <simpara> + Some character encoding defines shift(escape) sequence for + entering/exiting multi-byte character strings. + </simpara> + </listitem> + <listitem> + <simpara> + ISO-2022-JP must be used for SMTP/NNTP. + </simpara> + </listitem> + <listitem> + <para> + "i-mode" web site is supposed to use SJIS. + </para> + </listitem> + </itemizedlist> + </para> + </sect2> + + <sect2 id="mb-ref"> + <title>References</title> + <para> + Multi-byte character encoding and its related issues are very + complex. It is impossible to cover in sufficient detail + here. Please refer to the following URLs and other resources for + further readings. 
+ <itemizedlist> + <listitem> + <para> + Unicode/UTF/UCS/etc + </para> + <para> + <literal>http://www.unicode.org/</literal> + </para> + </listitem> + <listitem> + <para> + Japanese/Korean/Chinese character + information + </para> + <para> + <literal> + ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf + </literal> + </para> + </listitem> + </itemizedlist> + </para> + </sect2> + </sect1> </partintro> + <refentry id="function.mb-language"> + <refnamediv> + <refname>mb_language</refname> + <refpurpose> + Set/Get current language + </refpurpose> + </refnamediv> + <refsect1> + <title>Description</title> + <funcsynopsis> + <funcprototype> + <funcdef>string + <function>mb_language</function></funcdef> + <paramdef>string + <parameter><optional>language</optional></parameter></paramdef> + </funcprototype> + </funcsynopsis> + <para> + <function>mb_language</function> sets language. If + <parameter>language</parameter> is omitted, it returns current + language as string. + </para> + <para> + <parameter>language</parameter> setting is used for encoding + e-mail messages. Valid languages are "Japanese", + "ja","English","en" and "uni" + (UTF-8). <function>mb_send_mail</function> uses this setting to + encode e-mail. + </para> + <para> Language and its setting is ISO-2022-JP/Base64 for + Japanese, UTF-8/Base64 for uni, ISO-8859-1/quoted printable for + English. + </para> + <para> + Return Value: If <parameter>language</parameter> is set and + <parameter>language</parameter> is valid, it returns + TRUE. Otherwise, it returns FALSE. When + <parameter>language</parameter> is omitted, it returns language + name as string. If no language is set previously, it returns + FALSE. + </para> + <para> + See also <function>mb_send_mail</function>. 
+ </para> + </refsect1> + </refentry> + + <refentry id="function.mb-parse-str"> + <refnamediv> + <refname>mb_parse_str</refname> + <refpurpose> + Parse GET/POST/COOKIE data and set global variable + </refpurpose> + </refnamediv> + <refsect1> + <title>Description</title> + <funcsynopsis> + <funcprototype> + <funcdef>string + <function>mb_parse_str</function> + </funcdef> + <paramdef>string + <parameter>encoded_string</parameter> + </paramdef> + <paramdef>array + <parameter><optional>result</optional></parameter> + </paramdef> + </funcprototype> + </funcsynopsis> + <para> + <function>mb_parse_str</function> parses GET/POST/COOKIE data and + sets global variables. Since PHP does not provide raw POST/COOKIE + data, it can only be used for GET data for now. It parses URL + encoded data, detects encoding, converts coding to internal + encoding and sets values to <parameter>result</parameter> array or + global variables. + </para> + <para> + <parameter>encoded_string</parameter>: URL encoded data. + </para> + <para> + <parameter>result</parameter>: Array contains decoded and + character encoding converted values. + </para> + <para> + Return Value: It returns TRUE for success or FALSE for failure. + </para> + <para> + See also <function>mb_detect_order</function>, + <function>mb_internal_encoding</function>. + </para> + </refsect1> + </refentry> + <refentry id="function.mb-internal-encoding"> <refnamediv> <refname>mb_internal_encoding</refname> @@ -211,7 +647,7 @@ <parameter>encoding</parameter>: Character encoding name </para> <para> - Return Value: If encoding is + Return Value: If <parameter>encoding</parameter> is set,<function>mb_internal_encoding</function> returns <literal>TRUE</literal> for success, otherwise returns <literal>FALSE</literal>. If <parameter>encoding</parameter> is @@ -232,7 +668,7 @@ <para> See also <function>mb_http_input</function>, <function>mb_http_output</function>, - <function>mb_detect_order</function> + <function>mb_detect_order</function>. 
</para> </refsect1> </refentry> @@ -270,7 +706,7 @@ <para> See also <function>mb_internal_encoding</function>, <function>mb_http_output</function>, - <function>mb_detect_order</function> + <function>mb_detect_order</function>. </para> </refsect1> </refentry> @@ -294,9 +730,10 @@ If <parameter>encoding</parameter> is set, <function>mb_http_output</function> sets HTTP output character encoding to <parameter>encoding</parameter>. Output after this - function is converted to <parameter>encoding</parameter>. - <function>mb_http_output</function> returns TRUE for success and - FALSE for failure. + function is converted to <parameter>encoding</parameter>. + <function>mb_http_output</function> returns + <literal>TRUE</literal> for success and <literal>FALSE</literal> + for failure. </para> <para> If <parameter>encoding</parameter> is omitted, @@ -306,7 +743,7 @@ <para> See also <function>mb_internal_encoding</function>, <function>mb_http_input</function>, - <function>mb_detect_order</function> + <function>mb_detect_order</function>. </para> </refsect1> </refentry> @@ -331,11 +768,12 @@ <para> <function>mb_detect_order</function> sets automatic character encoding detection order to <parameter>encoding-list</parameter>. - It returns TRUE for success, FALSE for failure. + It returns <literal>TRUE</literal> for success, + <literal>FALSE</literal> for failure. </para> <para> <parameter>encoding-list</parameter> is array or comma separated - list of character encodings. ("auto" is expanded to + list of character encoding. ("auto" is expanded to "ASCII, JIS, UTF-8, EUC-JP, SJIS") </para> <para> @@ -346,6 +784,42 @@ This setting affects <function>mb_detect_encoding</function> and <function>mb_send_mail</function>. </para> + <note> + <para> + <literal>mbstring</literal> currently implements the following + encoding detection filters. If there is an invalid byte sequence + for the following encoding, encoding detection will fail. 
+ </para> + <simpara> + <literal>UTF-8</literal>, <literal>UTF-7</literal>, + <literal>ASCII</literal>, + <literal>EUC-JP</literal>,<literal>SJIS</literal>, + <literal>eucJP-win</literal>, <literal>SJIS-win</literal>, + <literal>JIS</literal>, <literal>ISO-2022-JP</literal> + </simpara> + <para> + For <literal>ISO-8859-*</literal>, <literal>mbstring</literal> + always detects as <literal>ISO-8859-*</literal>. + </para> + <para> + For <literal>UTF-16</literal>, <literal>UTF-32</literal>, + <literal>UCS2</literal> and <literal>UCS4</literal>, encoding + detection will fail always. + </para> + <para> + <example> + <title>Useless detect order example</title> + <programlisting> +; Always detect as ISO-8859-1 +detect_order = ISO-8859-1, UTF-8 + +; Always detect as UTF-8, since ASCII/UTF-7 values are +; valid for UTF-8 +detect_order = UTF-8, ASCII, UTF-7 + </programlisting> + </example> + </para> + </note> <para> <example> <title><function>mb_detect_order</function> examples</title> @@ -368,7 +842,7 @@ See also <function>mb_internal_encoding</function>, <function>mb_http_input</function>, <function>mb_http_output</function> - <function>mb_send_mail</function> + <function>mb_send_mail</function>. </para> </refsect1> </refentry> @@ -393,7 +867,7 @@ substitution character when input character encoding is invalid or character code is not exist in output character encoding. Invalid characters may be substituted null(no output), - string or hex value (Unicode character code value). + string or integer value (Unicode character code value). </para> <para> This setting affects <function>mb_detect_encoding</function> @@ -410,16 +884,17 @@ </listitem> <listitem> <simpara> - "long" : Output hex value (Example: U+3000,JIS+7E7E) + "long" : Output character code value (Example: + U+3000,JIS+7E7E) </simpara> </listitem> </itemizedlist> </para> <para> Return Value: If <parameter>substchar</parameter> is set, it - returns TRUE for success, otherwise returns FALSE. 
If - <parameter>substchar</parameter> is not set, it returns Unicode - value or + returns <literal>TRUE</literal> for success, otherwise returns + <literal>FALSE</literal>. If <parameter>substchar</parameter> is + not set, it returns Unicode value or "<literal>none</literal>"/"<literal>long</literal>". </para> <para> @@ -461,9 +936,29 @@ <function>ob_start</function> callback function. <function>mb_output_handler</function> converts characters in output buffer from internal character encoding to - HTTP output character encoding. + HTTP output character encoding. + </para> + <para> + In version 4.0.7 or later, this handler adds a charset HTTP header + when the following conditions are met: </para> <para> + <itemizedlist> + <listitem> + <simpara>Does not set <literal>Content-Type</literal> by + header()</simpara> + </listitem> + <listitem> + <simpara>Default MIME type begins with + <literal>text/</literal></simpara> + </listitem> + <listitem> + <simpara><literal>http_output</literal> setting is other than + pass</simpara> + </listitem> + </itemizedlist> + </para> + <para> <parameter>contents</parameter> : Output buffer contents </para> <para> @@ -483,8 +978,8 @@ </para> <note> <para> - If you want to output some binary data such as image from php - script, you must set output encoding to "pass" using + If you want to output some binary data such as image from PHP + script, you must set output encoding to "pass" using <function>mb_http_output</function>. </para> </note> @@ -520,7 +1015,7 @@ $outputenc = "sjis-win"; mb_http_output($outputenc); ob_start("mb_output_handler"); -Header("Content-Type: text/html; charset=" . mb_preferred_mime_name($outputenc)); +header("Content-Type: text/html; charset=" . mb_preferred_mime_name($outputenc)); </programlisting> </example> </para> @@ -550,6 +1045,11 @@ counted as 1. </para> <para> + <parameter>encoding</parameter> is character encoding for + <parameter>str</parameter>. 
If <parameter>encoding</parameter> is + omitted, internal character encoding is used. + </para> + <para> See also <function>mb_internal_encoding</function>, <function>strlen</function>. </para> @@ -567,7 +1067,7 @@ <title>Description</title> <funcsynopsis> <funcprototype> - <funcdef>string <function>mb_strpos</function></funcdef> + <funcdef>int <function>mb_strpos</function></funcdef> <paramdef>string <parameter>haystack</parameter></paramdef> <paramdef>string <parameter>needle</parameter></paramdef> <paramdef>int @@ -605,7 +1105,7 @@ </para> <para> <parameter>encoding</parameter> is character encoding name. If it - is not specified, internal character encoding is used. + is omitted, internal character encoding is used. </para> <para> See also <function>mb_strpos</function>, @@ -626,7 +1126,7 @@ <title>Description</title> <funcsynopsis> <funcprototype> - <funcdef>string <function>mb_strrpos</function></funcdef> + <funcdef>int <function>mb_strrpos</function></funcdef> <paramdef>string <parameter>haystack</parameter></paramdef> <paramdef>string <parameter>needle</parameter></paramdef> <paramdef>string @@ -649,7 +1149,7 @@ 0. Second character position is 1. </para> <para> - If <parameter>encoding</parameter> is not set, internal encoding + If <parameter>encoding</parameter> is omitted, internal encoding is assumed. <function>mb_strrpos</function> accepts <literal>string</literal> for <parameter>needle</parameter> where <function>strrpos</function> accepts only character. @@ -709,7 +1209,7 @@ omitted, internal character encoding is used. </para> <para> - See also <function>mb_struct</function>, + See also <function>mb_strcut</function>, <function>mb_internal_encoding</function>. 
</para> </refsect1> @@ -822,7 +1322,7 @@ <title>Description</title> <funcsynopsis> <funcprototype> - <funcdef>string <function>mb_strmwidth</function></funcdef> + <funcdef>string <function>mb_strimwidth</function></funcdef> <paramdef>string <parameter>str</parameter></paramdef> <paramdef>int <parameter>start</parameter></paramdef> <paramdef>int <parameter>width</parameter></paramdef> @@ -833,7 +1333,7 @@ </funcprototype> </funcsynopsis> <para> - <function>mb_strmwidth</function> truncates string + <function>mb_strimwidth</function> truncates string <parameter>str</parameter> to specified <parameter>width</parameter>. It returns truncated string. </para> @@ -1164,6 +1664,12 @@ before conversion for success, FALSE for failure. </para> <para> + <function>mb_convert_variables</function> join strings in Array + or Object to detect encoding, since encoding detection tends to + fail for short strings. Therefore, it is impossible to mix + encoding in single array or object. + </para> + <para> It <parameter>from-encoding</parameter> is specified by array or comma separated string, it tries to detect encoding from <parameter>from-coding</parameter>. When @@ -1172,7 +1678,9 @@ </para> <para> <parameter>vars (3rd and larger)</parameter> is reference to - variable to be converted. String, Array and Object are accepted. + variable to be converted. String, Array and Object are accepted. + <function>mb_convert_variables</function> assumes all parameters + have the same encoding. </para> <para> <example> @@ -1296,7 +1804,8 @@ convert. </para> <para> - <parameter>encoding</parameter> is character encoding. + <parameter>encoding</parameter> is character encoding. If it is + omitted, internal character encoding is used. </para> <para> <example> @@ -1323,7 +1832,7 @@ <refnamediv> <refname>mb_send_mail</refname> <refpurpose> - Send mail with ISO-2022-JP character code. (Japanese specific) + Send encoded mail. 
</refpurpose> </refnamediv> <refsect1> @@ -1344,7 +1853,8 @@ </funcsynopsis> <para> <function>mb_send_mail</function> sends email. Headers and - message are converted and encoded in ISO-2022-JP. + message are converted and encoded according to + <function>mb_language</function> setting. <function>mb_send_mail</function> is wrapper function of <function>mail</function>. See <function>mail</function> for details. @@ -1361,21 +1871,23 @@ <parameter>message</parameter> is mail message. </para> <para> - string <parameter>additional_headers</parameter> is inserted at - the end of the header. This is typically used to add - extra headers. Multiple extra headers are separated with a + <parameter>additional_headers</parameter> is inserted at + the end of the header. This is typically used to add extra + headers. Multiple extra headers are separated with a newline(\n). </para> <para> - It returns TRUE for success, otherwise it returns FALSE. + <parameter>additional_parameter</parameter> is a MTA command line + parameter. It is useful when setting the correct Return-Path + header when using sendmail. </para> <para> - <parameter>additional_parameter</parameter> is added this - data to the call to the mailer by PHP. This is useful when - setting the correct Return-Path header when using sendmail. + It returns <literal>TRUE</literal> for success, otherwise it + returns <literal>FALSE</literal>. </para> <para> - See also: <function>mail</function>. + See also: <function>mb_language</function>, + <function>mail</function>. </para> </refsect1> </refentry>