php-i18n Digest 24 Jul 2002 02:30:08 -0000 Issue 115
Topics (messages 288 through 295):
Re: mbstring: Japanese: encoding conversion not
288 by: Lew Mark-Andrews
289 by: Yasuo Ohgaki
290 by: Jean-Christian Imbeault
291 by: Jean-Christian Imbeault
292 by: Jeff Bailey
293 by: Yasuo Ohgaki
294 by: Yasuo Ohgaki
htmlentities() and charset
295 by: a.h.s. boy
----------------------------------------------------------------------
--- Begin Message ---
>I would have thought that ./configure should
>have complained that I had passed it an invalid flag ...
Apparently not. A couple of months ago the people from Sun who provide
unofficial upgrade packages for Cobalt Raq servers inadvertently compiled
PHP using "with" instead of "enable" for the mbstring config options. For
anyone who installed from this package, mbstring stuff didn't work at all,
of course, though PHP performed correctly otherwise. To their credit, a
properly compiled fix was made available for download soon after they were
alerted. But since this was a recent 4.1.2 update, I wouldn't be surprised
if, without knowledge or feedback otherwise, there were still some
Cobalt-based virtual hosters out there thinking they're providing mbstring
support to their customers.
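A quick way to catch this kind of silent misconfiguration is to ask PHP
itself whether the extension is present, rather than trusting the
configure line. A minimal sketch follows; the helper name
mbstring_status() is made up for illustration, while extension_loaded()
and function_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper: reports whether mbstring is really available in
// this PHP build, whatever flags were passed to ./configure.
function mbstring_status()
{
    if (!extension_loaded('mbstring')) {
        return 'missing';
    }
    // Spot-check one function that mbstring users rely on.
    return function_exists('mb_convert_encoding') ? 'ok' : 'incomplete';
}

echo 'mbstring: ' . mbstring_status() . "\n";
```

Run on the host in question, this reports "missing" when the extension
was never compiled in.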
As I also have a deep interest in Japanese-related PHP matters, I'd like to
say thanks guys for providing these insights and information. This is the
most legitimate traffic this list has had in months!
Lew
--- End Message ---
--- Begin Message ---
Jean-Christian Imbeault wrote:
> David Emery wrote:
>
>>
>> I found this...
>>
>> You have default_charset set to EUC-JP. It should be Shift_JIS. PHP
>> will set the outgoing headers to this value (that's what it's for)
>
>
>
> So what is the difference between mbstring.internal_encoding and
> mbstring.http_output?
I suppose you've read the manual pages... I may really need to
add more description...
Anyway, internal_encoding is the SCRIPT encoding. You should
use EUC or UTF-8. You cannot use SJIS as the internal encoding
unless you are using a patched PHP.
# There are several encodings that cannot work as the
# internal encoding. Examples are BIG5, SJIS, ISO-2022.
Output encoding is the encoding that is sent to the browser/client.
If the internal and output encodings differ, mb_output_handler
will convert from the internal encoding to the output encoding.
When you are using mbstring, don't set the default_charset php.ini
directive; mbstring sets it to the appropriate encoding according
to the output encoding setting.
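As a rough illustration of the split between the two settings, here is a
minimal sketch that sets them at run time instead of in php.ini. It
assumes mbstring is loaded; the encodings are simply the ones from this
thread, and the echoed bytes are an arbitrary sample.

```php
<?php
// Sketch only: run-time counterparts of mbstring.internal_encoding and
// mbstring.http_output. Assumes the mbstring extension is available.
mb_internal_encoding('EUC-JP'); // encoding the script and its strings use
mb_http_output('SJIS');         // encoding the browser should receive

// The conversion on output is done by this handler.
ob_start('mb_output_handler');

// "\xC6\xFC\xCB\xDC" is the EUC-JP byte sequence for the kanji of "Nihon".
// The script only ever writes EUC-JP; the handler does the rest.
echo "\xC6\xFC\xCB\xDC";
```

The script never converts anything itself: it writes text in the
internal encoding and leaves the final step to the output handler.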
--
Yasuo Ohgaki
--- End Message ---
--- Begin Message ---
Yasuo Ohgaki wrote:
>
>> So what is the difference between mbstring.internal_encoding and
>> mbstring.http_output?
>
>
> I suppose you've read the manual pages... I may really need to
> add more description...
Yes, I read them over and over :) Are you in charge of maintaining the
manual pages for mbstring? If so, I would gladly lend you a hand in making
them clearer and more informative. If you want my help just ask. I'll be
happy to pitch in!
> Anyway, internal_encoding is SCRIPT encoding.
You mean the encoding the php script file is in?
> Output encoding is encoding that is sent to browser/client.
> If internal and output encoding differ, mb_output_handler
> will convert encoding from internal encoding to output encoding.
Ok. But this conversion will only occur if mb_output_handler has been
set in php.ini as the output handler, correct?
> When you are using mbstring, don't set default charset php.ini
> directive, mbstring set it to appropriate encoding according
> to output encoding setting.
Ah ... that should be mentioned in the manual pages somewhere ;)
Jc
--- End Message ---
--- Begin Message ---
Lew Mark-Andrews wrote:
>
> As I also have a deep interest in Japanese-related PHP matters, I'd like to
> say thanks guys for providing these insights and information. This is the
> most legitimate traffic this list has had in months!
This thread has been most informative! I tried many other venues looking
for help on my mbstring woes, with no luck, and luckily someone on this
list (thanks Dave!) helped me out a great deal!
If Yasuo needs any help with editing the manual pages I will gladly
lend a hand. My way to finally give back to the community!
jc
--- End Message ---
--- Begin Message ---
It's very nice to see some activity on here from this end of the mailing
list as well. I am on the verge of putting together a php/mysql
multilingual website from scratch and it's good to know that some people
are reading up on the list. I was wondering what people's opinion on
the overall structure of a multi language php application is. There are
a number of ways to implement the frontend.
1. Do something similar to the way phpMyAdmin works and have different
files laid out with the different encoding types and variables listed
for each message displayed to the users.
2. Use gettext or some other widely used standard.
..... Etc..
Of course, even these two solutions have their good and bad points. The
first is quick to set up but harder to maintain. The second is easier
to maintain but more difficult to set up at the beginning. What do
others think, and what are they using at this time?
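To make option 1 concrete, here is a minimal sketch of a per-language
message table with a lookup helper. The table contents, the language
codes, and the translate() function are all invented for illustration
and are not taken from phpMyAdmin.

```php
<?php
// Sketch of option 1: one message table per language, keyed by message id.
// In a real application each table would probably live in its own file
// and be loaded with include.
$messages = array(
    'en' => array('greeting' => 'Hello', 'logout' => 'Log out'),
    'fr' => array('greeting' => 'Bonjour'),
);

// Hypothetical lookup helper: falls back to English, then to the id itself.
function translate($messages, $lang, $id)
{
    if (isset($messages[$lang][$id])) {
        return $messages[$lang][$id];
    }
    if (isset($messages['en'][$id])) {
        return $messages['en'][$id];
    }
    return $id;
}

echo translate($messages, 'fr', 'greeting'), "\n";
echo translate($messages, 'fr', 'logout'), "\n";
```

The fallback chain is the part that tends to need maintenance: every
new string has to be added to each table by hand, which is the cost
gettext tooling is meant to reduce.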
Thanks,
-jeff
-----Original Message-----
From: Jean-Christian Imbeault [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 11, 2002 1:48 PM
To: [EMAIL PROTECTED]
Subject: Re: [PHP-I18N] Re: mbstring: Japanese: encoding conversion not
Lew Mark-Andrews wrote:
>
> As I also have a deep interest in Japanese-related PHP matters, I'd
> like to say thanks guys for providing these insights and information.
> This is the most legitimate traffic this list has had in months!
This thread has been most informative! I tried many other venues looking
for help on my mbstring woes, with no luck, and luckily someone on this
list (thanks Dave!) helped me out a great deal!
If Yasuo needs any help with editing the manual pages I will gladly
lend a hand. My way to finally give back to the community!
jc
--- End Message ---
--- Begin Message ---
Jean-Christian Imbeault wrote:
> Yasuo Ohgaki wrote:
>
>>
>>> So what is the difference between mbstring.internal_encoding and
>>> mbstring.http_output?
>>
>>
>>
>> I suppose you've read the manual pages... I may really need to
>> add more description...
>
>
>
> Yes, I read them over and over :) Are you in charge of maintaining the
> manual pages for mbstring? If so, I would gladly lend you a hand in making
> them clearer and more informative. If you want my help just ask. I'll be
> happy to pitch in!
Descriptions of and fixes to the manual pages are welcome.
A CVS diff would be nice, but it is also OK to send
plain text if you are not familiar with DocBook.
>
>
>> Anyway, internal_encoding is SCRIPT encoding.
>
> You mean the encoding the php script file is in?
Yes.
>
>> Output encoding is encoding that is sent to browser/client.
>> If internal and output encoding differ, mb_output_handler
>> will convert encoding from internal encoding to output encoding.
>
>
>
> Ok. But this conversion will only occur if mb_output_handler has been
> set in php.ini as the output handler, correct?
Correct.
>
>
>> When you are using mbstring, don't set default charset php.ini
>> directive, mbstring set it to appropriate encoding according
>> to output encoding setting.
>
>
>
> Ah ... that should be mentioned in the manual pages somewhere ;)
>
I figured that out a while ago.
I haven't had time to update the manual...
--
Yasuo Ohgaki
--- End Message ---
--- Begin Message ---
Jean-Christian Imbeault wrote:
> David Emery wrote:
>
>>
>
>> There's more, and this is the biggie...
>>
>> '--enable-mbstring-enc-trans' should be '--enable-mbstr-enc-trans'
>
>
>
> That fixed most of my problems! Thanks!
>
> Now I just have a question concerning the use of "internal encoding".
>
> When I receive $test it is in EUC-JP (because I have internal encoding
> set to EUC-JP?). If I echo $test back to the browser it comes out as
> SJIS. This is all good.
>
> But if I do this:
>
> echo(mb_convert_encoding($test, "SJIS","EUC-JP"));
>
> I get mojibake. Why? $test is internally encoded in EUC-JP and I want to
> spew it back out as SJIS and I have mbstring.http_output set to SJIS, so
> why won't it print properly?
If you would like SJIS output while using EUC-JP as the internal
encoding, you shouldn't send SJIS-encoded text to the browser yourself.
It will result in mojibake.
Use only EUC-JP in your script. The output buffer mechanism buffers
all output, which it expects to be EUC-JP text, and then converts it
to SJIS according to your settings. Text you have already converted
to SJIS is therefore converted a second time.
Use mb_convert_encoding and mb_convert_variables when you are
reading a text file or the like. (And for multipart/form-data, since
mbstring will not try to convert the encoding automatically when
the multipart/form-data form encoding is used.)
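A minimal sketch of that "convert once, on the way in" rule, assuming
mbstring is loaded and EUC-JP is the internal encoding. The helper name
to_internal() and the sample bytes are illustrative only.

```php
<?php
// Sketch: text arriving from outside (a file, a multipart/form-data
// upload) is converted once, on input, to the internal encoding.
function to_internal($raw, $sourceEncoding)
{
    // mb_convert_encoding(string, to_encoding, from_encoding)
    return mb_convert_encoding($raw, 'EUC-JP', $sourceEncoding);
}

// "\x93\xFA\x96\x7B" is the Shift_JIS byte sequence for the kanji of "Nihon".
$fromFile = "\x93\xFA\x96\x7B";
$internal = to_internal($fromFile, 'SJIS');

// Hex dump of the EUC-JP result, so the bytes can be inspected safely.
echo bin2hex($internal), "\n";
```

From here the script would echo $internal unchanged and leave the
EUC-JP to SJIS step to the output handler.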
--
Yasuo Ohgaki
> Thanks for all the help so far! Things seem to be working fine now. It's
> just my understanding that is a bit flaky, I think. If I can get to
> understand the purpose/use of the settings and functions it will go a
> long way in preventing future errors on my part ^_^
>
> Jc
>
>
--- End Message ---
--- Begin Message ---
I've been trying to internationalize a rather large PHP-based app that
I'm working on. I implemented gettext() to cover some 650+ static
strings in the code, and that aspect seems to work fine. I am now trying
to handle issues of alternate character sets, but htmlentities() seems
to go wonky when I add the charset parameter.
When I started the internationalization, I was (romanocentrically)
focused on providing support for other "Latin 1" languages. The first
alternate language request I received, however, was for Greek and
Turkish support. Go figure.
To accommodate the non-Latin 1 characters, I made the following changes:
1) I modified my code so that switching displayed languages set the
appropriate
<meta http-equiv="content-type"...charset=...>
header for the page.
2) I modified my submission forms to have an "Accept-charset" parameter
that included all the supported language character sets.
3) I changed a text scrubbing function (to clean up curly quotes, em
dashes, etc) so that it wouldn't interfere with non-Latin 1 character
sets.
4) I converted my calls to "htmlentities" to include the 3rd parameter,
specifying the charset.
This last change, however, doesn't seem to be working. To complicate
things, I'm attempting to test Greek language support from my Mac OS X
machine, which isn't fully configured for Greek input. The OmniWeb
browser I use supports display of Greek websites, so I've been cutting
and pasting text from those sites into my submission form to test it.
If I submit the Greek text, it goes into my MySQL database just fine. To
display it, I'm calling
nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'));
and the result in my browser is
Υπάρχουν ιστορικά...
(Random Latin 1 accented characters, not Greek).
If, however, I change the htmlentities() call to htmlspecialchars(),
still with the character set specified, it renders properly in my
browser, in Greek.
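One possible explanation, offered as a guess and not confirmed anywhere
in this thread: if htmlentities() does not recognise the charset name,
it may fall back to ISO-8859-1 and turn every high byte into a Latin 1
entity, whereas htmlspecialchars() only escapes the markup characters
and passes other bytes through. The sketch below shows only how the two
functions treat one Greek byte when the charset is Latin 1; it does not
prove how PHP 4.2.2 handles 'ISO-8859-7'.

```php
<?php
// 0xD5 is capital upsilon in ISO-8859-7, but O-with-tilde in ISO-8859-1.
$greekByte = "\xD5";

// Treated as Latin 1, htmlentities() rewrites the byte as a Latin 1 entity.
$viaEntities = htmlentities($greekByte . '<', ENT_COMPAT, 'ISO-8859-1');

// htmlspecialchars() escapes only the markup characters; the byte survives.
$viaSpecial = htmlspecialchars($greekByte . '<', ENT_COMPAT, 'ISO-8859-1');

echo $viaEntities, "\n";          // the entity text is plain ASCII
echo bin2hex($viaSpecial), "\n";  // hex dump, since the raw byte is not ASCII
```

If that guess is right, the browser is rendering Latin 1 entities
correctly, and the untouched bytes from htmlspecialchars() display as
Greek only because the page declares ISO-8859-7.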
Is this a bug in htmlentities? I'm using PHP 4.2.2, just released, so if
it's a bug, it hasn't been fixed yet.
Can anyone else confirm this? Is anyone else attempting
internationalized Greek support and attempting to use htmlentities()?
To see a page with both htmlentities() and htmlspecialchars() used, go to
http://alt.baltimoreimc.org/newswire/display/49/index.php
And feel free to attempt posting your own submissions...it's a
development server, so no harm done. Use the "language" popup on the
left-hand side to "switch" languages. The gettext translations aren't
done, so that won't have the desired effect, but it does switch the
charset declaration in the page headers.
You are especially welcome to try this if you're Greek, and can input
text "in the way a normal Greek computer would". ;-)
Cheers,
spud.
-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------
--- End Message ---