php-i18n Digest 24 Jul 2002 02:30:08 -0000 Issue 115

Topics (messages 288 through 295):

Re: mbstring: Japanese: encoding conversion not
        288 by: Lew Mark-Andrews
        289 by: Yasuo Ohgaki
        290 by: Jean-Christian Imbeault
        291 by: Jean-Christian Imbeault
        292 by: Jeff Bailey
        293 by: Yasuo Ohgaki
        294 by: Yasuo Ohgaki

htmlentities() and charset
        295 by: a.h.s. boy

----------------------------------------------------------------------
--- Begin Message ---
>I would have thought that ./configure should
>have complained that I had passed it an invalid flag ...
Apparently not. A couple of months ago the people from Sun who provide
unofficial upgrade packages for Cobalt Raq servers inadvertently compiled
PHP using "with" instead of "enable" for the mbstring config options. For
anyone who installed from this package, mbstring stuff didn't work at all,
of course, though PHP performed correctly otherwise. To their credit, a
properly compiled fix was made available for download soon after they were
alerted. But since this was a recent 4.1.2 update, I wouldn't be surprised
if, without knowledge or feedback otherwise, there were still some
Cobalt-based virtual hosters out there thinking they're providing mbstring
support to their customers.
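
A hosting customer in that situation could ask PHP itself whether mbstring was really compiled in. The following is only a minimal sketch: the helper name mbstring_status is hypothetical, while extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper: report whether mbstring is actually available,
// regardless of what the configure line was supposed to enable.
function mbstring_status(): array
{
    return [
        'extension_loaded'    => extension_loaded('mbstring'),
        'mb_convert_encoding' => function_exists('mb_convert_encoding'),
    ];
}

foreach (mbstring_status() as $check => $ok) {
    echo $check, ': ', ($ok ? 'yes' : 'no'), "\n";
}
```

If either line prints "no", the build most likely does not include mbstring, whatever the hosting documentation says.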

As I also have a deep interest in Japanese-related PHP matters, I'd like to
say thanks guys for providing these insights and information. This is the
most legitimate traffic this list has had in months!

Lew


--- End Message ---
--- Begin Message ---
Jean-Christian Imbeault wrote:
> David Emery wrote:
> 
>>
>> I found this...
>>
>> You have default_charset set to EUC-JP. It should be Shift_JIS. PHP 
>> will set the outgoing headers to this value (that's what it's for)
> 
> 
> 
> So what is the difference between mbstring.internal_encoding and 
> mbstring.http_output?

I suppose you've read the manual pages... I may really need to
add more description...

Anyway, internal_encoding is SCRIPT encoding. You should
use EUC or UTF-8. You cannot use SJIS as internal encoding
unless you are using patched PHP.

# There are several encodings that cannot work as
# internal encoding. Examples are BIG5, SJIS, ISO-2022.

Output encoding is encoding that is sent to browser/client.
If internal and output encoding differ, mb_output_handler
will convert encoding from internal encoding to output encoding.

When you are using mbstring, don't set the default_charset php.ini
directive; mbstring sets it to the appropriate encoding according
to the output encoding setting.
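
The settings described above might look like this in php.ini. This is a sketch that assumes the PHP 4-era directive names used in this thread and the EUC-JP/SJIS combination under discussion; the values are illustrative, not a recommendation.

```ini
; Encoding used by the scripts and internal strings
; (EUC-JP or UTF-8, per the advice above)
mbstring.internal_encoding = EUC-JP

; Encoding sent to the browser/client
mbstring.http_output = SJIS

; Conversion from internal to output encoding is done by this handler
output_handler = mb_output_handler

; default_charset is deliberately left unset so that mbstring can choose it
;default_charset =
```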

--
Yasuo Ohgaki


--- End Message ---
--- Begin Message ---
Yasuo Ohgaki wrote:

>
>> So what is the difference between mbstring.internal_encoding and 
>> mbstring.http_output?
> 
> 
> I suppose you've read the manual pages... I may really need to
> add more description...


Yes, I read them over and over :) Are you in charge of maintaining the 
manual pages for mbstring? If so, I would gladly lend you a hand in making 
them clearer and more informative. If you want my help, just ask. I'll be 
happy to pitch in!

 
> Anyway, internal_encoding is SCRIPT encoding.


You mean the encoding the php script file is in?

> Output encoding is encoding that is sent to browser/client.
> If internal and output encoding differ, mb_output_handler
> will convert encoding from internal encoding to output encoding.


Ok. But this conversion will only occur if mb_output_handler has been 
set in php.ini as the output handler, correct?

 
> When you are using mbstring, don't set the default_charset php.ini
> directive; mbstring sets it to the appropriate encoding according
> to the output encoding setting.


Ah ... that should be mentioned in the manual pages somewhere ;)

Jc

--- End Message ---
--- Begin Message ---
Lew Mark-Andrews wrote:

> 
> As I also have a deep interest in Japanese-related PHP matters, I'd like to
> say thanks guys for providing these insights and information. This is the
> most legitimate traffic this list has had in months!


This thread has been most informative! I tried many other venues looking 
for help on my mbstring woes, with no luck, and luckily someone on this 
list (thanks Dave!) helped me out a great deal!

If Yasuo needs any help with  editing the manual pages I will gladly 
lend a hand. My way to finally give back to the community!

jc

--- End Message ---
--- Begin Message ---
It's very nice to see some activity on here from this end of the mailing
list as well.  I am on the verge of putting together a php/mysql
multilingual website from scratch and it's good to know that some people
are reading up on the list.  I was wondering what people's opinion on
the overall structure of a multi language php application is.  There are
a number of ways to implement the frontend.

1.  Do something similar to the way phpMyAdmin works and have different
files laid out with the different encoding types and variables listed
for each message displayed to the users.

2.  Use gettext or some other widely used standard.

..... Etc..

Of course, even these two solutions have their good and bad points.  The
first is quick but harder to maintain.  The second is easier to
maintain but more difficult to set up from the beginning.  What do
others think, or what are they using at this time?
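
For comparison, the first approach (per-language message arrays) can be sketched roughly as follows. The catalog contents, the translate() helper, and the idea of falling back to English are all assumptions made for illustration; in a phpMyAdmin-style layout each array would normally live in its own file.

```php
<?php
// Hypothetical message catalogs, one array per language.
$catalogs = [
    'en' => ['greeting' => 'Hello',   'logout' => 'Log out'],
    'fr' => ['greeting' => 'Bonjour'],
];

// Hypothetical lookup helper: try the requested language, then the
// fallback language, and finally return the key itself.
function translate(array $catalogs, string $lang, string $key, string $fallback = 'en'): string
{
    return $catalogs[$lang][$key] ?? $catalogs[$fallback][$key] ?? $key;
}

echo translate($catalogs, 'fr', 'greeting'), "\n";
echo translate($catalogs, 'fr', 'logout'), "\n";
```

The maintenance cost mentioned above shows up here as the need to keep every array in step by hand, which is the part gettext tooling is meant to automate.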

Thanks,
-jeff
 


-----Original Message-----
From: Jean-Christian Imbeault [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, July 11, 2002 1:48 PM
To: [EMAIL PROTECTED]
Subject: Re: [PHP-I18N] Re: mbstring: Japanese: encoding conversion not


Lew Mark-Andrews wrote:

> 
> As I also have a deep interest in Japanese-related PHP matters, I'd 
> like to say thanks guys for providing these insights and information. 
> This is the most legitimate traffic this list has had in months!


This thread has been most informative! I tried many other venues looking 
for help on my mbstring woes, with no luck, and luckily someone on this 
list (thanks Dave!) helped me out a great deal!

If Yasuo needs any help with  editing the manual pages I will gladly 
lend a hand. My way to finally give back to the community!

jc


--- End Message ---
--- Begin Message ---
Jean-Christian Imbeault wrote:
> Yasuo Ohgaki wrote:
> 
>>
>>> So what is the difference between mbstring.internal_encoding and 
>>> mbstring.http_output?
>>
>>
>>
>> I suppose you've read the manual pages... I may really need to
>> add more description...
> 
> 
> 
> Yes, I read them over and over :) Are you in charge of maintaining the 
> manual pages for mbstring? If so, I would gladly lend you a hand in making 
> them clearer and more informative. If you want my help, just ask. I'll be 
> happy to pitch in!

Descriptions/fixes to manual pages are welcome.
A CVS diff would be nice, but it is also OK to send
text if you are not familiar with DocBook.

> 
> 
>> Anyway, internal_encoding is SCRIPT encoding.
> 
> You mean the encoding the php script file is in?

Yes.

> 
>> Output encoding is encoding that is sent to browser/client.
>> If internal and output encoding differ, mb_output_handler
>> will convert encoding from internal encoding to output encoding.
> 
> 
> 
> Ok. But this conversion will only occur if mb_output_handler has been 
> set in php.ini as the output handler, correct?

Correct.

> 
> 
>> When you are using mbstring, don't set the default_charset php.ini
>> directive; mbstring sets it to the appropriate encoding according
>> to the output encoding setting.
> 
> 
> 
> Ah ... that should be mentioned in the manual pages somewhere ;)
> 

I figured that out a while ago.
I didn't have time to update the manual...

--
Yasuo Ohgaki

--- End Message ---
--- Begin Message ---
Jean-Christian Imbeault wrote:
> David Emery wrote:
> 
>>
> 
>> There's more, and this is the biggie...
>>
>> '--enable-mbstring-enc-trans' should be '--enable-mbstr-enc-trans'
> 
> 
> 
> That fixed most of my problems! Thanks!
> 
> Now I just have a question concerning the use of "internal encoding".
> 
> When I receive $test it is in EUC-JP (because I have internal encoding 
> set to EUC-JP?). If I echo $test back to the browser it comes out as 
> SJIS. This is all good.
> 
> But if I do this:
> 
> echo(mb_convert_encoding($test, "SJIS","EUC-JP"));
> 
> I get mojibake. Why? $test is internally encoded in EUC-JP and I want to 
> spew it back out as SJIS and I have mbstring.http_output set to SJIS, so 
> why won't it print properly?

If you would like to output SJIS when you are using EUC-JP as the internal
encoding, you shouldn't convert to SJIS yourself before echoing. The
output handler would then convert the already-converted text a second
time, which results in mojibake.

Only use EUC-JP. The output buffer mechanism will buffer all output,
including EUC-JP encoded text, then convert it to SJIS according to
your settings.

Use mb_convert_encoding and mb_convert_variables when you are
reading a text file or the like. (And for multipart/form-data, since
mbstring will not try to convert the encoding automatically when
the multipart/form-data form encoding is used.)
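
The case where explicit conversion is appropriate (text arriving from a file rather than from PHP's own output) can be sketched as below. The helper name to_internal and the default of EUC-JP are assumptions for illustration; mb_convert_encoding() is the real mbstring function, and the SJIS string here merely stands in for file contents.

```php
<?php
// Hypothetical helper: bring externally sourced text into the
// internal encoding before the script works with it.
function to_internal(string $raw, string $from, string $internal = 'EUC-JP'): string
{
    return mb_convert_encoding($raw, $internal, $from);
}

// Three kanji written as code points so the example does not depend
// on the encoding this file is saved in.
$utf8 = "\u{65E5}\u{672C}\u{8A9E}";

// Stand-in for text read from an SJIS file.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// Convert once, at the point of input. Output is then left to the
// output handler rather than converted by hand.
$euc = to_internal($sjis, 'SJIS');
echo bin2hex($euc), "\n";
```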

--
Yasuo Ohgaki

> Thanks for all the help so far! Things seem to be working fine now. It's 
> just my understanding that is a bit flaky, I think. If I can get to 
> understand the purpose/use of the settings and functions it will go a 
> long way in preventing future errors on my part ^_^
> 
> Jc
> 
> 


--- End Message ---
--- Begin Message ---
I've been trying to internationalize a rather large PHP-based app that 
I'm working on. I implemented gettext() to cover some 650+ static 
strings in the code, and that aspect seems to work fine. I am now trying 
to handle issues of alternate character sets, but htmlentities() seems 
to go wonky when I add the charset parameter.

When I started the internationalization, I was (romanocentrically) 
focused on providing support for other "Latin 1" languages. The first 
alternate language request I received, however, was for Greek and 
Turkish support. Go figure.

To accommodate the non-Latin 1 characters, I made the following changes:

1) I modified my code so that switching displayed languages set the 
appropriate
      <meta http-equiv="content-type"...charset=...>
header for the page.

2) I modified my submission forms to have an "Accept-charset" parameter 
that included all the supported language character sets.

3) I changed a text scrubbing function (to clean up curly quotes, em 
dashes, etc) so that it wouldn't interfere with non-Latin 1 character 
sets.

4) I converted my calls to "htmlentities" to include the 3rd parameter, 
specifying the charset.

This last change, however, doesn't seem to be working. To complicate 
things, I'm attempting to test Greek language support from my Mac OS X 
machine, which isn't fully configured for Greek input. The OmniWeb 
browser I use supports display of Greek websites, so I've been cutting 
and pasting text from those sites into my submission form to test it.

If I submit the Greek text, it goes into my MySQL database just fine. To 
display it, I'm calling

    nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'));

and the result in my browser is

Υπάρχουν ιστορικά...

(Random Latin 1 accented characters, not Greek).

If, however, I change the htmlentities() call to htmlspecialchars(), 
still with the character set specified, it renders properly in my 
browser, in Greek.
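
One possible explanation, not confirmed anywhere in this digest: htmlentities() recognizes only a fixed list of charsets, and if ISO-8859-7 is not among them it may treat the bytes as Latin 1 and turn every high byte into a Latin 1 entity, whereas htmlspecialchars() rewrites only &, <, > and the double quote and so leaves the Greek bytes alone. Under that assumption, one workaround is to convert the text to UTF-8 with mbstring and escape only the special characters. The function name escape_greek_body is hypothetical, and the page would then have to be declared as UTF-8 rather than ISO-8859-7.

```php
<?php
// Hypothetical helper: make ISO-8859-7 text safe for a UTF-8 HTML page.
function escape_greek_body(string $body): string
{
    // Re-encode the stored ISO-8859-7 bytes as UTF-8 (requires mbstring).
    $utf8 = mb_convert_encoding($body, 'UTF-8', 'ISO-8859-7');

    // Escape only &, <, > and the double quote, then add <br /> tags
    // before newlines; the Greek letters are not turned into entities.
    return nl2br(htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8'));
}

// "\xE1\xE2" is alpha followed by beta in ISO-8859-7.
echo escape_greek_body("\xE1\xE2 <b>"), "\n";
```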

Is this a bug in htmlentities? I'm using PHP 4.2.2, just released, so if 
it's a bug, it hasn't been fixed yet.

Can anyone else confirm this? Is anyone else attempting 
internationalized Greek support and attempting to use htmlentities()?

To see a page with both htmlentities() and htmlspecialchars() used, go to
http://alt.baltimoreimc.org/newswire/display/49/index.php

And feel free to attempt posting your own submissions...it's a 
development server, so no harm done. Use the "language" popup on the 
left-hand side to "switch" languages. The gettext translations aren't 
done, so that won't have the desired effect, but it does switch the 
charset declaration in the page headers.

You are especially welcome to try this if you're Greek, and can input 
text "in the way a normal Greek computer would". ;-)

Cheers,
spud.

-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org            "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------

--- End Message ---
