php-i18n Digest 18 May 2004 09:25:31 -0000 Issue 228

Topics (messages 698 through 701):

Re: Converting "\u00F4" style characters
        698 by: Asgeir Frimannsson
        700 by: Michael Wallner
        701 by: Michael Wallner

Re: Particular Problem Kanji
        699 by: PHPDiscuss - PHP Newsgroups and mailing lists

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
Michael Wallner wrote:
Hi,

I'm the author of PEAR's I18Nv2 [1] module, whose maintenance I kind of slipped into because I wanted to introduce a Win32/Linux-independent setLocale().

Currently I want to utilize IBM's ICU resources [2], which encode Unicode characters in the "\u00F4" way.

I already wrote a parser [3] for the ICU files, but searching the web I didn't find a neat way to convert these "\u00F4" characters.

Can anybody help out?

Thanks a lot,
mike

[1] http://pear.php.net/package/I18Nv2
[2] http://oss.software.ibm.com/cvs/icu/icu/source/data/locales/
[3] http://cvs.php.net/co.php/pear/I18Nv2/OpenI18N/ICUParser.php


Hi Mike,

Don't know if this is exactly what you're after, but the following example converts a hex Unicode code point, e.g. "00F4" (strip the "\u"), to UTF-8:

<?php
// Convert one code point, given as a hex string such as "00F4",
// to its UTF-8 byte sequence. Handles U+0000..U+FFFF (1 to 3 bytes).
function unicode_to_utf8( $unicode_hex ) {

    $unicode = hexdec( $unicode_hex );

    $utf8 = '';

    if ( $unicode < 128 ) {

        // 1 byte: 0xxxxxxx
        $utf8 = chr( $unicode );

    } elseif ( $unicode < 2048 ) {

        // 2 bytes: 110xxxxx 10xxxxxx
        $utf8 .= chr( 192 + ( ( $unicode - ( $unicode % 64 ) ) / 64 ) );
        $utf8 .= chr( 128 + ( $unicode % 64 ) );

    } else {

        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $utf8 .= chr( 224 + ( ( $unicode - ( $unicode % 4096 ) ) / 4096 ) );
        $utf8 .= chr( 128 + ( ( ( $unicode % 4096 ) - ( $unicode % 64 ) ) / 64 ) );
        $utf8 .= chr( 128 + ( $unicode % 64 ) );

    } // if

    return $utf8;

} // unicode_to_utf8


// Declare the output with the registered charset name "utf-8".
header('Content-Type: text/plain; charset=utf-8');

$ch4 = '0034'; // digit '4'
$chA = '0041'; // char 'A'
$utf4 = unicode_to_utf8($ch4);
$utfA = unicode_to_utf8($chA);

print $utf4 . "\n" . $utfA;
?>

See
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
- Unicode character data (code point table)

http://www.randomchaos.com/document.php?source=php_and_unicode
- the source that inspired the unicode_to_utf8 function

regards,
asgeir

--- End Message ---
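For the original use case, decoding every "\uXXXX" escape inside a string read from the ICU files, the per-character conversion can be wrapped in a regular-expression replacement. This is a minimal sketch, not code from the thread: `codepoint_to_utf8()`, `decode_u_escapes()` and `decode_u_escape_match()` are hypothetical names, the bit-shift encoder is equivalent to Asgeir's arithmetic for code points up to U+FFFF, and it assumes every escape has exactly four hex digits (surrogate pairs and longer escape forms are left untouched).

```php
<?php
// Hypothetical helper: encode one code point (U+0000..U+FFFF) as UTF-8
// using bit operations instead of % and / arithmetic.
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);                          // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));          // 10xxxxxx
    }
    return chr(0xE0 | ($cp >> 12))                // 1110xxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))        // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));              // 10xxxxxx
}

// Hypothetical callback: $m[1] holds the four hex digits after "\u".
function decode_u_escape_match($m)
{
    return codepoint_to_utf8(hexdec($m[1]));
}

// Hypothetical helper: replace every "\uXXXX" (exactly four hex digits)
// in $text with the corresponding UTF-8 bytes; other text is unchanged.
function decode_u_escapes($text)
{
    // In a single-quoted PHP string '\\\\' is two backslashes, which the
    // regex engine reads as one literal backslash.
    return preg_replace_callback(
        '/\\\\u([0-9A-Fa-f]{4})/',
        'decode_u_escape_match',
        $text
    );
}
```

For example, `decode_u_escapes('Ch\u00E2teau')` returns "Château" as UTF-8 bytes; the single quotes keep PHP itself from touching the "\u" sequence. A named callback is used rather than a closure so the sketch also fits older PHP versions.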
--- Begin Message ---
Heh Asgeir, nice to meet you again!

> Michael Wallner wrote:
> 
>> Currently I want to utilize IBM's ICU resources [2], which encode Unicode
>> characters in the "\u00F4" way.

> Hi Mike,
> 
> Don't know if this is exactly what you're after, but the following
> example converts a hex Unicode code point, e.g. "00F4" (strip the "\u"), to UTF-8:
> [...]

Well actually it *is* :)

Thanks a *lot*,
-- 
Michael - < mike(@)php.net >



--- End Message ---
--- Begin Message ---
Hi Asgeir Frimannsson, you wrote:

> Hi Mike,
> 
> Don't know if this is exactly what you're after, but the following
> example converts a hex Unicode code point, e.g. "00F4" (strip the "\u"), to UTF-8:
> 
> <?php
[...]
> $ch4 = '0034'; // digit '4'
> $chA = '0041'; // char 'A'
> $utf4 = unicode_to_utf8($ch4);
> $utfA = unicode_to_utf8($chA);
> 
> print $utf4 ."\n" . $utfA;
> ?>

Hi Asgeir,

It seems that the function doesn't work with higher
values like "\u20AC", which is the EUR symbol :-(

Any ideas?

Many thanks,
-- 
Michael - < mike(@)php.net >



--- End Message ---
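Before looking for a bug in the conversion itself, it may help to inspect the raw bytes instead of the rendered glyph. The sketch below re-uses Asgeir's `unicode_to_utf8()` (restructured with early returns, same arithmetic) and prints its result for "20AC" in hex. U+20AC is encoded in UTF-8 as the three bytes E2 82 AC, and the three-byte branch produces exactly those bytes, so if the EUR sign still looks wrong the cause may lie elsewhere, for example in how the page's charset is declared or in the font. That last point is a guess; the thread does not confirm it.

```php
<?php
// Asgeir's unicode_to_utf8(), restructured with early returns; the
// arithmetic is unchanged so the check exercises the same logic.
function unicode_to_utf8($unicode_hex)
{
    $unicode = hexdec($unicode_hex);

    if ($unicode < 128) {
        return chr($unicode);                                        // 1 byte
    } elseif ($unicode < 2048) {
        return chr(192 + (($unicode - ($unicode % 64)) / 64))         // 2 bytes
             . chr(128 + ($unicode % 64));
    }
    return chr(224 + (($unicode - ($unicode % 4096)) / 4096))         // 3 bytes
         . chr(128 + ((($unicode % 4096) - ($unicode % 64)) / 64))
         . chr(128 + ($unicode % 64));
}

// Print the raw bytes as hex so no terminal or browser charset is involved.
echo bin2hex(unicode_to_utf8('20AC')), "\n";   // prints e282ac
```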
--- Begin Message ---
Claire Hector wrote:

> David,
> Thanks for your reply!

> I am aware that SJIS shouldn't be used for internal encoding; however, the
> config of the server is out of my control and the person who set it up
> determined it was the best setup. It's his rice field ;-) .... I pointed out
> the very clear warnings on the PHP site and he showed me a Japanese-language
> site that basically justified this setup, so when he wouldn't budge I went
> and changed the config settings in both PHP and MySQL myself to see if this
> would fix the problem. Thus I had:

> Each page with Shift_JIS encoding.

> mbstring.language=Japanese
> mbstring.internal_encoding=EUC-JP
> mbstring.http_output=SJIS

> MySQL table character set ujis

> and Apache using an additional module for Japanese from WebDAV called
> mod_encoding
> with ServerEncoding set to utf-8 & DefaultClientEncoding as JA-AUTO-SJIS-MS

> With this setup I had real mojibake (garbled characters) - none of the
> Japanese on the site resembled kanji....

> I have not encountered the WebDAV module before and am wondering if this
> could be causing problems. Our network guy said that he needed to use this
> in order to get the settings all working; just configuring PHP as he had
> also gave mojibake problems!!

> It is quite a frustrating problem as it only affects a few kanji, and only
> once they have been sent to MySQL.

> Any other ideas??

> Cheers,
> Claire

> David Emery wrote:

> > On 2004/04/06 (Tue) 11:58, Claire Hector wrote:
> > > Hello!
> > > I have a question regarding some particular Japanese words.
> > >
> > > I have a MySQL database and have set the character set for selected
> > > tables to sjis (I have also tried this with ujis and various php
> > > settings)
> > >
> > > php.ini settings for mbstring are as follows:
> > >
> > >      mbstring.language=Japanese
> > >      mbstring.internal_encoding=SJIS
> >
> > You shouldn't use SJIS for internal encoding or in your DB, or inside
> > your PHP. It's evil and will cause exactly the type of problems you're
> > having. I think the mbstring docs explain this. Setting
> > mbstring.internal_encoding to EUC-JP would be better. UTF-8 is good too,
> > but I don't think it's supported by MySQL.
> >
> > There should be an mbstring.http_output setting as well, which you
> > probably want to set to SJIS.
> >
> > >      mbstring.http_input=auto
> > >      mbstring.http_input=UTF-8
> > >      mbstring.encoding_translation=on
> > >      mbstring.detect_order=auto
> > >      mbstring.substitute_character=none
> > >
> > > [The server was configured by our network person, so I am not 100% sure
> > > of the reasoning behind these particular settings.]
> > >
> > > The server is Apache using an additional module for Japanese from webDAV
> > > called mod_encoding
> > > with ServerEncoding set to utf-8 & DefaultClientEncoding as
> > > JA-AUTO-SJIS-MS
> > >
> > > Each page is encoded as Shift_JIS. (I have also played around with this
> > > and get the same problems if it is changed to UTF-8 or EUC-JP.)
> > >
> > > Generally, the HTML pages display Japanese without problem; however,
> > > there are a couple of particular kanji that do not display properly.
> > > When entered into an HTML form they look fine, and when the query that
> > > uses the data is echoed back to the screen all is fine, but once they
> > > are actually entered into the MySQL database they change.
> > >
> > > Examples of problem kanji...
> > > yo-so-ku -> when this is stored in the database it changes to the kanji
> > > for egg and a small katakana i...
> > > yo-so-u  -> when this is stored in the database it changes to the kanji
> > > for egg and a small z...
> > > also
> > > ko-u-chi-ku
> > > ju-u-bu-n
> > > hyo-u-to
> > > hyo-u-sho-u
> > >
> > > Can this be changed by altering the encoding scheme chosen or are these
> > > particular problem kanji and should just be avoided?
> > >
> > >
> > > I would really appreciate your expertise in helping me make sense of
> > > this.
> > > Thanks,
> > > Claire
> > --
> > -dave
> >
> > --
> > PHP Internationalization Mailing List (http://www.php.net/)
> > To unsubscribe, visit: http://www.php.net/unsub.php

> --

> ------------------------------------------------

> Claire Hector
> Matsushita Electric Works Ltd.
> Quality Management System Group
> Corporate Quality Management

> E-mail: [EMAIL PROTECTED]
> Notes: [EMAIL PROTECTED]

> MIC: 7-711-2470   External (phone): +81-6-6908-6803
> FAX: 7-711-2479   External (FAX):   +81-6-6906-2202
> ------------------------------------------------

--- End Message ---
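David's warning about SJIS can be made concrete. In Shift_JIS the second byte of many two-byte characters falls in the ASCII range, and for some kanji it is 0x5C, the backslash. 予 is one of them (bytes 97 5C), and the first kanji of the other problem words (probably 構, 十 and 表) appear in the commonly cited list of such characters. If any layer between the form and MySQL treats that byte as an escape character and removes it, the remaining bytes pair up differently. The sketch below hard-codes the Shift_JIS bytes of 予想 (yo-so-u) so it does not need mbstring, and uses `stripslashes()` purely as a stand-in for whichever layer might be stripping the backslash; the thread does not establish which layer that is.

```php
<?php
// Shift_JIS bytes for 予想 (yo-so-u), hard-coded so mbstring is not needed:
// 予 = 0x97 0x5C, 想 = 0x91 0x7A.
$yosou = "\x97\x5C\x91\x7A";

// The second byte of 予 is 0x5C, i.e. an ASCII backslash.
var_dump(strpos($yosou, '\\'));      // int(1)

// Stand-in for any layer that treats "\" as an escape character:
// stripslashes() drops the 0x5C and keeps the byte that followed it.
$damaged = stripslashes($yosou);

// Bytes 0x97 0x91 are now read as one character (卵, "egg"),
// and the leftover 0x7A is a plain "z".
echo bin2hex($damaged), "\n";        // 97917a
```

The damaged result, 卵 followed by "z", matches the symptom Claire describes for yo-so-u. EUC-JP, which David recommends for internal use, does not have this problem, because both bytes of a kanji are always 0xA1 or above.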
