php-i18n Digest 18 May 2004 09:25:31 -0000 Issue 228
Topics (messages 698 through 701):
Re: Converting "\u00F4" style characters
698 by: Asgeir Frimannsson
700 by: Michael Wallner
701 by: Michael Wallner
Re: Particular Problem Kanji
699 by: PHPDiscuss - PHP Newsgroups and mailing lists
Administrivia:
To subscribe to the digest, e-mail:
[EMAIL PROTECTED]
To unsubscribe from the digest, e-mail:
[EMAIL PROTECTED]
To post to the list, e-mail:
[EMAIL PROTECTED]
----------------------------------------------------------------------
--- Begin Message ---
Michael Wallner wrote:
Hi,
I'm the author of PEAR's I18Nv2 [1] module, whose
maintenance I kind of slipped into because I wanted
to introduce a Win32/Linux-independent setLocale().
Currently I want to utilize IBM's ICU resources [2],
which encode Unicode characters in the "\u00F4" way.
I already wrote a parser [3] for the ICU files, but
searching the web I didn't find a neat way to
convert these "\u00F4" characters.
Can anybody help out?
Thanks a lot,
mike
[1] http://pear.php.net/package/I18Nv2
[2] http://oss.software.ibm.com/cvs/icu/icu/source/data/locales/
[3] http://cvs.php.net/co.php/pear/I18Nv2/OpenI18N/ICUParser.php
Hi Mike,
Don't know if this is exactly what you're after, but the following
example converts a hex Unicode code point, e.g. "00F4" (strip the "\u"), to UTF-8:
<?php
// Convert a hexadecimal Unicode code point (e.g. "00F4") to a UTF-8 string.
// Covers code points up to U+FFFF (one to three bytes).
function unicode_to_utf8( $unicode_hex ) {
    $unicode = hexdec( $unicode_hex );
    $utf8 = '';
    if ( $unicode < 128 ) {
        // U+0000..U+007F: single byte
        $utf8 = chr( $unicode );
    } elseif ( $unicode < 2048 ) {
        // U+0080..U+07FF: two bytes
        $utf8 .= chr( 192 + ( ( $unicode - ( $unicode % 64 ) ) / 64 ) );
        $utf8 .= chr( 128 + ( $unicode % 64 ) );
    } else {
        // U+0800..U+FFFF: three bytes
        $utf8 .= chr( 224 + ( ( $unicode - ( $unicode % 4096 ) ) / 4096 ) );
        $utf8 .= chr( 128 + ( ( ( $unicode % 4096 ) - ( $unicode % 64 ) ) / 64 ) );
        $utf8 .= chr( 128 + ( $unicode % 64 ) );
    } // if
    return $utf8;
} // unicode_to_utf8

header('Content-Type: text/plain; charset=utf-8');

$ch4 = '0034'; // digit '4'
$chA = '0041'; // char 'A'
$utf4 = unicode_to_utf8($ch4);
$utfA = unicode_to_utf8($chA);

print $utf4 ."\n" . $utfA;
?>
See
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
- unicode mapping table
http://www.randomchaos.com/document.php?source=php_and_unicode
- the page that inspired the unicode_to_utf8 function...
regards,
asgeir
--- End Message ---
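
The message above converts a single code point that has already been stripped of its "\u" prefix. As a rough sketch of the remaining step, the snippet below replaces every run of "\uXXXX" escapes inside a longer string. The helper name decode_u_escapes() is hypothetical and not part of I18Nv2 or ICU. It leans on json_decode(), which understands the same four-digit escape, and it assumes PHP 7 or later with the JSON extension available. Only the four-hex-digit form is handled.

```php
<?php
// Hypothetical helper (not part of I18Nv2): decode "\uXXXX" escapes to UTF-8.
function decode_u_escapes(string $text): string {
    $result = preg_replace_callback(
        // One or more consecutive \uXXXX escapes, so that surrogate
        // pairs are handed to json_decode() together.
        '/(?:\\\\u[0-9A-Fa-f]{4})+/',
        function (array $m): string {
            // Wrap the escapes in quotes to form a JSON string literal.
            $decoded = json_decode('"' . $m[0] . '"');
            // Leave the text untouched if it cannot be decoded
            // (for example, a lone surrogate).
            return is_string($decoded) ? $decoded : $m[0];
        },
        $text
    );
    // preg_replace_callback() returns null on a regex error.
    return $result ?? $text;
}

echo bin2hex(decode_u_escapes('\u00F4')), "\n"; // c3b4
```

Text outside the escapes is passed through unchanged, so the function can be applied to whole resource values rather than to isolated code points.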
--- Begin Message ---
Heh Asgeir, nice to meet you again!
> Michael Wallner wrote:
>
>> Currently I want to utilize IBM's ICU resources [2], which encode Unicode
>> characters in the "\u00F4" way.
> Hi Mike,
>
> Don't know if this is exactly what you're after, but the following
> example converts a hex Unicode code point, e.g. "00F4" (strip the "\u"), to UTF-8:
> [...]
Well actually it *is* :)
Thanks a *lot*,
--
Michael - < mike(@)php.net >
--- End Message ---
--- Begin Message ---
Hi Asgeir Frimannsson, you wrote:
> Hi Mike,
>
> Don't know if this is exactly what you're after, but the following
> example converts a hex Unicode code point, e.g. "00F4" (strip the "\u"), to UTF-8:
>
> <?php
[...]
> $ch4 = '0034'; // digit '4'
> $chA = '0041'; // char 'A'
> $utf4 = unicode_to_utf8($ch4);
> $utfA = unicode_to_utf8($chA);
>
> print $utf4 ."\n" . $utfA;
> ?>
Hi Asgeir,
it seems that the function doesn't work with higher
values like "\u20AC", which is the euro sign :-(
Any ideas?
Many thanks,
--
Michael - < mike(@)php.net >
--- End Message ---
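
As a cross-check for the "\u20AC" case, U+20AC should come out as the three bytes E2 82 AC in UTF-8. The sketch below is an independent encoder written with bit operations. The name codepoint_to_utf8() is hypothetical, and the four-byte branch for code points above U+FFFF is an addition that the posted function does not have. Worked by hand, the posted function's arithmetic appears to give the same three bytes for 20AC, so this is offered as a way to compare output, not as a diagnosis of the reported failure.

```php
<?php
// Hypothetical helper: encode one Unicode code point (given as hex) as UTF-8.
// Surrogate code points (U+D800..U+DFFF) are not rejected in this sketch.
function codepoint_to_utf8(string $hex): string {
    $cp = hexdec($hex);
    if ($cp < 0x80) {
        // U+0000..U+007F: one byte
        return chr($cp);
    }
    if ($cp < 0x800) {
        // U+0080..U+07FF: two bytes
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // U+0800..U+FFFF: three bytes
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // U+10000..U+10FFFF: four bytes
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codepoint_to_utf8('20AC')), "\n"; // e282ac
```

Comparing bin2hex() output instead of looking at the rendered character takes the browser, the terminal and the Content-Type header out of the picture.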
--- Begin Message ---
Claire Hector wrote:
> David,
> Thanks for your reply!
> I am aware that SJIS shouldn't be used for internal encoding; however, the
> configuration of the server is out of my control, and the person who set it
> up determined it was the best setup. It's his rice field ;-) .... I pointed
> out the very clear warnings from the PHP site, and he showed me a
> Japanese-language site that basically justified this setup. When he wouldn't
> budge, I went and changed the config settings in both PHP and MySQL myself to
> see if this would fix the problem. Thus I had:
> Each page with Shift_JIS encoding.
> mbstring.language=Japanese
> mbstring.internal_encoding=EUC-JP
> mbstring.http_output=SJIS
> MySQL table character set ujis
> and Apache using an additional module for Japanese from WebDAV called
> mod_encoding
> with ServerEncoding set to utf-8 & DefaultClientEncoding as JA-AUTO-SJIS-MS
> With this setup I had real mojibake: none of the Japanese on the site
> resembled kanji....
> I have not encountered the WebDAV module before and am wondering if this
> could be causing problems. Our network guy said that he needed to use this in
> order to get the settings all working; just configuring PHP as he had also
> gave mojibake problems!!
> It is quite a frustrating problem, as it only affects a few kanji and only
> once they have been sent to MySQL.
> Any other ideas??
> Cheers,
> Claire
> David Emery wrote:
> > On 2004/04/06 (Tue) at 11:58, Claire Hector wrote:
> > > Hello!
> > > I have a question regarding some particular Japanese words.
> > >
> > > I have a MySQL database and have set the character set for selected
> > > tables to sjis (I have also tried this with ujis and various php
> > > settings)
> > >
> > > php.ini settings for mbstring are as follows:
> > >
> > > mbstring.language=Japanese
> > > mbstring.internal_encoding=SJIS
> >
> > You shouldn't use SJIS for internal encoding or in your DB, or inside
> > your PHP. It's evil and will cause exactly the type of problems you're
> > having. I think the mbstring docs explain this. Setting
> > mbstring.internal_encoding to EUC would be better. UTF is good too, but
> > I don't think it's supported by MySQL.
> >
> > There should be an mbstring.http_output setting as well, which you
> > probably want to set to SJIS.
> >
> > > mbstring.http_input=auto
> > > mbstring.http_input=UTF-8
> > > mbstring.encoding_translation=on
> > > mbstring.detect_order=auto
> > > mbstring.substitute_character=none
> > >
> > > [The server was configured by our network person, so I am not 100% sure
> > > of the reasoning behind these particular settings.]
> > >
> > > The server is Apache using an additional module for Japanese from webDAV
> > > called mod_encoding
> > > with ServerEncoding set to utf-8 & DefaultClientEncoding as
> > > JA-AUTO-SJIS-MS
> > >
> > > Each page is encoded as Shift_JIS. (I have also played around with these
> > > and get the same problems if this is changed to UTF-8 or EUC-JP.)
> > >
> > > Generally, the HTML pages display Japanese without problems; however,
> > > there are a couple of particular kanji that do not display properly.
> > > When entered into an HTML form they look fine, and when the query the
> > > data is used for is echoed back to the screen all is fine, but once they
> > > are actually entered into the MySQL database they change.
> > >
> > > Examples of problem kanji...
> > > yo-so-ku -> when this is stored in the database it changes to the kanji
> > > for egg and a small katakana i...
> > > yo-so-u -> when this is stored in the database it changes to the kanji
> > > for egg and a small z...
> > > also
> > > ko-u-chi-ku
> > > ju-u-bu-n
> > > hyo-u-to
> > > hyo-u-sho-u
> > >
> > > Can this be changed by altering the encoding scheme chosen or are these
> > > particular problem kanji and should just be avoided?
> > >
> > >
> > > I would really appreciate your expertise in helping me make sense of
> > > this.
> > > Thanks,
> > > Claire
> > --
> > -dave
> --
> ------------------------------------------------
> Claire Hector
> Matsushita Electric Works Ltd.
> Quality Management System Group
> Corporate Quality Management
> E-mail: [EMAIL PROTECTED]
> Notes: [EMAIL PROTECTED]
> MIC: 7-711-2470   Phone: +81-6-6908-6803
> FAX: 7-711-2479   FAX: +81-6-6906-2202
> ------------------------------------------------
--- End Message ---
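
One possible explanation for the kanji that change only after reaching MySQL, not confirmed anywhere in this thread, is that Shift_JIS allows the byte 0x5C (the ASCII backslash) as the second byte of a two-byte character. Any escaping or unescaping step that works byte by byte can then add or drop that byte and corrupt the character. The sketch below shows the effect with PHP's stripslashes(). The byte values chosen for the word "yo-so-u" (97 5C 91 7A) are an assumption made for illustration, and the thread does not say which step in the pipeline actually touches the data.

```php
<?php
// Assumed Shift_JIS bytes for the two-kanji word "yo-so-u":
// 0x97 0x5C (first kanji) followed by 0x91 0x7A (second kanji).
$sjis = "\x97\x5C\x91\x7A";

// stripslashes() works on bytes and knows nothing about Shift_JIS,
// so it removes the 0x5C trail byte as if it were an escaping backslash.
$damaged = stripslashes($sjis);

echo bin2hex($sjis), "\n";    // 975c917a
echo bin2hex($damaged), "\n"; // 97917a
```

After the 0x5C is dropped, the bytes 97 91 pair up as a different two-byte character and the leftover 0x7A is a plain ASCII "z", which would be consistent with the report of a wrong kanji followed by a small z. EUC-JP and UTF-8 never use 0x5C inside a multi-byte character, which is one reason they are usually suggested for storage.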