php-i18n Digest 4 Dec 2002 18:38:41 -0000 Issue 136

Topics (messages 364 through 368):

Re: UTF-8 or multiple charsets on a page
        364 by: Steve Vernon
        365 by: a.h.s. boy

Re: Multiple Languages
        366 by: George Polevoy

Re: php utf8 encode
        367 by: George Polevoy

htmlspecialchars and UTF-8
        368 by: Renato De Giovanni

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
Hiya,
    I'm new to this internationalization thing and working on my first
international website.

    I don't actually use gettext, because my server doesn't have it
installed, but I use my own code, which basically reads the page text from a
Unicode XML file. I initially wrote the file in a non-Unicode text editor, so
it ended up not working. So you need to have a Unicode text editor (I use
Yudit, just search for it; there are Unix and Windows versions). Make sure
you set Yudit to use Unicode. To populate this file, I write the tags in
English, and for the other languages I copy text from freetranslation.com.
That site is not in Unicode but it seems to work; copying must do some
conversion.

    My header includes <meta http-equiv='Content-Type' content='text/html;
charset=utf-8'>. It also includes lines such as <meta
http-equiv='Content-Language' content='en-uk'>, one for each language (not
sure if they are needed, though).

    And it works great!

    I've not tried MySQL with Unicode yet. I think you need the latest
version, and if your database is not in Unicode it may need altering.

    Love,

    Steve
    XX



> In the process of designing a web-based publishing system that is
> internationalized using gettext() calls, I've run into some odd
> problems with the display and proper charset when there are multiple
> languages on a page. Here's the background:
>
> My default charset (declared in content-type meta tag) is iso-8859-1.
> On the page, there is a language popup that allows users to change to
> any number of languages, including Greek (iso-8859-7) and Turkish
> (iso-8859-9). Switching to a different language changes PHP's locale
> settings, and also changes the META tag charset. You're welcome to poke
> and prod it at http://dev.dadaimc.org/.
>
> Users input text into a form for publication. The form declares
> "accept-charset=" and defaults to iso-8859-1, windows-1252, utf-8, and
> then whatever charsets are used by available languages (e.g. iso-8859-7
> and -9). The text is stored in a MySQL database (default charset
> iso-8859-1).
>
> I can successfully input English, Turkish, and Greek text. And when a
> viewer selects "Turkish" from the language menu, the Turkish text
> displays fine (because META tag is set to "iso-8859-9"). The same
> applies to English and Greek. However, on a page which displays all
> three texts -- one in each language -- only one of the three will
> display properly (whichever one corresponds to the currently selected
> language). Obviously inconvenient.
>
> I visited another site -- http://www.indymedia.org/ which also displays
> multiple languages on the same page. It uses "utf-8" in the META tag
> (which makes sense, since it encompasses all the necessary characters).
> The publishing form it uses declares no accept-charset parameter. But
> it works!
>
> When I tried using "charset=utf-8" in my META tag, the text displays
> worse than before -- lots of unprintable characters.
>
> So I'm wondering if anyone knows the magic incantation that brings this
> all together -- how do I get my 3 texts in English, Greek, and Turkish
> to ALL display properly on the page at the same time?
>
> It can't be a simple META tag set to utf-8...that didn't work.
> Is the problem in the input method? The database storage method? The
> display method?
> Do I need to accept-charset=utf-8 ONLY on my input form?
> Does the page charset on the page containing the input form need to be
> utf-8?
> Does the database need to default to using utf-8 for storage? (It
> doesn't seem to be supported).
>
> Please help!
>
> Cheers,
> spud.
>
> -------------------------------------------------------------------
> a.h.s. boy
> spud(at)nothingness.org            "as yes is to if,love is to yes"
> http://www.nothingness.org/
> -------------------------------------------------------------------
>
>
> --
> PHP Internationalization Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>

--- End Message ---
--- Begin Message ---
So the magic formula appears to be this:

1) Page containing the input form is set to UTF-8
2) Form itself is set to accept-charset=utf-8, or does not specify
3) Display page is set to UTF-8

So if you input text 3 times, in 3 languages, and the input page is set for UTF-8, then the display page will display all of them properly on a single page. Any text input _prior_ to the input page being in UTF-8 will still not display correctly...it would appear that the text would have to be either converted before display (some mb_xyz() function?), or would have to be re-entered on a UTF-8 page. Otherwise not backwards compatible.

I don't have the Multi-byte extension compiled in, and have no experience with it, but can any of its functions determine the encoding type of input text? I'm wondering what will happen when I need to support Japanese or something that doesn't fall within UTF-8...what then?
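
A minimal sketch of the "convert before display" idea, assuming the Multi-byte (mbstring) extension is compiled in and that the charset each older text was entered in is known. legacy_to_utf8() is a hypothetical helper name used only for illustration; mb_convert_encoding() is the mbstring function doing the work:

```php
<?php
// Hypothetical helper: convert text stored in a known legacy charset to UTF-8.
// Assumes the mbstring extension is loaded.
function legacy_to_utf8($text, $sourceCharset)
{
    // Pure ASCII is byte-identical in UTF-8 and the ISO-8859 family,
    // so there is nothing to convert.
    if (!preg_match('/[\x80-\xFF]/', $text)) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

// 0xE1 is GREEK SMALL LETTER ALPHA in iso-8859-7.
$greek = legacy_to_utf8("\xE1", 'ISO-8859-7');
// 0xFD is LATIN SMALL LETTER DOTLESS I in iso-8859-9.
$turkish = legacy_to_utf8("\xFD", 'ISO-8859-9');
```

mbstring also has mb_detect_encoding(), but it can only guess: a byte such as 0xE1 is a valid character in both iso-8859-7 and iso-8859-9, so recording the charset alongside each stored text is likely to be more reliable than detecting it afterwards.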

Cheers,
spud.

On Monday, December 2, 2002, at 10:43 AM, Steve Vernon wrote:

My header includes <meta http-equiv='Content-Type' content='text/html;
charset=utf-8'>. And includes lines such as this <meta
http-equiv='Content-Language' content='en-uk'>, for each language (not sure
if needed though.


-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org            "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------

--- End Message ---
--- Begin Message ---
Hi!
Here is my solution to globalization. It works well if you only need a few
languages.
It uses simple markup for HTML and PHP strings.

Hope this helps.

usages:

require_once( 'localize.inc' ); // see below

1.    loc( "en|In English|ru|In Russian|fr|Franci... i don't speak this
language actually :)", "ru" /*this argument can be skipped*/);

    Use this notation, if you work with internal strings, such as SQL or PHP
strings.
    It's quite simple even for non-programmers (editors and translators) to
use this markup.

2.    localize( "<p>Following text is in your language: <glob>en|In English|ru|In Russian</glob></p>", "en" );

    Use this notation for HTML

3.    print(localize(implode( "", file("myfile_globalized.html")),
$_REQUEST['locale']));

       You can now call this file:
       http://localhost/site/myfile.php?locale=en
       Do not pass a file name in the arguments, because that is a security risk.

example of globalized file:

------------------------------------
<html>
 <head>
  <title><glob>en|My Document - English|ru|My Document -
Russian</glob></title>
 </head>
 <body>

 <a href="<glob>en|english.htm|ru|russian.htm</glob>"><glob>en|In
English|ru|In Russian</glob></a>

 </body>
</html>
------------------------------------

These <glob> tags are not well formed xml tags!
Current limitation is that <glob></glob>  tag cannot span multiple lines
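
One possible way around the single-line limitation, shown as a sketch and not as part of localize.inc: the "s" pattern modifier lets "." match newlines, so a <glob> block can wrap. pick_translation() and localize_multiline() are hypothetical names, and the code assumes PHP 7 or later (type declarations and the ?? operator):

```php
<?php
// Hypothetical helper: pick one translation from
// "en|English text|ru|Russian text" style markup.
function pick_translation(string $markup, string $locale, string $fallback = 'en'): string
{
    $parts = explode('|', $markup);
    $table = [];
    // Pairs are laid out as code, text, code, text, ...
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        $table[trim($parts[$i])] = $parts[$i + 1];
    }
    return $table[$locale] ?? $table[$fallback] ?? '';
}

// Hypothetical variant of localize() that accepts multi-line <glob> blocks.
function localize_multiline(string $html, string $locale): string
{
    // "s" lets "." match newlines; "U" keeps each match non-greedy.
    return preg_replace_callback(
        '/<glob>(.+)<\/glob>/sU',
        function (array $m) use ($locale): string {
            return pick_translation($m[1], $locale);
        },
        $html
    );
}
```

For example, localize_multiline("<p><glob>en|Hello\nworld|ru|Privet</glob></p>", 'en') keeps the line break inside the English text. As in the original markup, the translated text itself cannot contain a "|" character.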

I'm using "UTF-8  without signature" encoding for all my html and php files

-------------------------------------------------------        localize.inc

<?php

$default_locale = "en";

function defLocale() { global $default_locale; return $default_locale; }

// add other locales, as you need

$supported_loc = explode(" ", "ru en" );
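function search_loc($ag_str)
{
 global $supported_loc;
 // foreach replaces each(), which was removed in PHP 8 and did not rewind
 // the array pointer between calls.
 foreach ( $supported_loc as $k => $al )
 {
  if ( preg_match( "/". $al ."/", $ag_str ) ) return $k;
 }
 return null;
}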

function setup_user_locale()
{
 // Start the session once; a second session_start() call only raises a notice.
 session_start();
 global $default_locale;
 global $supported_loc;

 // session_destroy(); // for debug purposes

 $key = null;

 global $lang_changed;

 if ( isset($_SESSION['lang']) && isset($_REQUEST['lang']) && (
$_SESSION['lang'] != $_REQUEST['lang'] ) )
  $lang_changed = TRUE;
 else
  $lang_changed = FALSE;

 if ( isset( $_REQUEST['lang'] ) )
 { if ( $_REQUEST['lang']!= '')
  {
  $key = search_loc( $_REQUEST['lang'] );
  $_SESSION['lang']=$_REQUEST['lang'];
  }
 }

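 if ( isset( $_SESSION['lang'] ) && ! isset( $key ) )
 {
  if ( $_SESSION['lang'] != '' ) { $key = search_loc( $_SESSION['lang'] ); }
 }

 // The Accept-Language header is optional, so check it before use.
 if ( ! isset($key) && isset( $_SERVER['HTTP_ACCEPT_LANGUAGE'] ) )
 {
  $key = search_loc( $_SERVER['HTTP_ACCEPT_LANGUAGE'] );
 }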

 if ( isset($key) ) $default_locale = $supported_loc[ $key ];
 else $default_locale = "en";
}

function loc_try( $in, $required_locale = null )
{
 global $default_locale; if ( ! $required_locale ) { $required_locale =
$default_locale; }
 $out = "";
 // $myPattern = '/('. $required_locale .'\|)([^(\|<)]+)/';
 $myPattern = '/('. $required_locale .'\|)([^(\|)]+)/';
 preg_match( $myPattern, $in, $matches );
 return (count($matches) == 0) ? null : $matches[2];
}

$all_language_dog = "en|text is not translated to your language|ru|????? ??
????????? ?? ??? ????";

// USAGE: loc( "en|In English|ru|In Russian" );
//

function loc( $in, $required_locale = null )
{
 global $all_language_dog;
 $good = loc_try($in,$required_locale);
 if ( ! $good )
 {
  $last_chance = loc_try( $in, "en" );
  return ($last_chance) ? $last_chance : loc( $all_language_dog,
$required_locale );
 }
 else
 {
  return $good;
 }
}

// syntax: localize( "<glob>en|In English|ru|In Russian</glob>" );
// this function is useful for entire html/xml files
// contents of myfile.php
//
// require( 'localize.inc' );
// print(localize(implode( "", file("myfile_globalized.html")), $_REQUEST['locale']));
//
// USAGE: http://www.qqq.org/myfile_localized.php?&locale=ru

function localize( $str, $required_locale = null )
{
 global $default_locale; if ( ! $required_locale ) { $required_locale =
$default_locale; }
 // preg_replace_callback replaces the /e modifier, which evaluated the
 // replacement as PHP code and was removed in PHP 7.
 return preg_replace_callback( "/(<glob>)(.+)(<\/glob>)/U",
  function ( $m ) use ( $required_locale ) { return loc( $m[2], $required_locale ); },
  $str );
}

setup_user_locale();

?>

------------------------------------------------------------ end of
localize.inc
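
A note on the Accept-Language fallback: search_loc() matches a locale code anywhere in the header and ignores the browser's preference weights. The sketch below shows one way to honour the q-values. It is an illustration under assumptions, not part of localize.inc; pick_locale_from_header() is a hypothetical name, and the code assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: choose the supported language with the highest
// q-value from an Accept-Language header such as "fr;q=0.9, ru-RU;q=0.8".
function pick_locale_from_header(string $header, array $supported, string $default): string
{
    $best = $default;
    $bestQ = 0.0;
    foreach (explode(',', $header) as $item) {
        $fields = explode(';', trim($item));
        $tag = strtolower(trim($fields[0]));
        if ($tag === '') {
            continue;
        }
        // A language range without an explicit q parameter has weight 1.
        $q = 1.0;
        foreach (array_slice($fields, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        // Compare only the primary subtag, so "ru-RU" counts as "ru".
        $primary = explode('-', $tag)[0];
        if (in_array($primary, $supported, true) && $q > $bestQ) {
            $best = $primary;
            $bestQ = $q;
        }
    }
    return $best;
}
```

With the supported list from localize.inc, pick_locale_from_header('fr;q=0.9, ru-RU;q=0.8, en;q=0.5', array('ru', 'en'), 'en') selects "ru", because "fr" is not supported and "ru" outranks "en".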




------------------------------------------ localize_test.php


<?php

// ?? ??????

require( "localize.inc" );

function mix_query( $new_disasm )
{
 $old_disasm = array();

 parse_str( $_SERVER['QUERY_STRING'], $old_disasm );

 $super_query = array_merge( $old_disasm, $new_disasm );

 $out = array();
 foreach( $super_query as $key => $val )
 {
  if ( $val != "" )
  array_push( $out, $key . '=' . $val );
 }

 $out = implode( "&", $out );
 return $out;
}

function mix_path( $new_disasm ) { return path() . '?' . mix_query(
$new_disasm ); }

function path()
{
 $p = parse_url( $_SERVER['PATH_INFO'] );
 return $p['path'];
}

function switch_language()
{
 global $supported_loc;
 $language_name = array( "en" => "English", "ru" => "Russian" );

?>
<table border="0" cellpadding=0 cellspacing=0>
 <tr>
  <td bgcolor="white">
   language
  </td>
<?php
 foreach ( $supported_loc as $loc )
 {
?>
  <td bgcolor="<?php

  print( ($loc == defLocale() )? "c0c0c0" : "d0d0d0" );

  ?>">

  <?php

  print (
  ( $loc == defLocale() )
  ? ":[".$language_name[$loc]."]:"
  : "<a href=\"". mix_path( array( 'lang' => $loc ) ) ."\">". "::"
.$language_name[$loc]."::"."</a>" );
  ?>
  </td>
<?php
 }
?>

 </tr>
</table>
<?php
}

?>

<html>
<head><title><?php
print ( loc( "en|English|ru|Russian" ) );
?></title>
</head>
<body>

<pre>
<?php

switch_language();

?>

</pre>
<?php

print ( localize( "This is in English.<font color=\"green\"><glob>en|And "
 . "this is in <b>your</b> requested language|ru|Russian Bla bla "
 . "bla</glob></font>. Yeah" ) );

?>

</body>
</html>



------------------------------------------- end localize_test.php

"Steve Vernon" <[EMAIL PROTECTED]> wrote in message
news:000001c29558$78490ff0$2cfd87d9@extreme...
> Hmmmm,
>     This seems to be harder than I thought!
>
>     Thanks everyone for their help!
>
>     Carl, this IBM international thing is very interesting, and many thanks
> for telling me about it. I just think it is a lot of work, especially since
> I want my site to be online in January, and I'm not sure I can manage this
> with my other tasks. I will try to research it and maybe use it in the
> future.
>
>     I'm probably going to use the gettext that Moriyoshi mentioned. Thanks!
>
>     Basically I'm an amateur at this international stuff. I've never worked
> on an international project and have no idea where to start. I could really
> do with some advice.
>
>     I am hosting my site in Germany, and need to know what to tell the
> server admin people. What should I tell them about the MySQL, Apache and
> PHP setup, please (on Linux)?
>
>     And for my website, I assumed that you just alter the doctype, say from
> <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"> (I'm using 3.2 but had
> this to hand!) to the equivalent French one. Is this wrong? Do I need to
> specify character sets and encoding?
>
>     Many Thanks,
>
>     Steve
>
> > Steve,
> >
> > A couple of years ago I developed extensions for PHP that sound like
> > what you want.  It used ICU http://oss.software.ibm.com/icu/ to provide
> > the i18n services for PHP.
> >
> > First, it also provided an override to the Apache mod_mime services to
> > allow it to work like other web servers and allow you to put different
> > language web pages in different subdirectories rather than require a
> > language type on each page.  You can have it override just certain types,
> > such as .php pages only.
> >
> > This way you can have:
> >
> >  ../site/en/contents/page.php
> >  ../site/fr/contents/page.php
> >
> > or full locales
> >
> >  ../site/en-uk/contents/page.php
> >  ../site/en-us/contents/page.php
> >  ../site/en-ca/contents/page.php
> >  ../site/fr-fr/contents/page.php
> >  ../site/fr-ca/contents/page.php
> >
> > It sets the locale on a per-transaction basis.  It will not only work
> > with Unicode with ICU but it will also work in code pages.
> >
> > You can have a terminal using Shift-JIS and Japanese pages in EUC and
> > access a UTF-8 database.  The same code will work with any code page or
> > Unicode.
> >
> > You can see code that I derived from the ICU interface portion of the
> > code at http://www.xnetinc.com/xiua/
> >
> > The PHP changes also included a way to support charset more dynamically.
> >
> > Carl
> >
> > > -----Original Message-----
> > > From: Steve Vernon [mailto:[EMAIL PROTECTED]]
> > > Sent: Sunday, November 24, 2002 7:20 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: [PHP-I18N] Multiple Languages
> > >
> > >
> > > Hiya,
> > >     I'm working on my first international PHP project, which is a
> > > site that displays differently in different languages. So I guess this
> > > is the place to ask advice.
> > >
> > >     I've done the code with flags etc. so people can select the
> > > country and so the language. But I'm not sure about the best way to
> > > handle the text and things such as meta tags.
> > >
> > >     My original idea was to have multiple include files named like
> > > uk.php, french.php with separate variables for the text parts, or do
> > > something in XML, which is basically the same.
> > >
> > >     I can't find any sources on the net about best ways to do a site
> > > like this.
> > >
> > >     Help!
> > >
> > >     Thanks,
> > >
> > >     Steve


--- End Message ---
--- Begin Message ---
PHP does output UTF-8 encoding:

header( 'Content-type: text/html; charset=utf-8' );

You need to send headers before any text/html output has started, so there
should not be any carriage returns (or anything else) before the first "<?php".
Also, you need to save your PHP file as UTF-8 if you include Unicode
characters in the PHP itself.
You can use Notepad's Save As... UTF-8, but that's not good, because it writes
a signature (byte order mark) at the start of the file. The signature is sent
as output before you can send any headers, sessions, cookies, etc.
Microsoft Developer Studio does the right job: Advanced Save Options,
"UTF-8 without signature".
You have to keep some non-ASCII characters in your file to maintain the
setting from save to save, because if you just have English characters, the
file will be the same as ANSI.
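
A small sketch for checking whether a saved file carries that signature, assuming the file contents have already been read into a string. has_utf8_signature() and strip_utf8_signature() are hypothetical helper names; the three bytes EF BB BF are the UTF-8 byte order mark:

```php
<?php
// Hypothetical helper: true if the string starts with the UTF-8 signature
// (byte order mark), the bytes EF BB BF.
function has_utf8_signature($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: return the string without a leading signature.
function strip_utf8_signature($bytes)
{
    return has_utf8_signature($bytes) ? substr($bytes, 3) : $bytes;
}
```

Running has_utf8_signature(file_get_contents($path)) over a site's .php files is one way to find the files that would emit output before header() gets a chance to run.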

"Javi Lavandeira" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Hi,
>
> >      Yes, I sent this header information, but it's just informative for
> > the browser, and I can manually select the encoding to UTF-8 from the
> > browser, but that's not the case, because the PHP output simply isn't
> > utf-8 :) and all I see is garbage :).
>
> Could you give me the URL of some example of this problem so I can take a
> look at it?
>
> Regards,
>
> --
> Javi Lavandeira ([EMAIL PROTECTED]) - http://www.ag0ny.com


--- End Message ---
--- Begin Message ---
Hi,

We're working on a program that needs to generate an xml document, utf-8 
encoded, based on information stored with another charset encoding inside a 
database.

So basically at some point we have a function that looks like:

function encodeString($s, $encoding)
{
  $s = mb_convert_encoding($s, 'UTF-8', $encoding);

  $s = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');

  return $s;
}

What happens is that "htmlspecialchars" seems to be always returning a 
latin1 string, therefore causing problems with the generated document 
supposedly utf-8 encoded.

My question is: is this a php bug or am I misunderstanding something related 
to multi-byte character support?

PHP version is 4.2.3
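
One way to narrow this down is to check the bytes at each step. The sketch below is an illustration only: it assumes a much later PHP than 4.2.3 (the ENT_XML1 flag needs PHP 5.4 or later) plus the mbstring extension, and encode_for_xml() is a hypothetical name. It does not diagnose the 4.2.3 behaviour, it only shows what a correct result looks like:

```php
<?php
// Hypothetical helper: convert to UTF-8, escape for XML, and verify that
// the escaped result is still valid UTF-8.
function encode_for_xml($s, $sourceEncoding)
{
    $utf8 = mb_convert_encoding($s, 'UTF-8', $sourceEncoding);
    $escaped = htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
    if (!mb_check_encoding($escaped, 'UTF-8')) {
        throw new RuntimeException('Escaped text is not valid UTF-8');
    }
    return $escaped;
}

// "caf\xE9" is "café" in ISO-8859-1; in UTF-8 the é becomes C3 A9.
$result = encode_for_xml("caf\xE9 & <b>", 'ISO-8859-1');
```

Printing bin2hex() of the string before and after htmlspecialchars() shows whether the multi-byte sequences survive the call unchanged.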

Thank you very much!
--
Renato
CRIA - Centro de Referencia em Informacao Ambiental
http://www.cria.org.br/



--- End Message ---
