php-i18n Digest 10 Feb 2003 11:10:20 -0000 Issue 149
Topics (messages 441 through 444):

Re: mb_detect_encoding mb_convert_encoding and rss
        441 by: Moriyoshi Koizumi

determining language input of forms
        442 by: Simon De Deyne
        443 by: Moriyoshi Koizumi

PDF generation with Japanese text
        444 by: Simon De Deyne
----------------------------------------------------------------------
--- Begin Message ---
Tony Laszlo <[EMAIL PROTECTED]> wrote:
> On Sat, 1 Feb 2003, Moriyoshi Koizumi wrote:
>
> > > (as you can see from the top page here:
> > > http://www.issho.org/ , pretty much every language - and
> > > encoding - out there, needs to be supported).
> > >
> > > a way to wildcard it would be preferred. :)
> > > Is there not such a way?
> >
> > Since encoding detection is essentially heuristic, its accuracy
> > depends on the number of likely candidates it is given in the first
> > place. There is no way to pick the right encoding out of every
> > possible encoding from a string of limited length.
>
> Thank you.
>
> Some people are apparently using the Encode::HanDetect Perl module to
> guess the encoding of the feed based on statistical heuristics.
>
> Is that too imperfect to try, or something that can be done with Perl
> but not with PHP?
Encode::Guess, the module that Encode::HanDetect uses for its core
functionality, works in nearly the same way as mbstring's encoding
detection: both implementations rely on statistical properties of each
encoding or each language. Some other implementations may also make use
of the frequency of common language elements such as postpositions and
prepositions.
In any case, all I can say here is "do not rely on them too much", as
stated in the documentation of Encode::Guess by Dan Kogai:
-----------------------------------------------------------------------
DO NOT PUT TOO MANY SUSPECTS! Don't you try something like this!

    my $decoder = guess_encoding($data, Encode->encodings(":all"));

It is, after all, just a guess. You should alway be explicit when it
comes to encodings. But there are some, especially Japanese,
environment that guess-coding is a must. Use this module with care.
------------------------------------------------------------------[EOQ]
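The same advice carries over to mbstring. As a rough sketch (not a
definitive implementation), detection can be limited to the few encodings
a site actually expects. The helper names guess_encoding_from() and
to_utf8() and the suspect list below are assumptions made for
illustration; mb_detect_encoding() and mb_convert_encoding() are the
mbstring functions named in the subject of this thread.

```php
<?php
// Sketch only: guess_encoding_from() and to_utf8() are hypothetical helper
// names, and the suspect list is an assumption about one particular site.

/**
 * Guess the encoding of $data from a short, explicit list of suspects.
 * Returns the encoding name, or null when no suspect fits.
 */
function guess_encoding_from(string $data, array $suspects): ?string
{
    // Third argument true = strict mode: suspects for which $data is not
    // a valid byte sequence are rejected instead of being guessed anyway.
    $guess = mb_detect_encoding($data, $suspects, true);
    return $guess === false ? null : $guess;
}

/**
 * Convert $data to UTF-8, or return null rather than convert on a bad guess.
 */
function to_utf8(string $data, array $suspects): ?string
{
    $from = guess_encoding_from($data, $suspects);
    return $from === null ? null : mb_convert_encoding($data, 'UTF-8', $from);
}

// Only the encodings actually expected for this input, most likely first.
$suspects = ['UTF-8', 'EUC-JP', 'SJIS'];
```

When no suspect fits, to_utf8() returns null, so the caller has to decide
what to do instead of silently working with a wrong guess.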
Moriyoshi
--- End Message ---
--- Begin Message ---
Hi,
I'm not sure whether this is PHP-related enough for this list, but I have a
question about forms and language. I use a page with a form and, depending
on the situation, the user needs to input hiragana or sometimes just romaji.
How can I set up the form so that the user doesn't have to fiddle with
client-side settings in the Microsoft Windows IME? I realise that this is a
client-side problem, but nevertheless, I'm sure you guys know what I'm
talking about.
kind regards,
Simon DD.
--- End Message ---
--- Begin Message ---
Hi,
You can specify how the IME behaves with the CSS property "ime-mode".
See
http://msdn.microsoft.com/workshop/author/dhtml/reference/properties/imemode.asp
for more information.
But remember that it is MSIE-specific; there is no generic way to control
the IME from HTML.
So how about detecting the type of the input characters on the server
side and rejecting malformed user input?
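A minimal sketch of such a server-side check follows. It assumes the form
data arrives as UTF-8; the helper names is_hiragana() and is_romaji() are
made up for illustration, and treating "romaji" as plain ASCII letters is
a simplifying assumption.

```php
<?php
// Sketch only: is_hiragana() and is_romaji() are hypothetical helpers, and
// the submitted value is assumed to arrive as UTF-8.

function is_hiragana(string $value): bool
{
    // U+3041..U+3096 are the hiragana letters; U+30FC is the long-vowel
    // mark. The /u modifier makes PCRE read pattern and subject as UTF-8,
    // so malformed UTF-8 makes preg_match() return false and is rejected.
    return preg_match('/\A[\x{3041}-\x{3096}\x{30FC}]+\z/u', $value) === 1;
}

function is_romaji(string $value): bool
{
    // Assumption: romaji here means one or more ASCII letters, nothing else.
    return preg_match('/\A[A-Za-z]+\z/', $value) === 1;
}
```

A field that must contain hiragana can then be rejected on the server
when is_hiragana() returns false, whatever the IME state on the client
was.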
Moriyoshi
Simon De Deyne <[EMAIL PROTECTED]> wrote:
> I'm not sure whether this is PHP-related enough for this list, but I have a
> question about forms and language. I use a page with a form and, depending
> on the situation, the user needs to input hiragana or sometimes just romaji.
> How can I set up the form so that the user doesn't have to fiddle with
> client-side settings in the Microsoft Windows IME? I realise that this is a
> client-side problem, but nevertheless, I'm sure you guys know what I'm
> talking about.
>
> kind regards,
>
> Simon DD.
--- End Message ---
--- Begin Message ---
Hi,
Has anyone already managed to generate PDF files with Japanese text
(say, from UTF-8 encoded input)?
If so, does anybody feel inclined to offer some more insight into this
matter by posting an example script?
I've been trying, but my system started to crash in the most horrible ways.
Thanks!
Simon
--- End Message ---