Tony Laszlo <[EMAIL PROTECTED]> wrote:

> On Sat, 1 Feb 2003, Moriyoshi Koizumi wrote:
> 
> > > (as you can see from the top page here: 
> > > http://www.issho.org/ , pretty much every language - and 
> > > encoding - out there, needs to be supported). 
> > > 
> > > a way to wildcard it would be preferred. :)
> > > Is there not such a way?
> > 
> > Since encoding detection is essentially heuristic and thus its 
> > accuracy depends on the number of likely candidates given in the first 
> > place, there is no way to guess a right encoding out of any possible 
> > encodings from limited length of strings.
> 
> Thank you. 
> 
> Some people are apparently using the Encode::HanDetect perl module to 
> guess the encoding the feed based on statistical heuristics. 
> 
> Too imperfect to try, or something that can be done with perl 
> but not with php? 

Encode::Guess, the module that Encode::HanDetect uses for its core 
functionality, works in nearly the same way as mbstring's encoding 
detection: both implementations rely on the statistical properties of 
each encoding or language. Some other implementations also use how 
often characteristic language elements, such as postpositions and 
prepositions, appear in the text.

Anyway, all I can say here is "do not rely on them too much", as stated 
in the documentation of Encode::Guess by Dan Kogai:

-----------------------------------------------------------------------
DO NOT PUT TOO MANY SUSPECTS!  Don't you try something like this!

my $decoder = guess_encoding($data, Encode->encodings(":all"));

It is, after all, just a guess.  You should alway be explicit when it
comes to encodings.  But there are some, especially Japanese,
environment that guess-coding is a must.  Use this module with care. 
------------------------------------------------------------------[EOQ]
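
On the PHP side, the same advice might look roughly like the sketch 
below. It assumes the mbstring extension is loaded. The helper name 
detect_among() and the candidate list are hypothetical and chosen only 
for illustration; a real list should contain just the encodings you 
actually expect from your feeds.

```php
<?php
// Hypothetical helper, not part of mbstring: guess among a short,
// explicit list of candidates instead of every known encoding.
function detect_among(string $data, array $candidates): ?string
{
    // Strict mode (third argument) rejects any candidate for which
    // $data contains invalid byte sequences.
    $found = mb_detect_encoding($data, $candidates, true);
    return $found === false ? null : $found;
}

// Assumed candidate list for a Japanese-language feed.
$candidates = ['ASCII', 'UTF-8', 'EUC-JP', 'SJIS'];
$feed = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"; // "日本語" encoded as UTF-8

$encoding = detect_among($feed, $candidates);
if ($encoding === null) {
    // No candidate fits: better to require an explicit charset
    // than to widen the list of suspects.
    echo "unknown encoding\n";
} else {
    echo mb_convert_encoding($feed, 'UTF-8', $encoding), "\n";
}
```

Even with a short list the result is still only a guess, so an 
explicitly declared charset, when the feed provides one, should take 
precedence over whatever the detection returns.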

Moriyoshi


-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
