I'm putting together a new PHP function called str_convert_encoding, which
converts a string from one encoding to another using whichever conversion
extensions are available on the system.  It works like this:

string str_convert_encoding(string srcstring, string fromenc, string toenc)

// No change; return source
if (fromenc == toenc)
    return srcstring;

Promote us-ascii and iso-8859-1 to cp1252
Try to convert using mbstring if present
Try to convert using recode if present
Try to convert using iconv if present
Return the original string if none of them can convert it

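Promoting the charset is safe for ordinary text because iso-8859-1 is
us-ascii with extensions, and cp1252 matches iso-8859-1 except in the
0x80-0x9F range, where it puts printable characters in place of the rarely
used C1 control codes.  What I have found is that quite often ascii is given
as the charset when the data is really iso-8859-1 or cp1252.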
The spirit of this function is to pass back something useful and not be too
strict/pedantic about it.
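
To make the outline above concrete, here is a rough user-land sketch of the
same fallback chain.  It is not the C implementation; the names
sketch_promote_charset and sketch_convert_encoding are made up for
illustration, and applying the promotion to both the source and the target
charset is my reading of the outline rather than something it spells out.

```php
<?php
// Hypothetical helper: promote us-ascii and iso-8859-1 to cp1252.
function sketch_promote_charset(string $enc): string
{
    $e = strtolower($enc);
    return in_array($e, ['us-ascii', 'ascii', 'iso-8859-1'], true) ? 'cp1252' : $e;
}

// Hypothetical user-land stand-in for the proposed str_convert_encoding.
function sketch_convert_encoding(string $src, string $from, string $to): string
{
    // No change; return source.
    if ($from == $to) {
        return $src;
    }

    // Assumption: promotion is applied to both charset names.
    $from = sketch_promote_charset($from);
    $to   = sketch_promote_charset($to);
    if ($from === $to) {
        return $src;
    }

    // Try mbstring if present.
    if (function_exists('mb_convert_encoding')) {
        try {
            $out = @mb_convert_encoding($src, $to, $from);
            if (is_string($out)) {
                return $out;
            }
        } catch (\ValueError $err) {
            // Newer PHP versions throw on an unknown encoding; fall through.
        }
    }

    // Try recode if present (request syntax is "from..to").
    if (function_exists('recode_string')) {
        $out = @recode_string($from . '..' . $to, $src);
        if (is_string($out)) {
            return $out;
        }
    }

    // Try iconv if present.
    if (function_exists('iconv')) {
        $out = @iconv($from, $to, $src);
        if (is_string($out)) {
            return $out;
        }
    }

    // Nothing could convert it; hand back the original string.
    return $src;
}
```

With at least one of the three extensions loaded, a call such as
sketch_convert_encoding("caf\xE9", 'iso-8859-1', 'utf-8') would be expected
to hand back the UTF-8 bytes for the same text; with none loaded, the input
comes back untouched.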

For the implementation, I've added php_str_convert_encoding that does the
actual work and also returns a success code so that C code knows if the
encoding was changed.

I plan to use this function with the htmlentities mods that I have done
recently so that when it comes across an unknown charset it can convert it
to utf-8, preserve any wide chars that might be present, encode the entities
and then convert back to the original encoding again.
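
As a rough illustration of that round trip, the sketch below does the same
three steps in user land.  It assumes mbstring is loaded and uses
mb_convert_encoding as a stand-in for the proposed function; the name
sketch_entities_via_utf8 is hypothetical and this is not the htmlentities
patch itself.

```php
<?php
// Hypothetical sketch: encode entities for a charset by going through UTF-8.
// Assumes the mbstring extension is available and knows $charset.
function sketch_entities_via_utf8(string $src, string $charset): string
{
    // 1. Convert to UTF-8 so that wide characters survive.
    $utf8 = mb_convert_encoding($src, 'UTF-8', $charset);

    // 2. Encode the entities while the text is in UTF-8.
    $encoded = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

    // 3. Convert back to the original encoding.  The entities themselves
    //    are plain ASCII, so only characters without an entity are affected.
    return mb_convert_encoding($encoded, $charset, 'UTF-8');
}
```

For example, sketch_entities_via_utf8("caf\xE9 <b>", 'ISO-8859-15') gives
"caf&eacute; &lt;b&gt;".  The charset there is only for illustration.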

Does anyone have any comments or suggestions about this?

I'm fairly certain that this function would be well received and would save
loads of people from having to do the equivalent in user land.

