Yves, we are thinking about a general API for encoding detection that could initially 
just check for BOM/Unicode signatures. I believe we have a feature request for this 
already. Mark and I just brainstormed about what we might want such an API to look like.
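
To make the "just check for BOM/Unicode signatures" idea concrete, here is a minimal sketch in plain C. It is not an ICU function: the name detect_unicode_signature, its parameters, and the returned encoding names are all assumptions made up for illustration, and it only recognizes the UTF-8, UTF-16, and UTF-32 signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ICU. Returns the name of the encoding
 * whose signature starts the buffer, or NULL if no signature is recognized.
 * *signature_length receives the number of signature bytes to skip. */
const char *detect_unicode_signature(const unsigned char *bytes,
                                     size_t length,
                                     size_t *signature_length)
{
    /* Longer signatures are listed first: the UTF-32LE signature
     * FF FE 00 00 begins with the UTF-16LE signature FF FE. */
    static const struct {
        const char *name;
        unsigned char sig[4];
        size_t len;
    } table[] = {
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-8",    { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },
        { "UTF-16BE", { 0xFE, 0xFF, 0x00, 0x00 }, 2 },
        { "UTF-16LE", { 0xFF, 0xFE, 0x00, 0x00 }, 2 },
    };
    size_t i;

    assert(bytes != NULL || length == 0);
    assert(signature_length != NULL);

    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        /* Only compare when the buffer holds the whole signature. */
        if (length >= table[i].len &&
            memcmp(bytes, table[i].sig, table[i].len) == 0) {
            *signature_length = table[i].len;
            return table[i].name;
        }
    }
    *signature_length = 0;
    return NULL;
}
```

A caller would pass the first few bytes of a file or stream and, on a non-NULL result, open a converter for the returned name and skip signature_length bytes. Note that a UTF-16LE text starting with U+0000 after its BOM is indistinguishable from a UTF-32LE signature, so the longest-match rule here is a design choice, not a certainty.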

The reason for doing what ICU is doing currently is simple pragmatism. None of our 
converters auto-detects anything, and they write only what you tell them to write.
When you deal with serialized data structures and fields in files or databases, that 
is exactly what you want.
With signature-carrying files and transmission protocols, more work is necessary.

It seems to me that a converter API, which can take as little as one byte at a time and 
offers no way to pass additional information ("I know the language of the text..."), is 
not the best place to implement this.

On output, writing a BOM/signature is easy: if you know you need one, then just call 
the converter once with U+FEFF.
That said, for this one feature I could imagine having an API along the lines of "emit 
a Unicode signature if you are a converter for a Unicode encoding".
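
As a rough illustration of that last idea, the sketch below shows what "emit a signature only for a Unicode encoding" could look like. It is standalone C, not ICU code: emit_unicode_signature and its exact, case-sensitive encoding names are hypothetical, and a real version would presumably ask the converter itself rather than compare strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ICU. Writes the encoded form of U+FEFF
 * for a Unicode encoding into out and returns the number of bytes written.
 * Returns 0 for any other encoding name, or if the buffer is too small. */
size_t emit_unicode_signature(const char *encoding,
                              unsigned char *out, size_t capacity)
{
    /* U+FEFF as serialized by each Unicode encoding scheme. */
    static const struct {
        const char *name;
        unsigned char sig[4];
        size_t len;
    } table[] = {
        { "UTF-8",    { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },
        { "UTF-16BE", { 0xFE, 0xFF, 0x00, 0x00 }, 2 },
        { "UTF-16LE", { 0xFF, 0xFE, 0x00, 0x00 }, 2 },
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
    };
    size_t i;

    if (encoding == NULL || out == NULL) {
        return 0;
    }
    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(encoding, table[i].name) == 0) {
            if (capacity < table[i].len) {
                return 0; /* not enough room for the signature */
            }
            memcpy(out, table[i].sig, table[i].len);
            return table[i].len;
        }
    }
    return 0; /* not a Unicode encoding: nothing to emit */
}
```

Called once before the first real output, this writes EF BB BF for "UTF-8" and nothing at all for, say, "ISO-8859-1", so the caller does not need to know which kind of converter it holds.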

markus
