Tom Kaiser wrote:
> 
> Thanks VERY MUCH for these comments. Well, the major drawback of the
> solution based on iconv is the following. We're using expat as the xml
> parser. If expat sees a document whose encoding is not among the 4
> encodings it supports internally, it asks you to supply a table for
> that encoding. The table has an entry for each byte between 0x80 and
> 0xff; the entry for byte N specifies the length of the character codes
> in which N appears as the leading byte.
> 
> Now the problem is that, sadly, we can't get this information from
> iconv, at least not for encodings whose codes are longer than 2 bytes.
> So in addition to iconv, we'd still need a sort of "definition file"
> for each supported encoding. I think it's clear that this greatly
> reduces the advantage of using iconv. Obviously this is not a drawback
> of iconv itself, but rather of iconv in combination with expat.
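
For concreteness, if I've understood the table format right, the data
being described would look something like this for EUC-JP (the encoding
choice and variable name are my own; this is a sketch of the table's
contents, not of expat's actual C interface):

```python
# Per-lead-byte length table of the kind described above, written out
# by hand for EUC-JP as an illustration.
euc_jp_lengths = {}
for b in range(0x80, 0x100):
    if b == 0x8E:                 # SS2: introduces half-width kana, 2 bytes
        euc_jp_lengths[b] = 2
    elif b == 0x8F:               # SS3: introduces JIS X 0212, 3 bytes
        euc_jp_lengths[b] = 3
    elif 0xA1 <= b <= 0xFE:       # lead byte of a JIS X 0208 pair
        euc_jp_lengths[b] = 2
    # other bytes >= 0x80 never start a valid EUC-JP sequence
```

The 0x8F entry is exactly the kind of 3-byte case you mention as the
sticking point.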

You must have thought of and rejected this already, but could you
generate the tables automatically using iconv?  Write a script that
loops over each encoding, uses iconv to translate a file with one line
per convertible character into the target encoding, and then derives
each character's encoded length from that converted output?
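
Something along these lines, say (a sketch using Python's codec
machinery as a stand-in for the iconv command line; the function name
is mine, and it assumes every lead byte introduces sequences of one
fixed length):

```python
def lead_byte_lengths(encoding):
    """Derive, for each lead byte >= 0x80, the total byte length of
    encoded sequences that start with that byte, by encoding every
    representable character and inspecting the output."""
    lengths = {}
    for cp in range(0x80, 0x110000):
        try:
            data = chr(cp).encode(encoding)
        except (UnicodeEncodeError, ValueError):
            continue  # character not representable in this encoding
        if data and data[0] >= 0x80:
            # record the sequence length the first time we see this lead byte
            lengths.setdefault(data[0], len(data))
    return lengths
```

Run once per encoding at build time, that would give you the
per-lead-byte table without hand-written definition files.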

 Steve
