Thanks VERY MUCH for these comments. Well, the major drawback of the
iconv-based solution is the following. We're using expat as the XML
parser. When expat sees a document whose encoding is not among the four
encodings it supports internally, it asks you (via a callback) to supply
a table for that encoding. The table has an entry for each byte between
0x80 and 0xff; the entry for byte N specifies the length, in bytes, of
the character codes in which N appears as the leading byte.
Now the problem is that, sadly, we can't get this information from
iconv, at least not for encodings that use codes of more than 2 bytes.
So in addition to iconv, we'd still need a sort of "definition file" for
each supported encoding. I think it's clear that this greatly reduces
the advantage of using iconv. Obviously this is not a drawback of iconv
itself, but rather of iconv in combination with expat.
Then there are platforms where iconv may not be available (portable
machines with an incomplete Unix). For these, the other approach would
be ideal: you could even decide which encodings you want to use, and
install only the corresponding files.
To reply to Mark's message: yes, there's at least one widely used CJK
encoding, called Chinese GB I believe, which is not covered by the
XML::Encoding module.
I'd like to make it clear that using *encoding files* from a Perl module
doesn't mean tying Sablotron to Perl. It would remain just as standalone
as before; it's only that these particular encoding files (1) are
available for use, (2) seem to represent a certain standard, and (3) can
be extended to cover new encodings quite easily (it seems).
I'd appreciate any further comments. I hope it's understood that I'm
still trying to evaluate the possible approaches rather than advocating
any of them. If I'm missing an elegant solution using iconv, I'll be
more than happy to learn about it.
Thanks again,
Tom
Mark Bartel wrote:
> I second this sentiment, that going away from iconv would seem a very large
> step back, and I'd also like to know more about the problems of which you
> speak.
>
> To answer the original request, yes, we would miss other encodings. I'm sure
> there's at least one asian encoding that we use absent from that list that
> I've encountered personally, and I'm certain we use other encodings as well.
>
> -Mark Bartel
>
> -----Original Message-----
> From: Kestutis Kupciunas <[EMAIL PROTECTED]>
> Sent: Wed, 31 Jan 2001 10:34:54 +0200
> To: Sablotron Mailing List <[EMAIL PROTECTED]>
> Subject: Re: [Sab] encodings - opinions?
>
> On Tue, Jan 30, 2001 at 10:54:39PM +0100, Tom Kaiser wrote:
> [snip]
>
>> Would anyone greatly miss any encoding which does NOT appear in the list
>> below? (This is the list of encodings covered by XML::Encoding).
>> Big 5, ISO-8859-2 to ISO-8859-9, x-euc-jp, x-euc-kr, x-sjis
>> (Shift_JIS), windows-1250
>> (plus the built-in ISO-8859-1, US_ASCII, UTF-8 and UTF-16)
>>
> Don't get me wrong, but that would be a large step back for Sablotron.
> Limiting it to such a small subset of encodings, knowing the possibilities of
> iconv, would make some Sablotron users cry. At least at our place we use
> Sablotron precisely because it uses iconv for encoding conversions - we never
> know what encoding we will need tomorrow (but most probably one supported
> by iconv). I know, I know, UTF-8 and UTF-16 are the encodings of the future
> and someday we will not need any others. But we live in the present.
>
> Encodings are the weakest part of almost all software in the world, mainly
> because that software either does not care about other encodings at all, or
> it implements "partial" solutions which do not satisfy everybody. Having a
> general encoding-conversion library used by all software would really ease
> the lives of both developers and users. And I vote with both hands for iconv
> as the library of choice.
>
> And, could you tell more about "serious drawbacks" of iconv you have
> encountered?
>
> regards,
>
> --
> Kestutis Kupciunas (a.k.a ydum)
>