On 6/26/2010 5:41 PM, Doug Ewell wrote:
> Regarding the inability to distinguish 8859-15 heuristically from
> 8859-1, I understand the problem when there are no tags or other
> hints, or for cases like Windows-1252 text declared to be 8859-1, but
> it seems unlikely to me that there is much text encoded in 8859-1 (or
> Windows-1252) that is tagged as 8859-15. I would think in a case like
> that, it might make sense to trust the tag. I suspect the problem of
> unreliable declarations is greater for most other tuples of
> (declared-encoding, actual-encoding).
Doug,
this is an interesting concept, i.e. that how reliable a tag is might
well depend on the value of the tag. I wonder whether that kind of
probability is considered at all when deciding to trust
auto-recognition over the tag value.
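
To make the idea concrete, here is a minimal sketch of a decision rule
that weighs a per-tag reliability against the detector's confidence.
Everything in it is an assumption for illustration: the names
TAG_RELIABILITY and choose_encoding are hypothetical, the reliability
numbers are invented rather than measured, and the tag and the detector
are treated as independent evidence, which real detectors may not
satisfy.

```python
# Hypothetical prior probability that a declared tag is correct,
# keyed by the declared value. These numbers are invented for
# illustration; real values would have to be measured on a corpus.
TAG_RELIABILITY = {
    "iso-8859-1": 0.60,   # often used as a default, so often wrong
    "iso-8859-15": 0.95,  # rarely declared by accident
}
DEFAULT_RELIABILITY = 0.80  # assumed fallback for tags not listed


def choose_encoding(declared, detected, detector_confidence):
    """Pick between the declared tag and the detector's guess.

    detector_confidence is the detector's own estimate (0.0 to 1.0)
    that `detected` is the true encoding.
    """
    detected = detected.lower()
    if declared is None:
        return detected  # no tag at all: only the detector can decide
    declared = declared.lower()
    if declared == detected:
        return declared  # no conflict to resolve

    prior = TAG_RELIABILITY.get(declared, DEFAULT_RELIABILITY)
    # Crude odds comparison under the independence assumption:
    # "tag right and detector wrong" vs. "tag wrong and detector right".
    # Algebraically this reduces to comparing prior with the confidence.
    tag_score = prior * (1.0 - detector_confidence)
    detector_score = (1.0 - prior) * detector_confidence
    return declared if tag_score >= detector_score else detected


if __name__ == "__main__":
    # Same detector verdict, different tags, different outcomes.
    print(choose_encoding("ISO-8859-15", "windows-1252", 0.9))
    print(choose_encoding("ISO-8859-1", "windows-1252", 0.9))
```

With these made-up numbers, the same detector verdict is overridden
when the tag says 8859-15 but accepted when the tag says 8859-1, which
is the asymmetry Doug describes.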
A./