On 6/26/2010 5:41 PM, Doug Ewell wrote:


Regarding the inability to distinguish 8859-15 heuristically from 8859-1, I understand the problem when there are no tags or other hints, or for cases like Windows-1252 text declared to be 8859-1, but it seems unlikely to me that there is much text encoded in 8859-1 (or Windows-1252) that is tagged as 8859-15. I would think in a case like that, it might make sense to trust the tag. I suspect the problem of unreliable declarations is greater for most other tuples of (declared-encoding, actual-encoding).

Doug,

this is an interesting idea: the reliability of a tag might well depend on the tag's value. I wonder whether that kind of probability is considered at all when deciding to trust auto-recognition over the tag value.
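
To make the idea concrete, here is a minimal sketch of how a per-tag prior might be weighed against a detector's output. Everything in it is an assumption for illustration: the `TAG_PRIOR` table, its numbers, and the `choose_encoding` helper are hypothetical and do not describe any existing detector.

```python
# Hypothetical priors, NOT measurements: TAG_PRIOR[declared][actual] is an
# assumed estimate of P(actual encoding | declared tag).
TAG_PRIOR = {
    # An 8859-1 tag is often wrong (e.g. the text is really Windows-1252).
    "ISO-8859-1": {"ISO-8859-1": 0.5, "windows-1252": 0.5},
    # An 8859-15 tag is assumed to be rarely wrong.
    "ISO-8859-15": {"ISO-8859-15": 0.98, "ISO-8859-1": 0.01, "windows-1252": 0.01},
}


def choose_encoding(declared, detector_scores):
    """Pick an encoding by weighting detector scores with a per-tag prior.

    detector_scores maps a candidate encoding to the likelihood of the text
    under that encoding, as reported by some heuristic detector (assumed input).
    """
    # Unknown tags get full trust here; that default is itself an assumption.
    prior = TAG_PRIOR.get(declared, {declared: 1.0})
    # Unnormalized posterior: prior times likelihood for each candidate.
    posterior = {enc: p * detector_scores.get(enc, 0.0) for enc, p in prior.items()}
    if sum(posterior.values()) == 0:
        return declared  # no usable evidence from the detector: keep the tag
    return max(posterior, key=posterior.get)


# The detector slightly prefers 8859-1, but the tag's prior outweighs it.
print(choose_encoding("ISO-8859-15", {"ISO-8859-15": 0.4, "ISO-8859-1": 0.6}))
# With an unreliable tag, the detector's preference wins.
print(choose_encoding("ISO-8859-1", {"ISO-8859-1": 0.2, "windows-1252": 0.8}))
```

The point of the design is only that the same detector output can lead to different decisions depending on which tag was declared.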

A./

