Mark Davis wrote:
> What are also tricky are the 'almost' supersets, where there are only a few
> different characters. Those definitely cause problems because the difference
> in data is almost undetectable.
Mark is referring to cases such as ISO 8859-1 and ISO 8859-15. The two assign the same characters to every code position except eight: 0xA4, 0xA6, 0xA8, 0xB4, 0xB8, and 0xBC..0xBE. At those positions, 8859-15 replaces ¤, ¦, ¨, ´, ¸, ¼, ½, ¾ with €, Š, š, Ž, ž, Œ, œ, Ÿ. So neither repertoire is a subset of the other, yet the two coded character sets share the vast majority of their characters, including almost all of the common ones.

--Ken
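To make the "almost superset" point concrete, here is a minimal sketch that decodes every single byte with Python's built-in "latin-1" and "iso8859_15" codecs and compares the results. The helper name decode_table is just illustrative. It lists the differing positions and checks whether either repertoire contains the other.

```python
# Compare ISO 8859-1 (Latin-1) and ISO 8859-15 (Latin-9) position by position,
# using Python's built-in "latin-1" and "iso8859_15" codecs.

def decode_table(codec: str) -> list:
    """Return the character that each single byte 0x00..0xFF decodes to."""
    return [bytes([b]).decode(codec) for b in range(256)]

latin1 = decode_table("latin-1")
latin9 = decode_table("iso8859_15")

# Code positions where the two coded character sets assign different characters.
differing = [b for b in range(256) if latin1[b] != latin9[b]]

# Repertoire comparison: characters present in one set but absent from the other.
only_in_latin1 = set(latin1) - set(latin9)
only_in_latin9 = set(latin9) - set(latin1)

if __name__ == "__main__":
    for b in differing:
        print(f"0x{b:02X}: 8859-1 {latin1[b]!r}  8859-15 {latin9[b]!r}")
    print(len(differing), "of 256 positions differ")
    # Each set has characters the other lacks, so neither is a subset.
    print("8859-1 subset of 8859-15:", not only_in_latin1)
    print("8859-15 subset of 8859-1:", not only_in_latin9)
```

This also shows why the mismatch is hard to detect. Text such as "café" encodes to the same bytes under both charsets, so a mislabeled file only reveals itself when it happens to contain one of the eight characters at the differing positions.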

