On 08/07/2004 23:22, Doug Ewell wrote:
Thank you for pointing me to this section. This is a useful discussion which shows clearly why spoofing cannot be avoided by identical encoding of confusables. (And I am glad to see some clearer terminology than I had been using.) But it doesn't address my point that UTR #30 folding can be useful in this area, in providing a framework for what might be called "confusable folding".
Peter Kirk <peterkirk at qaya dot org> wrote:
António suggested a serious point that for more comprehensive spam
filtering an enhanced folding might be useful, including such foldings
as | > I (capital i) and l (small L), 0 (zero) > O, |\/| > M. Would
such foldings in fact be feasible and useful? They would have to be
part of a general similar shapes folding.
They might be useful for certain applications, in specific situations, but Unicode should not ever try to get entangled in this business of mapping unrelated characters on the basis of glyph similarity alone. It's just too font-dependent and subjective.
See the sub-heading "Spoofing" in TUS 4.0, Section 5.19 "Unicode Security," pp. 141-142 for more information.
But I think I agree with you that Unicode should not get into detailed listing of confusables, because it is too font-dependent and subjective. This kind of thing is best left as a user definable folding.
Actually I am unclear from UTR #30 whether this is supposed to be a framework for user definable foldings or should be restricted to the defined list of foldings; the existence of "Foldings based on tailored collation data" suggests that foldings can at least be tailored, but there are no further details of how such foldings are covered by the UTR.
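To make the idea of a user-definable "confusable folding" concrete, here is a minimal sketch in Python. The table holds only the examples mentioned in this thread; the names CONFUSABLE_FOLDS and fold_confusables are hypothetical and are not taken from UTR #30 or any other Unicode document.

```python
# Hypothetical user-defined "confusable folding" table. The entries are the
# examples from this thread, not a list published by Unicode.
CONFUSABLE_FOLDS = {
    "|\\/|": "M",  # the four-character sequence |\/|
    "|": "I",      # vertical bar
    "l": "I",      # small L
    "0": "O",      # digit zero
}


def fold_confusables(text, table=CONFUSABLE_FOLDS):
    """Replace each listed sequence with its fold target, longest match first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Strings that differ only in listed confusables fold to the same key.
print(fold_confusables("|\\/|0NEY"))
print(fold_confusables("l0l") == fold_confusables("IOI"))
```

Such a folding is only suitable for building comparison keys, for example in a spam filter. It is lossy (every small L becomes a capital I), and which pairs belong in the table depends on the font and on the user's judgement.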
-- 
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

