I made a serious point, apparently not made in the UTR draft, that diacritic folding may be useful for spam filtering and similar applications, including detecting misleading URIs. António raised a serious point that more comprehensive spam filtering might benefit from an enhanced folding, including such foldings as | > I (capital i) and l (small L), 0 (zero) > O, |\/| > M. Would such foldings in fact be feasible and useful? They would have to be part of a general similar-shapes folding. Such a folding would also need to handle cases like Cyrillic A and Greek capital alpha > A: with the whole of Unicode available, spammers could very easily write ЅРАМ (Cyrillic) or SΡΑΜ (mostly Greek) instead of SPAM in an attempt to defeat spam filtering.
Could something like this be defined within the framework of UTR #30? Should it be defined within the UTR? I suspect it would be better left to the discretion of individual developers, who could then rapidly tailor their foldings to any new lookalikes exploited by spammers.
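
To make the question concrete, here is a rough Python sketch of what a developer-tailored lookalike folding might look like. It is not anything defined in UTR #30: the tables SINGLE and MULTI and the function fold_lookalikes are hypothetical names of my own, the tables hold only the handful of cases mentioned above, and the final uppercasing and the choice of I as the common target for | and l are assumptions made purely for illustration.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike tables (an assumption for
# illustration only). A real filter would need far larger tables and would
# extend them as spammers find new lookalikes.
SINGLE = {
    "|": "I",       # vertical bar
    "l": "I",       # small L, folded to the same target as capital i
    "0": "O",       # digit zero
    "\u0405": "S",  # CYRILLIC CAPITAL LETTER DZE
    "\u0420": "P",  # CYRILLIC CAPITAL LETTER ER
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u041C": "M",  # CYRILLIC CAPITAL LETTER EM
    "\u03A1": "P",  # GREEK CAPITAL LETTER RHO
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u039C": "M",  # GREEK CAPITAL LETTER MU
}
MULTI = {
    "|\\/|": "M",   # the four-character sequence |\/|
}


def fold_lookalikes(text):
    """Fold text to a crude 'similar shapes' skeleton for comparison."""
    # Multi-character sequences first, so their parts are not folded singly.
    for sequence, target in MULTI.items():
        text = text.replace(sequence, target)
    # Diacritic folding: decompose, then drop combining marks.
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Single-character lookalikes, then case folding by uppercasing.
    text = "".join(SINGLE.get(ch, ch) for ch in text)
    return text.upper()
```

A filter would apply the same folding to both its keyword list and the incoming text and then compare the results, so that fold_lookalikes("\u0405\u0420\u0410\u041C") and fold_lookalikes("SPAM") come out identical. The obvious cost is false positives: with l folded to I, many innocent words collapse together as well, which is part of why I doubt one fixed table would suit every application.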
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

