On 09/07/2004 00:01, Kenneth Whistler wrote:

> Peter Kirk said:
>
>> I made a serious point, not apparently made in the UTR draft, that diacritic folding may be useful for spam filtering and similar applications including finding misleading URIs.
>
> This seems like a reasonable point to make and to add to the discussion of folding in UTR #30.
>
>> António suggested a serious point that for more comprehensive spam filtering an enhanced folding might be useful, including such foldings as | > I (capital i) and l (small L), 0 (zero) > O, |\/| > M. Would such foldings in fact be feasible and useful?
>
> Well, someone could try, I suppose, but this stuff tails out pretty rapidly into mind-boggling complexity, ...


Indeed. I wouldn't suggest going beyond the clearly shape-based. But it is hard to know where to draw the line, which is another reason to add to /|/|ike's good ones for not trying to standardise this. But this kind of approach based on UTR #30 may still be helpful for spam filtering developers.
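
For anyone who wants to experiment, here is a minimal sketch of how a filter might combine diacritic folding with a few shape-based foldings. It is only an illustration under stated assumptions: the table SHAPE_FOLDS and the function names are hypothetical and are not taken from UTR #30, and the diacritic step simply strips combining marks after NFD decomposition, which is cruder than the foldings the draft describes.

```python
import unicodedata

# Hypothetical shape-based foldings, limited to the examples mentioned
# in this thread. Longer sequences come first so that "|\/|" is folded
# before the single "|" is.
SHAPE_FOLDS = [
    ("|\\/|", "m"),  # |\/| > M
    ("0", "o"),      # 0 (zero) > O
    ("|", "l"),      # | > l (small L)
]


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (assumed approximation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def fold_shapes(text: str) -> str:
    """Apply each shape-based replacement in table order."""
    for source, target in SHAPE_FOLDS:
        text = text.replace(source, target)
    return text


def fold_for_matching(text: str) -> str:
    """Diacritic folding, then case folding, then shape folding."""
    return fold_shapes(fold_diacritics(text).casefold())


if __name__ == "__main__":
    for sample in ["|\\/|ÖRTGÀGE", "L0ÀN", "António"]:
        print(sample, "->", fold_for_matching(sample))
```

Note that even this tiny table over-folds: a legitimate "2004" becomes "2oo4". That illustrates why it is hard to know where to draw the line, and why the result is better suited to matching against a blocklist than to display.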


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



