On 09/07/2004 00:01, Kenneth Whistler wrote:
Peter Kirk said:
I made a serious point, not apparently made in the UTR draft, that diacritic folding may be useful for spam filtering and similar applications including finding misleading URIs.
This seems like a reasonable point to make and to add to the discussion of folding in UTR #30.
António suggested that for more comprehensive spam filtering an enhanced folding might be useful, including such foldings as | > I (capital i) and l (small L), 0 (zero) > O, |\/| > M. Would such foldings in fact be feasible and useful?
Well, someone could try, I suppose, but this stuff tails out pretty rapidly into mind-boggling complexity, ...
Indeed. I wouldn't suggest going beyond the clearly shape-based. But it is hard to know where to draw the line, which is another reason to add to /|/|ike's good ones for not trying to standardise this. But this kind of approach based on UTR #30 may still be helpful for spam filtering developers.
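For anyone who wants to experiment, here is a minimal Python sketch of the kind of folding discussed above: strip diacritics, then apply a few clearly shape-based foldings. The table and the function names (strip_diacritics, shape_fold) are made up for illustration and are not taken from UTR #30; the table holds only the examples mentioned in this thread, with I, l and | all collapsed to a single representative. It is meant for comparing strings, not for display.

```python
import unicodedata

# Hypothetical shape-based table, limited to the examples from this thread.
# Multi-character sequences must be replaced before the single "|" is folded.
MULTI_CHAR_FOLDS = (("|\\/|", "m"), ("/|/|", "m"))

# After case folding, "I" has become "i"; "i", "l" and "|" share one class.
SINGLE_CHAR_FOLDS = str.maketrans({"|": "l", "i": "l", "0": "o"})


def strip_diacritics(text):
    """Decompose to NFD and drop combining marks (a rough diacritic folding)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def shape_fold(text):
    """Return a comparison key: diacritics removed, case folded, shapes folded."""
    folded = strip_diacritics(text).casefold()
    for sequence, replacement in MULTI_CHAR_FOLDS:
        folded = folded.replace(sequence, replacement)
    return folded.translate(SINGLE_CHAR_FOLDS)


if __name__ == "__main__":
    for sample in ("António", "|\\/|0NEY", "cl|ck", "/|/|ike"):
        print(sample, "->", shape_fold(sample))
```

Both the filter's word list and the incoming text have to go through the same shape_fold before comparison. Even this tiny table over-folds (for example, "ill" becomes "lll"), which illustrates how hard it is to know where to draw the line.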
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

