On May 18, 2009, at 21:54, Larry Wall wrote:
> On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote:
>> No, a few million code points in the Unicode standard can produce an
>> arbitrary number of unique grapheme clusters, since you can apply as
>> many modifiers as you like to each different base character. If you
>> allow multiples, the total is unbounded.
>>
>> A small program, which ought to go into the test suite <g>, can
>> generate 4G distinct grapheme clusters, one at a time.
>
> That precise behavior is what I was characterizing as a DoS attack. :)
> So in my head it falls into the Doctor-it-hurts-when-I-do-this category.
If you're working with externally generated Unicode, you may not have that option. I've gotten some bizarre combinations out of Word in Hebrew with nikudot, then saved as UTF-8 text (so bizarre, in fact, that in the end I used gedit on FreeBSD).
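
For anyone who wants to see the quoted "4G distinct grapheme clusters" point concretely, here is a minimal sketch in Python. It is not the program John had in mind. The base letter, the four combining marks, and the names `clusters` and `count` are all my own assumptions for illustration. The four marks share canonical combining class 230, so normalization never reorders them, and each sequence stays distinct. Python's standard library has no grapheme segmenter, so the claim that each string is one cluster rests on the UAX #29 rule that combining marks extend the preceding base.

```python
import itertools
import unicodedata

# Hypothetical choices for illustration: one base letter and four
# combining marks (grave, acute, circumflex, tilde), all with canonical
# combining class 230, so canonical reordering never swaps them.
BASE = "a"
MARKS = ["\u0300", "\u0301", "\u0302", "\u0303"]


def clusters(length):
    """Lazily yield BASE followed by every sequence of `length` marks."""
    for combo in itertools.product(MARKS, repeat=length):
        yield BASE + "".join(combo)


def count(length):
    """Number of strings clusters(length) will yield."""
    return len(MARKS) ** length


if __name__ == "__main__":
    # Four marks stacked 16 deep gives 4**16 == 2**32 strings.
    print(count(16))
    # Show the first few without materializing the whole set.
    for s in itertools.islice(clusters(16), 3):
        print([f"U+{ord(ch):04X}" for ch in s])
```

Since the generator yields one string at a time, memory is not the problem. The cost lands on whatever has to assign each new cluster its own identity, which is the behavior Larry is calling a DoS.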
-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH