On May 18, 2009, at 21:54, Larry Wall wrote:
On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote:
No, a few million code points in the Unicode standard can produce an
arbitrary number of unique grapheme clusters, since you can apply as
many modifiers as you like to each different base character.  If you
allow multiples, the total is unbounded.

A small program, which ought to go into the test suite <g>, can generate
4G distinct grapheme clusters, one at a time.
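
A minimal sketch of such a generator, in Python rather than Perl 6, and not the program referred to above. The names `MARKS` and `grapheme_clusters` are made up for illustration. It assumes two combining marks of the same canonical combining class (230), so that different orderings stay distinct under normalization. With just these two marks, sequences of up to 31 marks on one base already give 2^32 - 2 clusters.

```python
import itertools
import unicodedata

# Assumed for illustration: two combining marks with the same canonical
# combining class (230), so reordering them is not canonically equivalent.
MARKS = ("\u0300", "\u0301")  # COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT


def grapheme_clusters(base="a", marks=MARKS):
    """Yield an endless stream of distinct clusters, one at a time.

    Each cluster is the base character followed by 1, 2, 3, ... marks,
    covering every ordering of the marks at each length.
    """
    for length in itertools.count(1):
        for combo in itertools.product(marks, repeat=length):
            yield base + "".join(combo)


# Take the first 30 clusters (all sequences of 1 to 4 marks) and check
# that they remain distinct after NFC normalization.
first = list(itertools.islice(grapheme_clusters(), 30))
normalized = {unicodedata.normalize("NFC", s) for s in first}
```

Because the generator is lazy, memory use stays flat no matter how many clusters are drawn from it; the cost lands on whatever has to store or index each distinct cluster on the receiving side.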

That precise behavior is what I was characterizing as a DoS attack. :) So in my head it falls into the Doctor-it-hurts-when-I-do-this category.

If you're working with externally generated Unicode, you may not have that option. I've gotten some bizarre combinations out of Word in Hebrew with nikudot, then saved as UTF-8 text (so bizarre, in fact, that in the end I used gedit on FreeBSD).

brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
