Enumerating all canonically equivalent strings

2011-06-20 Thread BobH
Does there exist a standard module or function that, given a Combining Character Sequence (or, more generally, an arbitrary Unicode text string), will generate a list of all canonically equivalent strings? For example, if given the character U+1EAD, I'd like to get back a list of all these can
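
As a rough illustration of the question (in Python rather than Perl, and not the approach of any existing module), the sketch below enumerates canonically equivalent strings by brute force. It starts from the NFD form, repeatedly swaps adjacent combining marks and substitutes precomposed characters, and keeps only candidates whose NFD matches the input's. The names `COMPOSITES` and `canonical_equivalents` are hypothetical. The search is not claimed to be complete: Hangul syllables, for instance, decompose algorithmically and are not covered by the table lookup. Results depend on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata
from collections import defaultdict


def _canonical_pieces(ch):
    """Return the one-level canonical decomposition of ch, or None."""
    d = unicodedata.decomposition(ch)
    if not d or d.startswith("<"):  # "<...>" marks a compatibility mapping
        return None
    return "".join(chr(int(h, 16)) for h in d.split())


# Reverse map: decomposed piece(s) -> characters that decompose to them.
COMPOSITES = defaultdict(list)
for cp in range(sys.maxunicode + 1):
    pieces = _canonical_pieces(chr(cp))
    if pieces:
        COMPOSITES[pieces].append(chr(cp))


def canonical_equivalents(text):
    """Brute-force search for strings canonically equivalent to text."""
    target = unicodedata.normalize("NFD", text)
    seen = {target}
    todo = [target]
    while todo:
        s = todo.pop()
        candidates = []
        for i in range(len(s)):
            # Replace one or two adjacent characters with a precomposed form.
            for width in (1, 2):
                chunk = s[i:i + width]
                if len(chunk) < width:
                    continue
                for c in COMPOSITES.get(chunk, ()):
                    candidates.append(s[:i] + c + s[i + width:])
            # Swap adjacent combining marks of different nonzero classes.
            if i + 1 < len(s):
                a = unicodedata.combining(s[i])
                b = unicodedata.combining(s[i + 1])
                if a and b and a != b:
                    candidates.append(s[:i] + s[i + 1] + s[i] + s[i + 2:])
        for cand in candidates:
            # The NFD check guards against any non-equivalent candidate.
            if cand not in seen and unicodedata.normalize("NFD", cand) == target:
                seen.add(cand)
                todo.append(cand)
    return sorted(seen)


if __name__ == "__main__":
    for s in canonical_equivalents("\u1ead"):
        print(" ".join(f"U+{ord(c):04X}" for c in s))
```

Building the reverse map scans every code point once at import time, which is slow but keeps the sketch free of external data files.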

Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread BobH
A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from unicore/Decomposition.pl or by scanning unicore/UnicodeData.txt. However: Re unicore/Decomposition.pl, the header of t
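
For comparison only, here is a minimal Python sketch of the same task that avoids reading the UCD data files directly. It assumes that "has a canonical decomposition" can be tested as "differs from its own NFD form", which also catches algorithmically decomposed Hangul syllables. The function name is hypothetical, and the result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def chars_with_canonical_decomposition():
    """Return the code points whose NFD form differs from the character."""
    result = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters; skip them
        ch = chr(cp)
        if unicodedata.normalize("NFD", ch) != ch:
            result.append(cp)
    return result


if __name__ == "__main__":
    cps = chars_with_canonical_decomposition()
    print(len(cps), "code points have canonical decompositions")
```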

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread BobH
BobH wrote:
> Re unicore/UnicodeData.txt, I've recently posted a version of my module that uses unicore/UnicodeData.txt to CPAN, and from Perl 5.14 testers I've received only failure notices which indicate that the file cannot be found :-(
Just installed ActivePerl 5.14 and, indeed,

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread BobH
Karl Williamson wrote:
> I'm presuming you need this not for a one-time only thing, but to be
> able to run this program over and over.
Yes -- this is for a module that will be usable in a number of situations. See http://search.cpan.org/~bhallissy/Text-Unicode-Equivalents-0.05/. The curren

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-29 Thread BobH
Karl Williamson wrote: If I did this, I would be tempted to have it return an inversion list, instead of an array of every code point that matches the property. ... My question to you is would that be acceptable to you, do you think? I hate to return an enormous array by default when the appli
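
For readers unfamiliar with the term, an inversion list stores a set of code points as a sorted list of boundaries: elements at even indices start a range that is in the set, and elements at odd indices start a range that is not. The Python sketch below is an illustration of that data structure under those assumptions, not the interface being proposed in the thread; both function names are hypothetical. Unlike some implementations, it always emits a closing boundary, even for a range that runs to the highest code point.

```python
import bisect


def to_inversion_list(code_points):
    """Convert an iterable of code points into an inversion list."""
    inv = []
    prev = None
    for cp in sorted(set(code_points)):
        if prev is None or cp != prev + 1:
            if prev is not None:
                inv.append(prev + 1)  # close the previous range
            inv.append(cp)            # open a new range
        prev = cp
    if prev is not None:
        inv.append(prev + 1)          # close the final range
    return inv


def in_inversion_list(inv, cp):
    """Membership test: cp is in the set if it falls in an 'on' range."""
    return bisect.bisect_right(inv, cp) % 2 == 1


if __name__ == "__main__":
    inv = to_inversion_list([0x41, 0x42, 0x43, 0x61])
    print([hex(x) for x in inv])
```

A long run of consecutive code points costs only two list entries, which is why the representation is much smaller than an array of every matching code point.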

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread BobH
Karl Williamson wrote:
> I'm trying to think of a good name. Best so far is UCD::get_prop_invlist()
Hm, "get" normally isn't needed. How about something simpler such as UCD::charlist()
Bob