Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-06 Thread Karl Williamson
On 07/01/2011 11:49 AM, Karl Williamson wrote: On 07/01/2011 10:40 AM, BobH wrote: Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, "get" normally isn't needed. How about something simpler such as UCD::charlist() Bob I think not ha

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread Karl Williamson
On 07/01/2011 10:40 AM, BobH wrote: Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, "get" normally isn't needed. How about something simpler such as UCD::charlist() Bob I think not having prop in the name is potentially misleading,

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread BobH
Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, "get" normally isn't needed. How about something simpler such as UCD::charlist() Bob

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread Karl Williamson
On 06/29/2011 09:06 AM, BobH wrote: Karl Williamson wrote: If I did this, I would be tempted to have it return an inversion list, instead of an array of every code point that matches the property. ... My question to you is would that be acceptable to you, do you think? I hate to return an enor

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-29 Thread BobH
Karl Williamson wrote: If I did this, I would be tempted to have it return an inversion list, instead of an array of every code point that matches the property. ... My question to you is would that be acceptable to you, do you think? I hate to return an enormous array by default when the appli
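For readers unfamiliar with the term: an inversion list stores only the code points at which membership in a set flips, so a large set of ranges needs a handful of numbers rather than one array element per character. The following is a minimal sketch of how such a list can be queried. It is not the interface proposed in this thread; `@invlist` and `in_invlist` are hypothetical names, and the sample set (ASCII letters) is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical inversion list for the set { U+0041..U+005A, U+0061..U+007A }.
# Even-indexed entries start a range that is IN the set,
# odd-indexed entries start a range that is OUT of the set.
my @invlist = (0x41, 0x5B, 0x61, 0x7B);

# Return true if $cp is in the set described by the inversion list.
sub in_invlist {
    my ($cp, $list) = @_;

    # Binary search: count how many boundaries are <= $cp.
    my ($lo, $hi) = (0, scalar @$list);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($list->[$mid] <= $cp) { $lo = $mid + 1 }
        else                      { $hi = $mid }
    }

    # An odd count means the last boundary crossed opened a range.
    return $lo % 2 == 1;
}

for my $cp (0x40, 0x41, 0x5A, 0x5B, 0x7A) {
    printf "U+%04X %s\n", $cp,
        in_invlist($cp, \@invlist) ? "in set" : "not in set";
}
```

A caller who really does want every code point can still expand the list range by range, which is why returning the compact form by default loses nothing.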

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-28 Thread Karl Williamson
On 06/27/2011 08:04 PM, BobH wrote: Karl Williamson wrote: > I'm presuming you need this not for a one-time only thing, but to be > able to run this program over and over. Yes -- this is for a module that will be usable in a number of situations. See http://search.cpan.org/~bhallissy/Text-Uni

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread BobH
Karl Williamson wrote: > I'm presuming you need this not for a one-time only thing, but to be > able to run this program over and over. Yes -- this is for a module that will be usable in a number of situations. See http://search.cpan.org/~bhallissy/Text-Unicode-Equivalents-0.05/. The curren

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread Karl Williamson
On 06/27/2011 08:26 AM, BobH wrote: A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from unicore/Decomposition.pl or by scanning unicore/UnicodeData.txt. However

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread BobH
BobH wrote: Re unicore/UnicodeData.txt, I've recently posted a version of my module that uses unicore/UnicodeData.txt to CPAN, and from Perl 5.14 testers I've received only failure notices which indicate that the file cannot be found :-( Just installed ActivePerl 5.14 and, indeed, this file n

Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread BobH
A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from unicore/Decomposition.pl or by scanning unicore/UnicodeData.txt. However: Re unicore/Decomposition.pl, the head
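One way to build such a list without reading any unicore/ file directly is to ask Unicode::Normalize, which ships with Perl, whether each code point has a canonical decomposition. This is a sketch under that assumption and not necessarily the approach this thread settled on; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(getCanon);

# getCanon() returns the full canonical decomposition of a code point,
# or undef when the code point is not canonically decomposable.
my @decomposable = grep { defined getCanon($_) } 0 .. 0x10FFFF;

printf "%d code points have a canonical decomposition\n",
    scalar @decomposable;

# Show a few entries together with their decompositions.
for my $cp (@decomposable[0 .. 4]) {
    my $parts = join ' ',
        map { sprintf 'U+%04X', ord($_) } split //, getCanon($cp);
    printf "U+%04X => %s\n", $cp, $parts;
}
```

The result depends on the Unicode version bundled with the running Perl, and getCanon() also reports Hangul syllables, whose decompositions are algorithmic and do not appear in UnicodeData.txt.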

Re: Unicode characters

2009-05-25 Thread Andreas J. Koenig
> On Sun, 24 May 2009 10:09:25 +0200, Juerd Waalboer > said: > Although it's safe on output, it's better to get used to using > :encoding(utf8) instead of :utf8. Using :utf8 on input can cause > stability and security issues. That's new to me. Do you have a link that backs this up

Re: Unicode characters

2009-05-25 Thread Juerd Waalboer
Andreas J. Koenig wrote 2009-05-25 8:30 (+0200): > > On Sun, 24 May 2009 10:09:25 +0200, Juerd Waalboer > > said: > > Although it's safe on output, it's better to get used to using > > :encoding(utf8) instead of :utf8. Using :utf8 on input can cause > > stability and security iss

Re: Unicode characters

2009-05-24 Thread Juerd Waalboer
Andreas J. Koenig wrote 2009-05-24 6:44 (+0200): > binmode $_, ":utf8" for *STDOUT, *TEMP_OUT; Although it's safe on output, it's better to get used to using :encoding(utf8) instead of :utf8. Using :utf8 on input can cause stability and security issues. -- Met vriendelijke groet, Kind reg
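The distinction matters because the :utf8 layer only marks incoming data as UTF-8 without checking it, whereas :encoding(...) passes the data through Encode, which validates it. The sketch below shows the validating behaviour on plain strings rather than file handles; `decode_strict` is a hypothetical helper, and the choice of the strict 'UTF-8' encoding name is an assumption, not something stated in the thread.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the decoded text, or undef when $bytes
# is not well-formed UTF-8. FB_CROAK makes decode() die on bad input;
# LEAVE_SRC keeps decode() from modifying the caller's buffer.
sub decode_strict {
    my ($bytes) = @_;
    my $text = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text;
}

my %samples = (
    'valid UTF-8'  => "caf\xC3\xA9",   # e-acute encoded as two bytes
    'Latin-1 byte' => "caf\xE9!",      # lone 0xE9 is malformed in UTF-8
);

for my $name (sort keys %samples) {
    my $text = decode_strict($samples{$name});
    printf "%-12s %s\n", $name, defined $text ? 'accepted' : 'rejected';
}

# For a file handle the validating layer is spelled:
#   open my $fh, '<:encoding(UTF-8)', $path or die "open: $!";
```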

Re: Unicode characters

2009-05-23 Thread Andreas J. Koenig
> On Fri, 22 May 2009 20:49:24 +0530, Saravanan Balaji > said: > Could you please help to know what i am missing or doing wrong. > I'll greatly appreciate the help. I think all you're missing is (1) that a script written in utf8 needs to declare that fact with a use utf8; and

Unicode characters

2009-05-22 Thread Saravanan Balaji
Hi Perl Gurus, I am using the functions decode_entities() and decode_utf8() to decode HTML entities and UTF-8 (Latin characters) respectively (via use Encode). These functions work for character codes up to decimal 255, but above that they behave differently. This is the URL I referred to k

Re: List of unsupported unicode characters?

2007-01-10 Thread Oliver Block
Hello, U+00A0 is a code point, not a UTF-8 byte sequence. The UTF-8 encoding of U+00A0 is C2 A0. What's interesting here is that A0 is part of that UTF-8 sequence. So if that file is UTF-8, perl is missing further bytes in the sequence. Otherwise it might not be UTF-8. Regards, Oliver On Wednesday, 10 January 2007 1
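The relationship between the code point and its byte sequences can be checked with Encode. This is a small illustrative sketch; `hex_bytes` is a hypothetical helper introduced here, not something from the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $nbsp = "\x{A0}";    # U+00A0 NO-BREAK SPACE as a character

print "UTF-8:      ", hex_bytes(encode('UTF-8',      $nbsp)), "\n";  # C2 A0
print "ISO-8859-1: ", hex_bytes(encode('ISO-8859-1', $nbsp)), "\n";  # A0
```

A lone A0 byte is therefore a complete character in Latin-1 but only a continuation byte in UTF-8, which is consistent with the point made above.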

Re: List of unsupported unicode characters?

2007-01-10 Thread John Costello
On Wed, 10 Jan 2007, Paul Bijnens wrote: > On 2007-01-10 08:10, John Costello wrote: > > Is there a list of utf8 characters that perl cannot map, for example > > "\xA0"? This is with Perl 5.8.3. > > AFAIK there is no problem with "\xA0" if you mean the "\xA0" in > latin1 (iso8859-1) or similar e

Re: List of unsupported unicode characters?

2007-01-10 Thread Nicholas Clark
On Wed, Jan 10, 2007 at 12:02:32AM -0800, Darren Duncan wrote: > Now that the consortium has Unicode 5.0.0 out, I hope that Perl 5.8.9 > includes an understanding of it. Or if it doesn't, then Perl 5.10.0 > should at least, and I think already does in its 5.9.x dev branch. I think it likely th

Re: List of unsupported unicode characters?

2007-01-10 Thread Darren Duncan
At 11:10 PM -0800 1/9/07, John Costello wrote: Is there a list of utf8 characters that perl cannot map, for example "\xA0"? This is with Perl 5.8.3. Have a look at the perldelta files that come with Perl; they will tell you what particular version of the Unicode standard that it understands.

Re: List of unsupported unicode characters?

2007-01-10 Thread Paul Bijnens
On 2007-01-10 08:10, John Costello wrote: > Is there a list of utf8 characters that perl cannot map, for example > "\xA0"? This is with Perl 5.8.3. AFAIK there is no problem with "\xA0" if you mean the "\xA0" in latin1 (iso8859-1) or similar encodings. That is just the "no-break space". What

List of unsupported unicode characters?

2007-01-09 Thread John Costello
Is there a list of utf8 characters that perl cannot map, for example "\xA0"? This is with Perl 5.8.3.

Re: When regex "dot" doesn't work on unicode characters

2003-07-03 Thread Jarkko Hietaniemi
> s/(.)>/: $1 :>/; # HAS NO EFFECT (left side did not match?!) > s/(..)(A)/$1: $2 :/; # HAS NO EFFECT (left side did not match?!) > > Is this something that will be fixed in 5.8.1? Yes. It has already been fixed. -- Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There

When regex "dot" doesn't work on unicode characters

2003-07-03 Thread David Graff
This may be related to an apparent 5.8.0 bug that was discussed back in February (the thread was "Odd regex behavior", started by Markus Kuhn), but I'm not sure... Consider a utf8 file containing just three characters and a line-feed, where the first character happens to be "wide" (two bytes in

Re: [Encode] How to support (Apple's) compound Unicode characters?

2002-04-01 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Monday, April 1, 2002, at 07:33 , Nick Ing-Simmons wrote: >> Dan Kogai <[EMAIL PROTECTED]> writes: >>> I think I have found the reason why some of the encodings were >>> missing >>> from Tcl's *.enc, which later turned into *.ucm. >>> Apple makes use

Re: [Encode] How to support (Apple's) compound Unicode characters?

2002-04-01 Thread Dan Kogai
On Monday, April 1, 2002, at 07:33 , Nick Ing-Simmons wrote: > Dan Kogai <[EMAIL PROTECTED]> writes: >> I think I have found the reason why some of the encodings were >> missing >> from Tcl's *.enc, which later turned into *.ucm. >> Apple makes use of Unicode compound characters too extensive

Re: [Encode] How to support (Apple's) compound Unicode characters?

2002-04-01 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: > I think I have found the reason why some of the encodings were missing >from Tcl's *.enc, which later turned into *.ucm. > Apple makes use of Unicode compound characters too extensively, which >doesn't go well with .ucm, not to mention *.enc encengine c

[Encode] How to support (Apple's) compound Unicode characters?

2002-03-29 Thread Dan Kogai
On Saturday, March 30, 2002, at 03:24 , Dan Kogai wrote: > Okay. I've checked > > http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ > > One more time and it seems that other missing encodings are available > as well, such as korean. I'll look into that. I think I have found the reaso