On 07/01/2011 11:49 AM, Karl Williamson wrote:
On 07/01/2011 10:40 AM, BobH wrote:
Karl Williamson wrote:
I'm trying to think of a good name. Best so far is
UCD::get_prop_invlist()
Hm, "get" normally isn't needed.
How about something simpler such as UCD::charlist()
Bob
I think not ha [...]

On 07/01/2011 10:40 AM, BobH wrote:
Karl Williamson wrote:
I'm trying to think of a good name. Best so far is
UCD::get_prop_invlist()
Hm, "get" normally isn't needed.
How about something simpler such as UCD::charlist()
Bob
I think not having prop in the name is potentially misleading, [...]

On 06/29/2011 09:06 AM, BobH wrote:
Karl Williamson wrote:
If I did this, I would be tempted to have it return an inversion
list, instead of an array of every code point that matches the
property. ...
My question to you is would that be acceptable to you, do you think?
I hate to return an enor [...]

Karl Williamson wrote:
If I did this, I would be tempted to have it return an inversion
list, instead of an array of every code point that matches the
property. ...
My question to you is would that be acceptable to you, do you think?
I hate to return an enormous array by default when the appli [...]

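Karl proposes returning an inversion list instead of every matching code point. An inversion list stores only the code points where membership switches on or off. The sketch below is a hypothetical illustration, not an existing Unicode::UCD function; the helper names `to_invlist` and `in_invlist` are made up here. It builds an inversion list from a sorted list of distinct code points and tests membership with a binary search.

```perl
use strict;
use warnings;

# Build an inversion list from a sorted list of distinct code points.
# Entries at even indexes start a run of members; entries at odd
# indexes start a run of non-members.
sub to_invlist {
    my @cps = @_;
    my @inv;
    for my $i (0 .. $#cps) {
        # The first element, or a gap, means a new run of members begins.
        if ($i == 0 || $cps[$i] != $cps[$i - 1] + 1) {
            push @inv, $cps[$i - 1] + 1 if $i > 0;   # end the previous run
            push @inv, $cps[$i];                     # start the new run
        }
    }
    push @inv, $cps[-1] + 1 if @cps;                 # end the last run
    return @inv;
}

# Is $cp in the set described by the inversion list @$inv?
sub in_invlist {
    my ($inv, $cp) = @_;
    # Binary search for the number of entries that are <= $cp.
    my ($lo, $hi) = (0, scalar @$inv);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($inv->[$mid] <= $cp) { $lo = $mid + 1 } else { $hi = $mid }
    }
    # An odd count means $cp falls inside a run of members.
    return $lo % 2 == 1;
}

# ASCII letters: 52 code points collapse to a 4-element inversion list.
my @inv = to_invlist(0x41 .. 0x5A, 0x61 .. 0x7A);
print join(' ', map { sprintf '0x%X', $_ } @inv), "\n";   # 0x41 0x5B 0x61 0x7B
```

Here 52 letters need only four numbers. For large properties the savings grow much larger, and that size difference is exactly Karl's concern about returning an enormous array by default.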
On 06/27/2011 08:04 PM, BobH wrote:
Karl Williamson wrote:
> I'm presuming you need this not for a one-time only thing, but to be
> able to run this program over and over.
Yes -- this is for a module that will be usable in a number of
situations. See
http://search.cpan.org/~bhallissy/Text-Uni [...]

Karl Williamson wrote:
> I'm presuming you need this not for a one-time only thing, but to be
> able to run this program over and over.
Yes -- this is for a module that will be usable in a number of
situations. See
http://search.cpan.org/~bhallissy/Text-Unicode-Equivalents-0.05/.
The curren [...]

On 06/27/2011 08:26 AM, BobH wrote:
A project I'm working on needs to build a list of all Unicode characters
that have canonical decompositions. The most efficient ways I can think
of to get such a list are from unicore/Decomposition.pl or by scanning
unicore/UnicodeData.txt. However [...]

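Bob's list could perhaps be built without reading files under unicore/ directly. One option is the core Unicode::Normalize module: its getCanon() returns a code point's full canonical decomposition, or undef if it has none. This is a sketch that assumes Unicode::Normalize's tables match the Unicode version you care about. The helper name `canonically_decomposable` is hypothetical. Unlike a scan of UnicodeData.txt, getCanon() also covers Hangul syllables, whose decompositions are computed algorithmically rather than listed.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(getCanon);

# Return every code point in [$from, $to] that has a canonical
# decomposition according to Unicode::Normalize's tables.
sub canonically_decomposable {
    my ($from, $to) = @_;
    my @found;
    for my $cp ($from .. $to) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;   # surrogates never decompose
        push @found, $cp if defined getCanon($cp);
    }
    return @found;
}

my @decomposable = canonically_decomposable(0, 0x10FFFF);
printf "%d code points have canonical decompositions\n", scalar @decomposable;
```

This depends on a core module's API rather than on which unicore/ files an installation ships. That may sidestep the 5.14 failures, but it is worth confirming on the Perls you target.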
BobH wrote:
Re unicore/UnicodeData.txt, I've recently posted a version of my module
that uses unicore/UnicodeData.txt to CPAN, and from Perl 5.14 testers
I've received only failure notices which indicate that the file cannot
be found :-(
Just installed ActivePerl 5.14 and, indeed, this file n [...]

A project I'm working on needs to build a list of all Unicode characters
that have canonical decompositions. The most efficient ways I can think
of to get such a list are from unicore/Decomposition.pl or by scanning
unicore/UnicodeData.txt. However:
Re unicore/Decomposition.pl, the head [...]

> On Sun, 24 May 2009 10:09:25 +0200, Juerd Waalboer
> said:
> Although it's safe on output, it's better to get used to using
> :encoding(utf8) instead of :utf8. Using :utf8 on input can cause
> stability and security issues.
That's new to me. Do you have a link that backs this up [...]

Andreas J. Koenig wrote on 2009-05-25 8:30 (+0200):
> > On Sun, 24 May 2009 10:09:25 +0200, Juerd Waalboer
> > said:
> > Although it's safe on output, it's better to get used to using
> > :encoding(utf8) instead of :utf8. Using :utf8 on input can cause
> > stability and security iss [...]

Andreas J. Koenig wrote on 2009-05-24 6:44 (+0200):
> binmode $_, ":utf8" for *STDOUT, *TEMP_OUT;
Although it's safe on output, it's better to get used to using
:encoding(utf8) instead of :utf8. Using :utf8 on input can cause
stability and security issues.
--
Met vriendelijke groet, Kind reg [...]

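Juerd's advice, sketched in practice and assuming UTF-8 input files: on input, the :encoding(UTF-8) layer actually decodes the bytes, whereas the :utf8 layer trusts them to be well-formed. The sketch uses the hyphenated name UTF-8, which Encode treats as strict UTF-8, as opposed to Perl's looser utf8. The helper names `read_utf8_file` and `decode_utf8_strict` are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Read a whole file as characters through the :encoding(UTF-8) layer,
# which decodes the bytes rather than just flagging them as UTF-8.
sub read_utf8_file {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    local $/;                       # slurp mode: read the whole file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Decode an in-memory byte string strictly: FB_CROAK makes decode()
# die on malformed UTF-8 instead of substituting characters.
sub decode_utf8_strict {
    my ($bytes) = @_;               # a private copy, so decode() may modify it
    return decode('UTF-8', $bytes, FB_CROAK);
}

# On output, the same layer encodes characters back to UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print decode_utf8_strict("caf\xC3\xA9"), "\n";
```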
> On Fri, 22 May 2009 20:49:24 +0530, Saravanan Balaji
> said:
> Could you please help to know what i am missing or doing wrong.
> I'll greatly appreciate the help.
I think all you're missing is (1) that a script written in utf8 needs
to declare that fact with a
use utf8;
and [...]

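Here is a minimal illustration of the `use utf8` point, assuming the script file itself is saved as UTF-8. Without the pragma, Perl reads the literal "café" as five bytes. With it, the literal is four characters. The binmode line is the output-side counterpart, encoding the characters back to UTF-8 bytes.

```perl
use strict;
use warnings;
use utf8;                              # this source file is written in UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # encode characters as UTF-8 on output

my $word = "café";                     # one character per letter thanks to 'use utf8'
printf "%s has %d characters\n", $word, length $word;   # café has 4 characters
```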
Hi Perl Gurus,
I am using the functions decode_entities() and decode_utf8() to decode HTML
entities and UTF-8 (Latin characters), respectively (decode_utf8() is from the
Encode module; decode_entities() is from HTML::Entities).
The functions I mentioned above work for code points up to decimal 255;
above that, they behave differently.
This is the URL I referred to k [...]

Hello
A lone A0 byte is not a UTF-8 character. The UTF-8 encoding of U+00A0 is C2 A0.
What's interesting here is that A0 is part of that UTF-8 sequence. So if the
file is UTF-8, perl is missing the other byte of the sequence (the leading C2);
otherwise, the file might not be UTF-8 at all.
Regards,
Oliver
On Wednesday, 10 January 2007 1 [...]

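Oliver's arithmetic can be checked directly. Code points U+0080 to U+07FF use the two-byte UTF-8 pattern 110xxxxx 10xxxxxx, so U+00A0 becomes C2 A0. A bare A0 byte can therefore only be the second (continuation) byte of such a sequence. The sketch below does the bit manipulation by hand and compares it with Encode's result; the helper name `utf8_two_byte` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a code point in U+0080..U+07FF by hand as two UTF-8 bytes:
#   110xxxxx 10xxxxxx  (top 5 bits, then low 6 bits of the code point)
sub utf8_two_byte {
    my ($cp) = @_;
    die sprintf("U+%04X is outside the two-byte range\n", $cp)
        unless $cp >= 0x80 && $cp <= 0x7FF;
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F));
}

my @manual  = utf8_two_byte(0xA0);                     # NO-BREAK SPACE
my @library = unpack 'C*', encode('UTF-8', "\x{A0}");  # same thing via Encode
printf "manual: %s\n", join ' ', map { sprintf '%02X', $_ } @manual;    # manual: C2 A0
printf "Encode: %s\n", join ' ', map { sprintf '%02X', $_ } @library;   # Encode: C2 A0
```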
On Wed, 10 Jan 2007, Paul Bijnens wrote:
> On 2007-01-10 08:10, John Costello wrote:
> > Is there a list of utf8 characters that perl cannot map, for example
> > "\xA0"? This is with Perl 5.8.3.
>
> AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
> latin1 (iso8859-1) or similar e [...]

On Wed, Jan 10, 2007 at 12:02:32AM -0800, Darren Duncan wrote:
> Now that the consortium has Unicode 5.0.0 out, I hope that Perl 5.8.9
> includes an understanding of it. Or if it doesn't, then Perl 5.10.0
> should at least, and I think already does in its 5.9.x dev branch.
I think it likely th [...]

At 11:10 PM -0800 1/9/07, John Costello wrote:
Is there a list of utf8 characters that perl cannot map, for example
"\xA0"? This is with Perl 5.8.3.
Have a look at the perldelta files that come with Perl; they will
tell you which version of the Unicode standard it understands.

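Besides reading perldelta, a running perl can report the Unicode version it was built with. The core module Unicode::UCD provides UnicodeVersion() for this. Here is a minimal sketch:

```perl
use strict;
use warnings;
use Unicode::UCD qw(UnicodeVersion);

# Print the version of the Unicode Character Database bundled with this perl.
my $version = UnicodeVersion();
print "This perl uses Unicode $version\n";
```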
On 2007-01-10 08:10, John Costello wrote:
> Is there a list of utf8 characters that perl cannot map, for example
> "\xA0"? This is with Perl 5.8.3.
AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
latin1 (iso8859-1) or similar encodings. That is just the "no-break
space".
What [...]

> s/(.)>/: $1 :>/; # HAS NO EFFECT (left side did not match?!)
> s/(..)(A)/$1: $2 :/; # HAS NO EFFECT (left side did not match?!)
>
> Is this something that will be fixed in 5.8.1?
Yes. It has already been fixed.
--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There [...]

This may be related to an apparent 5.8.0 bug that was discussed back in
February (the thread was "Odd regex behavior", started by Markus Kuhn),
but I'm not sure...
Consider a utf8 file containing just three characters and a line-feed,
where the first character happens to be "wide" (two bytes in [...]

Dan Kogai <[EMAIL PROTECTED]> writes:
>On Monday, April 1, 2002, at 07:33 , Nick Ing-Simmons wrote:
>> Dan Kogai <[EMAIL PROTECTED]> writes:
>>> I think I have found the reason why some of the encodings were missing
>>> from Tcl's *.enc, which later turned into *.ucm.
>>> Apple makes use [...]

On Monday, April 1, 2002, at 07:33 , Nick Ing-Simmons wrote:
> Dan Kogai <[EMAIL PROTECTED]> writes:
>> I think I have found the reason why some of the encodings were missing
>> from Tcl's *.enc, which later turned into *.ucm.
>> Apple makes use of Unicode compound characters too extensive [...]

Dan Kogai <[EMAIL PROTECTED]> writes:
> I think I have found the reason why some of the encodings were missing
> from Tcl's *.enc, which later turned into *.ucm.
> Apple makes use of Unicode compound characters too extensively, which
> doesn't go well with .ucm, not to mention *.enc
encengine c [...]

On Saturday, March 30, 2002, at 03:24 , Dan Kogai wrote:
> Okay. I've checked
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/
>
> One more time and it seems that other missing encodings are available
> as well, such as korean. I'll look into that.
I think I have found the reaso [...]

27 matches