Re: Silence “Wide character” warning globally one time

2010-08-01 Thread karl williamson
Dan Muey wrote: Thank you Michael, I'll have a look at Juerd's page as it's the only thing I haven't yet :) On Jul 29, 2010, at 6:17 PM, Michael Ludwig wrote: Hi Dan, [Silence “Wide character” warning globally one time] Dan Muey wrote on 29.07.2010 at 16:59 (-0500): I've a situation
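The snippet cuts off before any answer, so the following is only a minimal sketch of one common approach, not necessarily what the thread settled on. It assumes the warnings come from printing characters above 0xFF to STDOUT. Giving the handle an encoding layer makes Perl encode those characters instead of warning. The process-wide variant is `use open qw(:std :encoding(UTF-8));`, and `no warnings 'utf8';` only hides the lexically scoped "Wide character" warning without fixing the output encoding.

```perl
use strict;
use warnings;

# Collect any warnings so we can confirm none were emitted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Give STDOUT an encoding layer; Perl then encodes characters above 0xFF
# to UTF-8 bytes instead of warning "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';

my $smiley = "\x{263A}";    # WHITE SMILING FACE, a code point above 0xFF
print "$smiley\n";
```

Fixing the handle rather than silencing the warning keeps the output bytes well defined: without a layer, Perl would emit the string's internal UTF-8 bytes and warn.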

Re: Workaround to a unicode bug needed

2010-09-06 Thread karl williamson
You need to have a 'use utf8;' statement at the beginning of your program to tell Perl that its source code is encoded in UTF-8. I tested it with that, and it works. Pierre Nugues wrote: Dear All, I wrote a simple tokenizer for texts containing Latin9 characters. It does not behave as expected with the
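A minimal sketch of why `use utf8;` matters, assuming the script file is saved as UTF-8. The sentence and the `\w+` tokenizer are illustrative stand-ins, not Pierre's original code.

```perl
use strict;
use warnings;
use utf8;    # the source file is UTF-8, so the literals below become characters

my $sentence = "l'été arrive";

# With "use utf8", "été" is three characters and \w matches é.
# Without it, each é would be two separate bytes (0xC3 0xA9) that \w
# does not match, so the word would be split into fragments.
my @tokens = $sentence =~ /(\w+)/g;
my $chars  = length "été";
```

Note that `use utf8;` only affects how the source code is read. Text read from a Latin9 file still needs its own decoding, for example by opening it with an `:encoding(iso-8859-15)` layer.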

Re: Workaround to a unicode bug needed

2010-09-06 Thread karl williamson
Pierre Nugues wrote: Dear Michael, Pierre Nugues wrote on 06.09.2010 at 11:09 (+0200): I wrote a simple tokenizer for texts containing Latin9 characters. You probably mean non-ASCII characters. Latin9, alias ISO-8859-15, is the encoding. It's worthwhile making a distinction here. I meant

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread karl williamson
Jonathan Pool wrote: Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain this character. I write a script (itself encoded with UTF8) in Perl 5.10.0 (on OS X 10.6.5) with: use encoding

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread karl williamson
karl williamson wrote: Jonathan Pool wrote: Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain this character. I write a script (itself encoded with UTF8) in Perl 5.10.0 (on OS X 10.6.5

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread karl williamson
best to document it (or them). Could you advise me on this? On 30 Nov 2010, at 10:25, karl williamson wrote: Jonathan Pool wrote: Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain
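The snippets are cut off before the answer, so this is a hedged sketch of one way to match NO-BREAK SPACE that does not depend on `use encoding` or on the script's own encoding. It assumes the input arrives as UTF-8 bytes and decodes it with Encode before matching, then names the character by code point in the pattern.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as they would arrive from a UTF-8 file: "a", C2 A0 (U+00A0), "b".
my $bytes = "a\xC2\xA0b";

# Decode first so the regex sees one character, not two bytes.
my $text = decode('UTF-8', $bytes);

# Match NO-BREAK SPACE by code point; the pattern is pure ASCII, so it
# behaves the same whatever encoding the script itself is saved in.
my $has_nbsp = $text =~ /\x{00A0}/ ? 1 : 0;

# Under Unicode rules (/u, Perl 5.14+), U+00A0 also counts as \s.
my $nbsp_is_space = $text =~ /a\sb/u ? 1 : 0;
```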

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread Karl Williamson
On 06/27/2011 08:26 AM, BobH wrote: A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from unicore/Decomposition.pl or by scanning unicore/UnicodeData.txt. However: Re

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread Karl Williamson
On 06/29/2011 09:06 AM, BobH wrote: Karl Williamson wrote: If I did this, I would be tempted to have it return an inversion list, instead of an array of every code point that matches the property. ... My question to you is would that be acceptable to you, do you think? I hate to return
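An inversion list stores only the code points where membership flips: entries at even indices start a run of members, entries at odd indices start a run of non-members. The sketch below uses a hand-built list for the ASCII letters (hypothetical data, not from the thread) and tests membership with a binary search; `in_invlist` is an illustrative helper name.

```perl
use strict;
use warnings;

# A hand-built inversion list for ASCII letters: [A-Z] and [a-z].
my @letters = (0x41, 0x5B, 0x61, 0x7B);

# Return 1 if $cp is in the set described by the inversion list, else 0.
sub in_invlist {
    my ($cp, $list) = @_;
    # Binary search for the number of entries that are <= $cp.
    my ($lo, $hi) = (0, scalar @$list);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($list->[$mid] <= $cp) { $lo = $mid + 1 } else { $hi = $mid }
    }
    # The governing entry is at index $lo - 1; an even index means "in the set".
    return $lo > 0 && ($lo - 1) % 2 == 0 ? 1 : 0;
}
```

The list stays short even for properties covering thousands of code points, which is the appeal over returning every matching code point.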

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread Karl Williamson
On 07/01/2011 10:40 AM, BobH wrote: Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, get normally isn't needed. How about something simpler such as UCD::charlist() Bob I think not having prop in the name is potentially misleading

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-06 Thread Karl Williamson
On 07/01/2011 11:49 AM, Karl Williamson wrote: On 07/01/2011 10:40 AM, BobH wrote: Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, get normally isn't needed. How about something simpler such as UCD::charlist() Bob I think

Re: Encode question

2011-07-07 Thread Karl Williamson
On 07/07/2011 01:17 AM, Dave Saunders wrote: Dear Encode Developers, I am migrating a perl application from Solaris 2.10 to Linux Fedora Core 14 (2.6.35.13-92.fc14.x86_64), which is running perl 5.12.3. The app uses SDBM and I'm encountering a problem which looks related to the Encode module

RFC: API to access Unicode db files

2011-07-21 Thread Karl Williamson
Some applications are finding it necessary to read in the Unicode files that mktables generates. For example, grepping through CPAN indicates that Text::Unicode::Equivalents reads Decomposition.pl. This and most of the other generated files are marked for internal use only, because we wish

Re: RFC: API to access Unicode db files

2011-08-17 Thread Karl Williamson
Here's a new version of the API for comment, with the addition of 2 extra functions: prop_invlist() prop_invlist returns an inversion list (described below) that defines all the code points for the Unicode property given by the input parameter string: use
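Applied to the original question about canonical decompositions, a sketch might look like the following. It assumes Perl 5.16 or later, where `Unicode::UCD::prop_invlist` is available, and that it accepts the same `property=value` strings as `\p{}`. The helper `has_canonical_decomposition` is a hypothetical name, and the linear scan is for clarity rather than speed.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist);

# Every code point whose Decomposition_Type is Canonical.
my @invlist = prop_invlist('Decomposition_Type=Canonical');

# Expand the inversion list into [first, last] ranges. If the list has an odd
# length, the final range is unbounded; 0x10FFFF is used as its end here.
my @ranges;
for (my $i = 0; $i < @invlist; $i += 2) {
    my $last = $i + 1 < @invlist ? $invlist[$i + 1] - 1 : 0x10FFFF;
    push @ranges, [ $invlist[$i], $last ];
}

# Hypothetical helper: does $cp fall inside any of the ranges?
sub has_canonical_decomposition {
    my ($cp) = @_;
    for my $r (@ranges) {
        return 1 if $cp >= $r->[0] && $cp <= $r->[1];
    }
    return 0;
}
```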

New API available to access Unicode DB, and RFC on changes to it.

2011-11-21 Thread Karl Williamson
Perl 5.15.5, now available, has additions to Unicode::UCD in it to allow unfettered programmatic access to the Unicode character database. The API is quite similar to what was sent out for comment on this list several months ago; several changes were required as a result of lessons learned

Re: Please help me with this perl question

2011-12-30 Thread Karl Williamson
On 12/29/2011 10:48 AM, FORREST COPLEY wrote: Is it possible to write a Perl script to print a completely custom character on a console text terminal? Say, a D rotated 90 degrees or something, or an A with the innards filled in.

RFC: Reconciling Unicode Script and Script_Extensions Character Properties

2014-07-03 Thread Karl Williamson
The Unicode Consortium is seeking feedback on options for fixing anomalies involving a few characters with the Sc and Scx properties. For details and to comment, see http://www.unicode.org/review/pri277/

Re: Issue: Encode.so: undefined symbol: PL_utf8skip

2015-07-23 Thread Karl Williamson
On 07/23/2015 11:13 AM, Bright Dadson wrote: Hi Guys, I am trying to create a perl embed application which exposes WWW::Mechanize into my Cython extension project. I compile and link my extension using: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g

Re: Comparing inputs with source strings

2016-05-10 Thread Karl Williamson
On 05/09/2016 08:53 AM, Daniel Dehennin wrote: Hello, I tried to make my Perl5 code Unicode-compliant after reading a post on Stack Overflow[1]. As suggested in the post: “always run incoming stuff through NFD and outbound stuff from NFC.” I had a hard time finding out why my Test::More was

Re: Comparing inputs with source strings

2016-05-11 Thread Karl Williamson
On 05/11/2016 02:04 AM, Daniel Dehennin wrote: Karl Williamson <pub...@khwilliamson.com> writes: On 05/09/2016 08:53 AM, Daniel Dehennin wrote: Hello, I tried to make my Perl5 code unicode compliant after reading a post on stackoverflow[1]. As suggested in the post: “alwa
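The snippet is truncated, but the underlying pitfall with comparing inputs against source strings is that canonically equivalent strings need not be equal code point by code point. A minimal sketch using Unicode::Normalize, assuming both sides are normalized to the same form before comparison:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $precomposed = "\x{00E9}";     # é as a single code point
my $decomposed  = "e\x{0301}";    # e followed by COMBINING ACUTE ACCENT

# A plain comparison sees two different strings.
my $raw_equal = $precomposed eq $decomposed ? 1 : 0;

# Normalizing both sides to the same form makes them compare equal.
my $nfd_equal = NFD($precomposed) eq NFD($decomposed) ? 1 : 0;
my $nfc_equal = NFC($precomposed) eq NFC($decomposed) ? 1 : 0;
```

In a test, this means that if incoming data is run through NFD, the expected strings in the test file must be normalized the same way (or both compared after NFC) before calling `is()`.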

Re: UTF-8 encoding & decoding

2016-05-06 Thread Karl Williamson
On 05/05/2016 08:37 AM, Pali Rohár wrote: Hi! I thought that I understood UTF-8 encoding/decoding in Perl until I looked into the source code of the Encode package (specifically sub encode_utf8). Before, I had only read the description of the Encode package (not the source code):

Re: Encode UTF-8 optimizations

2016-08-11 Thread Karl Williamson
On 07/09/2016 05:12 PM, p...@cpan.org wrote: Hi! As we know, utf8::encode() does not provide correct UTF-8 encoding, and Encode::encode("UTF-8", ...) should be used instead. Also, files should be opened with the :encoding(UTF-8) layer instead of :utf8. But UTF-8 strict implementation in Encode module
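The distinction the thread relies on is that Encode's `UTF-8` (with a hyphen) is the strict encoding, while `utf8` is Perl's lax one. A minimal sketch of strict decoding, assuming the caller wants malformed input to raise an error rather than be replaced with U+FFFD:

```perl
use strict;
use warnings;
use Encode qw(decode);

# FB_CROAK makes decode() die on malformed input instead of substituting
# U+FFFD; LEAVE_SRC stops decode() from consuming the source variable.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $good_bytes      = "caf\xC3\xA9";     # "café" as well-formed UTF-8
my $surrogate_bytes = "\xED\xA0\x80";    # would encode U+D800, a surrogate

my $text = decode('UTF-8', $good_bytes, $check);

# The strict 'UTF-8' encoding rejects surrogates, so this decode dies.
my $ok       = eval { decode('UTF-8', $surrogate_bytes, $check); 1 };
my $rejected = $ok ? 0 : 1;
```

For file input, the same reasoning is behind preferring the `:encoding(UTF-8)` layer, which goes through Encode, over `:utf8`, which marks the data as UTF-8 without checking it.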

Re: Encode UTF-8 optimizations

2016-08-18 Thread Karl Williamson
On 08/12/2016 09:31 AM, p...@cpan.org wrote: On Thursday 11 August 2016 17:41:23 Karl Williamson wrote: On 07/09/2016 05:12 PM, p...@cpan.org wrote: Hi! As we know utf8::encode() does not provide correct UTF-8 encoding and Encode::encode("UTF-8", ...) should be used instead. Also op

Re: Encode UTF-8 optimizations

2016-08-20 Thread Karl Williamson
On 08/20/2016 08:33 PM, Aristotle Pagaltzis wrote: * Karl Williamson <pub...@khwilliamson.com> [2016-08-21 03:12]: That should be done anyway to make sure we've got less buggy Unicode handling code available to older modules. I think you meant “available to older perls”? Yes, thanks

Re: Encode UTF-8 optimizations

2016-08-20 Thread Karl Williamson
of these that are missing or buggy in previous perls can and will be dealt with by the Devel::PPPort mechanism. On 08/19/2016 02:42 AM, p...@cpan.org wrote: On Thursday 18 August 2016 23:06:27 Karl Williamson wrote: On 08/12/2016 09:31 AM, p...@cpan.org wrote: On Thursday 11 August 2016 17:41:23 Karl Williamson wrote

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson
On 08/22/2016 07:05 AM, p...@cpan.org wrote: On Sunday 21 August 2016 08:49:08 Karl Williamson wrote: On 08/21/2016 02:34 AM, p...@cpan.org wrote: On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: Top posting. Attached is my alternative patch. It effectively uses a different

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson
On 08/22/2016 02:47 PM, p...@cpan.org wrote: > And I think you misunderstand when is_utf8_char_slow() is called. It is > called only when the next byte in the input indicates that the only > legal UTF-8 that might follow would be for a code point that is at least > U+20, almost twice as

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson
On 08/22/2016 03:19 PM, Karl Williamson wrote: On 08/22/2016 02:47 PM, p...@cpan.org wrote: > And I think you misunderstand when is_utf8_char_slow() is called. It is > called only when the next byte in the input indicates that the only > legal UTF-8 that might follow would be for a c

Re: duplicate symbol cp1252_encoding in both Encode and Encode::Byte

2023-09-13 Thread Karl Williamson
On 8/2/23 21:42, Marc Lehmann wrote: Hi! Both Encode and Encode::Byte export a symbol called "cp1252_encoding", which can cause linker errors. It would be great if that could be changed by e.g. prepending some unique prefix to exported symbols (such as encode_ and encodebyte_ or somesuch),