from:"Karl Williamson"

Re: duplicate symbol cp1252_encoding in both Encode and Encode::Byte

2023-09-13 Thread Karl Williamson

On 8/2/23 21:42, Marc Lehmann wrote: Hi! Both Encode and Encode::Byte export a symbol called "cp1252_encoding", which can cause linker errors. It would be great if that could be changed by e.g. prepending some unique prefix to exported symbols (such as encode_ and encodebyte_ or somesuch), whic

Re: Encode UTF-8 optimizations

2016-09-25 Thread Karl Williamson

On 09/25/2016 04:06 AM, p...@cpan.org wrote: On Thursday 01 September 2016 09:30:08 p...@cpan.org wrote: On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote: We may change Encode in blead too, since it already differs from cpan. I'll have to get Sawyer's opinion on that. Bu

Re: Encode UTF-8 optimizations

2016-08-31 Thread Karl Williamson

On 08/31/2016 03:43 PM, p...@cpan.org wrote: On Monday 29 August 2016 17:00:00 Karl Williamson wrote: If you'd be willing to test this out, especially the performance parts that would be great! [snip] There are 2 experimental performance commits. If you want to see if they actually im

Re: Encode UTF-8 optimizations

2016-08-29 Thread Karl Williamson

On 08/25/2016 01:48 AM, p...@cpan.org wrote: Anyway, if you need some help with Encode module or something different, let me know. As I want to have UTF-8 support in Encode correctly working... I now have a branch with my proposed changes at: http://perl5.git.perl.org/perl.git/shortlog/refs/hea

Re: Encode UTF-8 optimizations

2016-08-24 Thread Karl Williamson

On 08/22/2016 02:47 PM, p...@cpan.org wrote: snip I added some tests for overlong sequences. Only for ASCII platforms, tests for EBCDIC are missing (sorry, I do not have access to any EBCDIC platform for testing). It's fine to skip those tests on EBCDIC. > > Anyway, how it behave on EBCDI

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson

On 08/22/2016 03:19 PM, Karl Williamson wrote: On 08/22/2016 02:47 PM, p...@cpan.org wrote: > And I think you misunderstand when is_utf8_char_slow() is called. It is > called only when the next byte in the input indicates that the only > legal UTF-8 that might follow would be for a c

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson

On 08/22/2016 02:47 PM, p...@cpan.org wrote: > And I think you misunderstand when is_utf8_char_slow() is called. It is > called only when the next byte in the input indicates that the only > legal UTF-8 that might follow would be for a code point that is at least > U+20, almost twice as high

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson

On 08/22/2016 07:05 AM, p...@cpan.org wrote: On Sunday 21 August 2016 08:49:08 Karl Williamson wrote: On 08/21/2016 02:34 AM, p...@cpan.org wrote: On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: Top posting. Attached is my alternative patch. It effectively uses a different

Re: Encode UTF-8 optimizations

2016-08-21 Thread Karl Williamson

On 08/21/2016 02:34 AM, p...@cpan.org wrote: On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: Top posting. Attached is my alternative patch. It effectively uses a different algorithm to avoid decoding the input into code points, and to copy all spans of valid input at once, instead of

Re: Encode UTF-8 optimizations

2016-08-20 Thread Karl Williamson

On 08/20/2016 08:33 PM, Aristotle Pagaltzis wrote: * Karl Williamson [2016-08-21 03:12]: That should be done anyway to make sure we've got less buggy Unicode handling code available to older modules. I think you meant “available to older perls”? Yes, thanks

Re: Encode UTF-8 optimizations

2016-08-20 Thread Karl Williamson

missing or buggy in previous perls can and will be dealt with by the Devel::PPPort mechanism. On 08/19/2016 02:42 AM, p...@cpan.org wrote: On Thursday 18 August 2016 23:06:27 Karl Williamson wrote: On 08/12/2016 09:31 AM, p...@cpan.org wrote: On Thursday 11 August 2016 17:41:23 Karl Williamson wrote

Re: Encode UTF-8 optimizations

2016-08-18 Thread Karl Williamson

On 08/12/2016 09:31 AM, p...@cpan.org wrote: On Thursday 11 August 2016 17:41:23 Karl Williamson wrote: On 07/09/2016 05:12 PM, p...@cpan.org wrote: Hi! As we know utf8::encode() does not provide correct UTF-8 encoding and Encode::encode("UTF-8", ...) should be used instead. Also op

Re: Encode UTF-8 optimizations

2016-08-11 Thread Karl Williamson

On 07/09/2016 05:12 PM, p...@cpan.org wrote: Hi! As we know utf8::encode() does not provide correct UTF-8 encoding and Encode::encode("UTF-8", ...) should be used instead. Also opening file should be done by :encoding(UTF-8) layer instead :utf8. But UTF-8 strict implementation in Encode module i

Re: Comparing inputs with source strings

2016-05-11 Thread Karl Williamson

On 05/11/2016 02:04 AM, Daniel Dehennin wrote: Karl Williamson writes: On 05/09/2016 08:53 AM, Daniel Dehennin wrote: Hello, I tried to make my Perl5 code unicode compliant after reading a post on stackoverflow[1]. As suggested in the post: “always run incoming stuff through NFD and

Re: Comparing inputs with source strings

2016-05-10 Thread Karl Williamson

On 05/09/2016 08:53 AM, Daniel Dehennin wrote: Hello, I tried to make my Perl5 code unicode compliant after reading a post on stackoverflow[1]. As suggested in the post: “always run incoming stuff through NFD and outbound stuff from NFC.” I got a hard time finding why my Test::More was f

Re: UTF-8 encoding & decoding

2016-05-06 Thread Karl Williamson

On 05/05/2016 08:37 AM, Pali Rohár wrote: Hi! I thought that I understood UTF-8 encoding/decoding done in perl until I looked into source code of Encode package... (exactly sub encode_utf8) Before... I only read description of Encode package (not source code): https://metacpan.org/pod/Encode#UTF

Re: Issue: Encode.so: undefined symbol: PL_utf8skip

2015-07-23 Thread Karl Williamson

On 07/23/2015 11:13 AM, Bright Dadson wrote: Hi Guys, I am trying to create a perl embed application which expose WWW::Mechanize into my Cython extension project. I compile and link my extension using: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-

RFC: Reconciling Unicode Script and Script_Extensions Character Properties

2014-07-03 Thread Karl Williamson

The Unicode Consortium is seeking feedback on options for fixing anomalies involving a few characters with the Sc and Scx properties. For details and to comment, see http://www.unicode.org/review/pri277/

Re: Please help me with this perl question

2011-12-30 Thread Karl Williamson

On 12/29/2011 10:48 AM, FORREST COPLEY wrote: Is it possible to write a perl script to print a completely custom character on a console text terminal? Say a D rotated 90 degrees or something. or an A with the innards filled in. -- http://search.cpan.org/~bdfoy/Unicode-Tussle-1.03/lib/Unicode/

New API available to access Unicode DB, and RFC on changes to it.

2011-11-21 Thread Karl Williamson

Perl 5.15.5, now available, has additions to Unicode::UCD in it to allow unfettered programmatic access to the Unicode character data base. The API is quite similar to what was sent out for comment on this list several months ago; several changes were required as a result of lessons learned du

Re: RFC: API to access Unicode db files

2011-08-17 Thread Karl Williamson

Here's a new version of the API for comment, with the addition of 2 extra functions: prop_invlist() "prop_invlist" returns an inversion list (described below) that defines all the code points for the Unicode property given by the input parameter string: use Uni

RFC: API to access Unicode db files

2011-07-21 Thread Karl Williamson

Some applications are finding it necessary to read in the Unicode files that mktables generates. For example, grepping through CPAN indicates that Text::Unicode::Equivalents reads Decomposition.pl. This, and most of the other generated files are marked for internal use only, because we wish t

Re: Encode question

2011-07-07 Thread Karl Williamson

On 07/07/2011 01:17 AM, Dave Saunders wrote: Dear Encode Developers, I am migrating a perl application from Solaris 2.10 to Linux Fedora Core 14 (2.6.35.13-92.fc14.x86_64), which is running perl 5.12.3. The app uses SDBM and I'm encountering a problem which looks related to the Encode module (wh

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-06 Thread Karl Williamson

On 07/01/2011 11:49 AM, Karl Williamson wrote: On 07/01/2011 10:40 AM, BobH wrote: Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, "get" normally isn't needed. How about something simpler such as UCD::charlist()

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread Karl Williamson

On 07/01/2011 10:40 AM, BobH wrote: Karl Williamson wrote: I'm trying to think of a good name. Best so far is UCD::get_prop_invlist() Hm, "get" normally isn't needed. How about something simpler such as UCD::charlist() Bob I think not having prop in the name is po

Re: Need: list of Unicode characters that have canonical decompositions.

2011-07-01 Thread Karl Williamson

On 06/29/2011 09:06 AM, BobH wrote: Karl Williamson wrote: If I did this, I would be tempted to have it return an inversion list, instead of an array of every code point that matches the property. ... My question to you is would that be acceptable to you, do you think? I hate to return an

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-28 Thread Karl Williamson

On 06/27/2011 08:04 PM, BobH wrote: Karl Williamson wrote: > I'm presuming you need this not for a one-time only thing, but to be > able to run this program over and over. Yes -- this is for a module that will be usable in a number of situations. See http://search.cpan.org/~bha

Re: Need: list of Unicode characters that have canonical decompositions.

2011-06-27 Thread Karl Williamson

On 06/27/2011 08:26 AM, BobH wrote: A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from unicore/Decomposition.pl or by scanning unicore/UnicodeData.txt. However: Re unicore

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread karl williamson

"<:encoding(utf8)".) So, I'm confused as to whether this is 1 bug or more than 1, and how best to document it (or them). Could you advise me on this? On 30 Nov 2010, at 10:25, karl williamson wrote: Jonathan Pool wrote: Let's say the character NO-BREAK SPACE (U+00A0) appea

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread karl williamson

karl williamson wrote: Jonathan Pool wrote: Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain this character. I write a script (itself encoded with UTF8) in Perl 5.10.0 (on OS X 1

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread karl williamson

Jonathan Pool wrote: Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain this character. I write a script (itself encoded with UTF8) in Perl 5.10.0 (on OS X 10.6.5) with: use encoding 'utf

Re: Workaround to a unicode bug needed

2010-09-06 Thread karl williamson

Pierre Nugues wrote: Dear Michael, Pierre Nugues schrieb am 06.09.2010 um 11:09 (+0200): I wrote a simple tokenizer for texts containing Latin9 characters. You probably mean "non-ASCII characters". Latin9 alias ISO-8859-15 is the encoding. It's worth while making a distinction here. I meant

Re: Workaround to a unicode bug needed

2010-09-06 Thread karl williamson

You need to have a 'use utf8;' statement at the beginning of your program to tell Perl that it is encoded in utf8. I tested it with that, and it works. Pierre Nugues wrote: Dear All, I wrote a simple tokenizer for texts containing Latin9 characters. It does not behave as expected with the Sw

Re: Silence “Wide character” warning globally one time

2010-08-01 Thread karl williamson

Dan Muey wrote: Thank you Michael, I'll have a look at Juerd's page as it's the only thing I haven't yet :) On Jul 29, 2010, at 6:17 PM, Michael Ludwig wrote: Hi Dan, [Silence “Wide character” warning globally one time] Dan Muey schrieb am 29.07.2010 um 16:59 (-0500): I've a situation where

34 matches
