Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-10 Larry Wall
Also consider this recent addition to S02:
Author: larry
Date: Thu Jan 10 13:05:42 2008
New Revision: 14486
Modified: doc/trunk/design/syn/S02.pod
Log: Added some random thoughts about performance implications of grapheme view
Modified: doc/trunk/design/syn/S02.pod

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-06 ajr
Jarkko's view was that if he were doing Perl 5 Unicode again he would opt for fixed width 32 bit rather than UTF-8,
It seems to be a general principle of system design that the best way to process irregular and unpredictable things is to grab them as close to the outside of the system as

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Patrick R. Michaud
On Sat, Jan 05, 2008 at 01:09:01AM -0600, Patrick R. Michaud wrote: On Sat, Jan 05, 2008 at 12:29:40AM -0600, Patrick R. Michaud wrote: On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote: (Callgrind suggests that about 45% of the running time of the NQP part of the build comes

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 chromatic
On Friday 04 January 2008 22:29:40 Patrick R. Michaud wrote: Actually, the perl6 compiler and PCT are really agnostic about utf8 -- they rely on Parrot to handle any transcoding issues. They try to keep strings as ASCII whenever possible, and only use unicode:... when there's a character
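Keeping strings ASCII matters because in any fixed-width encoding the character at index n sits at byte offset n, so no scan is needed. The sketch below is a minimal illustration under stated assumptions: it works on a raw byte buffer rather than Parrot's STRING API, and `is_ascii` and `ascii_char_at` are hypothetical helper names, not Parrot functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a Parrot API): a UTF-8 buffer whose bytes are
   all below 0x80 is also plain ASCII, one byte per character. */
static bool is_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return false;
    return true;
}

/* In an ASCII buffer the character index equals the byte offset, so access
   is O(1). The caller is expected to have checked is_ascii() first. */
static unsigned char ascii_char_at(const unsigned char *buf, size_t len,
                                   size_t index)
{
    assert(index < len);
    return buf[index];
}
```

A string that fails the check has to stay in a variable-width representation (or be transcoded), which is where the scanning costs discussed in this thread come from.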

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 chromatic
On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote: As of r24557 I've rewritten find_cclass and find_not_cclass so that they use a string iterator instead of repeated calls to ENCODING_GET_CODEPOINT. I also improved utf8_set_position a bit so that it doesn't always have to restart
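The difference between the two approaches can be sketched in plain C. Looking up character k of a UTF-8 buffer by scanning from the start costs O(k), so a loop that does this for every character is quadratic; an iterator that keeps its byte offset makes the same loop linear. This is a minimal sketch assuming valid UTF-8 input; `utf8_codepoint_at`, `utf8_iter`, and the two `find_space_*` functions are hypothetical stand-ins for the pattern (ENCODING_GET_CODEPOINT in a loop versus a string iterator inside find_cclass), not Parrot's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the UTF-8 sequence that starts with this lead byte
   (valid UTF-8 assumed). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Decode the code point starting at byte offset off. */
static uint32_t utf8_decode(const unsigned char *s, size_t off)
{
    uint32_t b = s[off];
    switch (utf8_seq_len(s[off])) {
    case 1:
        return b;
    case 2:
        return ((b & 0x1F) << 6) | (s[off + 1] & 0x3F);
    case 3:
        return ((b & 0x0F) << 12) | ((uint32_t)(s[off + 1] & 0x3F) << 6)
             | (s[off + 2] & 0x3F);
    default:
        return ((b & 0x07) << 18) | ((uint32_t)(s[off + 1] & 0x3F) << 12)
             | ((uint32_t)(s[off + 2] & 0x3F) << 6) | (s[off + 3] & 0x3F);
    }
}

/* Random access by character index: rescans from byte 0 on every call. */
static uint32_t utf8_codepoint_at(const unsigned char *s, size_t index)
{
    size_t off = 0;
    while (index-- > 0)
        off += utf8_seq_len(s[off]);
    return utf8_decode(s, off);
}

/* Sequential iterator: remembers its byte offset, so each step is O(1). */
typedef struct { const unsigned char *s; size_t len; size_t off; } utf8_iter;

static int utf8_iter_next(utf8_iter *it, uint32_t *cp)
{
    if (it->off >= it->len)
        return 0;
    *cp = utf8_decode(it->s, it->off);
    it->off += utf8_seq_len(it->s[it->off]);
    return 1;
}

/* Quadratic: index-based lookup inside a loop over all characters. */
static long find_space_slow(const unsigned char *s, size_t nchars)
{
    for (size_t i = 0; i < nchars; i++)
        if (utf8_codepoint_at(s, i) == ' ')
            return (long)i;
    return -1;
}

/* Linear: a single forward pass with the iterator. */
static long find_space_fast(const unsigned char *s, size_t len)
{
    utf8_iter it = { s, len, 0 };
    uint32_t cp;
    long i = 0;
    while (utf8_iter_next(&it, &cp)) {
        if (cp == ' ')
            return i;
        i++;
    }
    return -1;
}
```

Both search functions return the same character index; only the amount of rescanning differs.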

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Nicholas Clark
On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote: On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote: I think it will still be worthwhile to investigate converting strings into a fixed-width encoding of some sort instead of performing scans on variable-width encodings.

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Cosimo Streppone
Patrick wrote: [...] I also improved utf8_set_position a bit so that it doesn't always have to restart position counting from the beginning of the string. As a result, compiling the actions.pl script on my machine goes from 39s to a little over 28s -- about a 25% speed increase. I have a

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Cosimo Streppone
Cosimo wrote: Patrick wrote: [...] I also improved utf8_set_position What happens if string already has `i->charpos > pos'? [... /me reads again the diff ...] I realized while writing this that if `i->charpos > pos', you simply end up re-scanning the string from the start. Is that correct?
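The behaviour Cosimo asks about can be sketched with an iterator that caches a (character position, byte offset) pair. This is a hypothetical illustration assuming valid UTF-8 and an in-range target, not Parrot's utf8_set_position: `utf8_pos_iter` and `utf8_set_position_sketch` are made-up names. Seeking forward continues from the cache, while seeking to an earlier position resets to the start and rescans.

```c
#include <stddef.h>

/* Hypothetical iterator state: a cached character position and the byte
   offset where that character starts. */
typedef struct {
    const unsigned char *s;
    size_t charpos;
    size_t bytepos;
} utf8_pos_iter;

/* Length of the UTF-8 sequence that starts with this lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Move the iterator to character position pos (assumed in range).
   Returns how many characters were skipped, to make the cost visible. */
static size_t utf8_set_position_sketch(utf8_pos_iter *i, size_t pos)
{
    size_t skipped = 0;
    if (i->charpos > pos) {
        /* Target lies behind the cache: restart from the beginning. */
        i->charpos = 0;
        i->bytepos = 0;
    }
    /* Walk forward from wherever the cache now points. */
    while (i->charpos < pos) {
        i->bytepos += utf8_seq_len(i->s[i->bytepos]);
        i->charpos++;
        skipped++;
    }
    return skipped;
}
```

Because UTF-8 continuation bytes always have the form 10xxxxxx, a backward seek could instead step back from the cached offset by skipping continuation bytes; the sketch keeps the simpler restart to mirror the question.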

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Patrick R. Michaud
On Sat, Jan 05, 2008 at 12:17:00PM +0100, Cosimo Streppone wrote: Patrick wrote: [...] I also improved utf8_set_position a bit so that it doesn't always have to restart position counting from the beginning of the string. As a result, compiling the actions.pl script on my machine goes from

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Patrick R. Michaud
On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote: On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote: As of r24557 I've rewritten find_cclass and find_not_cclass so that they use a string iterator instead of repeated calls to ENCODING_GET_CODEPOINT. I also improved

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Patrick R. Michaud
On Sat, Jan 05, 2008 at 11:09:57AM +, Nicholas Clark wrote: On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote: On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote: I think it will still be worthwhile to investigate converting strings into a fixed-width encoding of

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Nicholas Clark
On Sat, Jan 05, 2008 at 12:19:14PM -0600, Patrick R. Michaud wrote: On Sat, Jan 05, 2008 at 11:09:57AM +, Nicholas Clark wrote: On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote: Jarkko's view was that if he were doing Perl 5 Unicode again he would opt for fixed width 32 bit

Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-04 chromatic
I just ran a little experiment. I patched Parrot::HLLCompiler to transcode the source code it reads to UCS-2 before parsing and compiling it, then I profiled building perl6.pbc. Without this hack, the build takes around 20 seconds, mostly running NQP over
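Transcoding once up front trades a single linear pass (and, for mostly-ASCII source, roughly twice the memory) for O(1) indexing afterwards. Below is a minimal sketch of a UTF-8 to UCS-2 decode, assuming valid UTF-8 input; `utf8_to_ucs2` is a hypothetical name, not the transcoder Parrot uses. UCS-2 has exactly one 16-bit unit per character but cannot represent code points above U+FFFF, so the sketch reports failure for 4-byte sequences rather than guessing at a fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode valid UTF-8 in `in` (len bytes) into UCS-2
   code units in `out` (room for cap units). Returns the number of
   characters written, or -1 if a code point lies above U+FFFF (which
   UCS-2 cannot represent) or if `out` is too small. */
static long utf8_to_ucs2(const unsigned char *in, size_t len,
                         uint16_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp;
        if (b < 0x80) {
            cp = b;
            i += 1;
        } else if ((b & 0xE0) == 0xC0) {
            cp = ((uint32_t)(b & 0x1F) << 6) | (in[i + 1] & 0x3F);
            i += 2;
        } else if ((b & 0xF0) == 0xE0) {
            cp = ((uint32_t)(b & 0x0F) << 12)
               | ((uint32_t)(in[i + 1] & 0x3F) << 6) | (in[i + 2] & 0x3F);
            i += 3;
        } else {
            return -1; /* 4-byte lead (valid input assumed): above U+FFFF */
        }
        if (n >= cap)
            return -1;
        out[n++] = (uint16_t)cp;
    }
    return (long)n;
}
```

After a successful call, character k is simply `out[k]`, with no scanning.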

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-04 Patrick R. Michaud
On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote: I just ran a little experiment. I patched Parrot::HLLCompiler to transcode the source code it reads to UCS-2 before parsing and compiling it, then I profiled building perl6.pbc. Without this hack, the build takes around 20

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-04 Patrick R. Michaud
On Sat, Jan 05, 2008 at 12:29:40AM -0600, Patrick R. Michaud wrote: On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote: (Callgrind suggests that about 45% of the running time of the NQP part of the build comes from utf8_set_position and utf8_skip_forward.) Even better might be