Encode UTF-8 optimizations

2016-07-09 Thread pali
Hi! As we know, utf8::encode() does not provide correct UTF-8 encoding, and Encode::encode("UTF-8", ...) should be used instead. Also, files should be opened with the :encoding(UTF-8) layer instead of :utf8. But the strict UTF-8 implementation in the Encode module is horribly slow compared to utf8::encode(
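The distinction this message draws can be shown with a minimal sketch that is not taken from the thread. It feeds a surrogate code point to both functions; the choice of U+D800 and all variable names are assumptions made for illustration. The only point is that utf8::encode() performs no strictness check, while Encode::encode("UTF-8", ...) with a croaking check flag refuses the input.

```perl
use strict;
use warnings;
use Encode ();

# U+D800 is a surrogate: Perl's lax internal utf8 can hold it,
# but strict UTF-8 does not permit it.
my $lax = "\x{D800}";
utf8::encode($lax);                    # in place, no validation
my $lax_hex = uc unpack 'H*', $lax;    # hex dump of the resulting bytes

# Strict encoding of the same character, asking Encode to die on error.
# LEAVE_SRC keeps Encode from modifying $src.
my $src    = "\x{D800}";
my $strict = eval {
    Encode::encode('UTF-8', $src, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
my $strict_failed = defined $strict ? 0 : 1;

print "utf8::encode produced bytes: $lax_hex\n";
print "Encode::encode('UTF-8') ",
    ($strict_failed ? "rejected" : "accepted"), " U+D800\n";
```

FB_CROAK is used here only to make the rejection visible; other check values handle the same input differently.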

Re: Encode UTF-8 optimizations

2016-08-12 Thread pali
On Thursday 11 August 2016 17:41:23 Karl Williamson wrote: > On 07/09/2016 05:12 PM, p...@cpan.org wrote: > >Hi! As we know utf8::encode() does not provide correct UTF-8 encoding > >and Encode::encode("UTF-8", ...) should be used instead. Also opening > >file should be done by :encoding(UTF-8) laye

Encode utf8 warnings

2016-08-13 Thread pali
Hello, I see that there is one big mess in the utf8 warnings for Encode. First, warnings should be enabled by the warnings pragma. For utf8 there are: utf8, non_unicode, nonchar, surrogate. Second, warnings for Encode can be enabled by the check flag Encode::FB_WARN or Encode::WARN_ON_ERR. Third, warnings f
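The first two mechanisms named in this message can be sketched side by side. This is an illustrative example under stated assumptions, not code from the thread: the sample byte string, the variable names, and the use of a __WARN__ handler to count warnings are all made up for the demonstration.

```perl
use strict;
use warnings;
use Encode ();

# Mechanism 1: the warnings pragma, scoped lexically.
{
    # Silence three of the four categories inside this block only.
    no warnings qw(non_unicode nonchar surrogate);
    my $cp = chr(0x110000);    # above Unicode; Perl still allows creating it
}

# Mechanism 2: an Encode check flag. FB_WARN asks Encode itself to warn
# about malformed input.
my @warnings;
my $bytes = "ab\xFFcd";        # the byte 0xFF never occurs in valid UTF-8
my $decoded;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $decoded = Encode::decode('UTF-8', $bytes, Encode::FB_WARN);
}

printf "Encode emitted %d warning(s)\n", scalar @warnings;
```

Note that with a check flag set, Encode may modify the source variable, which is why the bytes are held in $bytes rather than passed as a literal.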

Re: Encode UTF-8 optimizations

2016-08-19 Thread pali
On Thursday 18 August 2016 23:06:27 Karl Williamson wrote: > On 08/12/2016 09:31 AM, p...@cpan.org wrote: > >On Thursday 11 August 2016 17:41:23 Karl Williamson wrote: > >>On 07/09/2016 05:12 PM, p...@cpan.org wrote: > >>>Hi! As we know utf8::encode() does not provide correct UTF-8 encoding > >>>an

Re: Encode UTF-8 optimizations

2016-08-21 Thread pali
On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: > Top posting. > > Attached is my alternative patch. It effectively uses a different > algorithm to avoid decoding the input into code points, and to copy > all spans of valid input at once, instead of character at a time. > > And it uses

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
On Sunday 21 August 2016 08:49:08 Karl Williamson wrote: > On 08/21/2016 02:34 AM, p...@cpan.org wrote: > >On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: > >>Top posting. > >> > >>Attached is my alternative patch. It effectively uses a different > >>algorithm to avoid decoding the input

Re: Encode utf8 warnings

2016-08-22 Thread pali
On Saturday 13 August 2016 19:41:46 p...@cpan.org wrote: > Hello, I see that there is one big mess in utf8 warnings for Encode. Per request this discussion was moved to perl5-port...@perl.org ML: http://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
On Monday 22 August 2016 21:43:59 Karl Williamson wrote: > On 08/22/2016 07:05 AM, p...@cpan.org wrote: > > On Sunday 21 August 2016 08:49:08 Karl Williamson wrote: > >> On 08/21/2016 02:34 AM, p...@cpan.org wrote: > >>> On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: > Top posting. >

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
(this only applies for strict UTF-8) On Monday 22 August 2016 23:19:51 Karl Williamson wrote: > The code could be tweaked to call UTF8_IS_SUPER first, but I'm > asserting that an optimizing compiler will see that any call to > is_utf8_char_slow() is pointless, and will optimize it out. Such optim

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
On Monday 22 August 2016 23:38:05 Karl Williamson wrote: > And, I'd rather not tweak it to call UTF8_IS_SUPER first, > because that relies on knowing what the current internal > implementation is. Then maybe add a new macro isUTF8_CHAR_STRICT which only checks whether a character is strictly valid UTF-8? I

Re: Encode UTF-8 optimizations

2016-08-25 Thread pali
On Wednesday 24 August 2016 22:49:21 Karl Williamson wrote: > On 08/22/2016 02:47 PM, p...@cpan.org wrote: > > snip > > >I added some tests for overlong sequences. Only for ASCII platforms, tests > >for EBCDIC > >are missing (sorry, I do not have access to any EBCDIC platform for testing). > >

Re: Encode UTF-8 optimizations

2016-08-31 Thread pali
On Monday 29 August 2016 17:00:00 Karl Williamson wrote: > If you'd be willing to test this out, especially the performance > parts that would be great! [snip] > There are 2 experimental performance commits. If you want to see if > they actually improve performance by doing a before/after compare

Re: Encode UTF-8 optimizations

2016-09-01 Thread pali
On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote: > On 08/31/2016 03:43 PM, p...@cpan.org wrote: > >On Monday 29 August 2016 17:00:00 Karl Williamson wrote: > >>If you'd be willing to test this out, especially the performance > >>parts that would be great! > >[snip] > >>There are 2 experi

Re: Encode UTF-8 optimizations

2016-09-25 Thread pali
On Thursday 01 September 2016 09:30:08 p...@cpan.org wrote: > On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote: > > We may change Encode in blead too, since it already differs from > > cpan. I'll have to get Sawyer's opinion on that. But the next > > step is for me to fix Devel::PPPort t

Re: Encode UTF-8 optimizations

2016-10-27 Thread pali
On Sunday 25 September 2016 10:49:41 Karl Williamson wrote: > On 09/25/2016 04:06 AM, p...@cpan.org wrote: > >On Thursday 01 September 2016 09:30:08 p...@cpan.org wrote: > >>On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote: > >>>We may change Encode in blead too, since it already differs

Re: Encode UTF-8 optimizations

2016-11-01 Thread pali
Hi! New Encode 2.87 with lots of fixes for Encode.xs and Encode::MIME::Header was released. Can you sync/import it into blead?

Re: select a variable as stdout and utf8 flag behaviour

2016-11-09 Thread pali
On Wednesday 09 November 2016 15:55:47 Gert Brinkmann wrote: > Hello, > ... > > This prints out the utf8 characters corrupted. You have to flag the > Variable after writing into it with Encode::_utf8_on() as utf8 to make > it work correctly. (So activate the commented line.) > > Using this _utf8

Re: select a variable as stdout and utf8 flag behaviour

2016-11-09 Thread pali
On Wednesday 09 November 2016 19:46:46 Gert Brinkmann wrote: > Pali, thank you very much for your answer. I am using the > Encode::decode('UTF-8', ...) function now instead of touching the > flag. Though I am not sure if a routine becomes better (more robust) > if it ac
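The original script is elided in this listing, so the following is a hypothetical reconstruction of the approach described here: select an in-memory filehandle as the default output, then decode the captured bytes with Encode::decode('UTF-8', ...) instead of setting the flag with Encode::_utf8_on(). The sample text, the :encoding(UTF-8) layer on the handle, and the variable names are assumptions.

```perl
use strict;
use warnings;
use Encode ();

# Capture default output in a scalar through an encoding layer.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";

my $old = select($fh);           # make $fh the default for print
print "caf\x{E9} \x{263A}";      # six characters, two of them non-ASCII
select($old);                    # restore the previous default handle
close($fh) or die "close failed: $!";

# $buffer now holds UTF-8 encoded bytes, not characters.
# Decode them explicitly rather than flipping the internal flag.
my $text = Encode::decode('UTF-8', $buffer);

printf "bytes captured: %d, characters after decode: %d\n",
    length($buffer), length($text);
```

Decoding validates the bytes, whereas setting the flag directly trusts them blindly.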

Re: perldoc Encode

2017-06-01 Thread pali
On Tuesday 30 May 2017 22:37:07 supp...@agrotekhnik.ru wrote: > Hello I can not compile and > install 2.89 Debian 8 make[1]: Entering directory > '/tmp/Encode-2.89/Byte' "/usr/bin/perl" -MExtUtils::Command::MM -e > 'cp_nonempty' -- Byte.bs blib/arch/auto/Encode/Byte/Byte.bs 644 cc > -c -I./Encode  -I../E

Re: Fwd: Encode 3.0

2019-03-12 Thread pali
Hello, in the future please write in English. That problem should already be fixed by pull request: https://github.com/dankogai/p5-encode/pull/138 On Tuesday 19 February 2019 20:08:00 dagmatritsa via perl-unicode wrote: > > > > Forwarded message > From: dagmatritsa > Ком

UTF-8 encoding & decoding

2016-05-06 Thread Pali Rohár
tr); $str = Encode::decode_utf8($str); 3. Where is the implementation of the utf8::encode/decode functions? It is not in utf8.pm, nor in utf8_heavy.pl, and also not in unicore/Heavy.pl. And what do those functions do? -- Pali Rohár pali.ro...@gmail.com
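As a partial answer to the third question, here is a minimal sketch of what the two functions do, based on their documented behavior rather than on anything stated in this thread. As far as I know they are built into the interpreter itself, which would explain why none of the listed files contains them. The sample strings and variable names are assumptions.

```perl
use strict;
use warnings;

my $str = "\x{263A}";               # one character
utf8::encode($str);                 # in place: characters -> UTF-8 bytes
my $byte_len = length $str;         # length now counts bytes

my $ok = utf8::decode($str);        # in place: UTF-8 bytes -> characters
my $char_len = length $str;         # length counts characters again

my $bad    = "\xFF";                # a byte that is not valid UTF-8
my $bad_ok = utf8::decode($bad);    # returns false for malformed input

printf "encoded length: %d, decoded length: %d\n", $byte_len, $char_len;
printf "decode of valid bytes succeeded: %s\n",   $ok     ? 'yes' : 'no';
printf "decode of invalid byte succeeded: %s\n",  $bad_ok ? 'yes' : 'no';
```

Both functions modify their argument in place; only utf8::decode() returns a success indicator.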

Re: UTF-8 encoding & decoding

2016-05-12 Thread Pali Rohár
On Friday 06 May 2016 09:24:01 Karl Williamson wrote: > On 05/05/2016 08:37 AM, Pali Rohár wrote: > >Hi! > > > >I thought that I understood UTF-8 encoding/decoding done in perl until I > >looked into the source code of the Encode package... (specifically sub encode_utf8) > > >