Re: Encode UTF-8 optimizations

pali Thu, 25 Aug 2016 00:49:52 -0700

On Wednesday 24 August 2016 22:49:21 Karl Williamson wrote:
> On 08/22/2016 02:47 PM, p...@cpan.org wrote:
> 
> snip
> 
> >I added some tests for overlong sequences. Only for ASCII platforms, tests
> >for EBCDIC are missing (sorry, I do not have access to any EBCDIC platform
> >for testing).
> 
> It's fine to skip those tests on EBCDIC.


Ok.

> >>>> > Anyway, how it behave on EBCDIC platforms? And maybe another question
> >>>> > what should  Encode::encode('UTF-8', $str) do on EBCDIC? Encode $str to
> >>>> > UTF-8 or to UTF-EBCDIC?
> >>>
> >>> It works fine on EBCDIC platforms.  There are other bugs in Encode on
> >>> EBCDIC that I plan on investigating as time permits.  Doing this has
> >>> fixed some of these for free.  The uvuni() functions should in almost
> >>> all instances be uvchr(), and my patch does that.
> >Now I'm thinking if FBCHAR_UTF8 define is working also on EBCDIC... I think
> >that it should be different for UTF-EBCDIC.
> 
> I'll fix that
> >
> >>> On EBCDIC platforms, UTF-8 is defined to be UTF-EBCDIC (or vice versa if
> >>> you prefer), so $str will effectively be in the version of UTF-EBCDIC
> >>> valid for the platform it is running on (there are differences depending
> >>> on the platform's underlying code page).
> >So it means that on EBCDIC platforms you cannot process file which is
> >encoded in UTF-8?
> >As Encode::decode("UTF-8", $str) expect $str to be in UTF-EBCDIC and not in
> >UTF-8 (as I understood).
> >
> Yes.  The two worlds do not meet.  If you are on an EBCDIC platform, the
> native encoding is UTF-EBCDIC tailored to the code page the platform runs
> on.
> 
> In searching, I did not find anything that converts between the two, so I
> wrote a Perl script to do so.  Our OS/390 man, Yaroslav, wrote one in C.

Thank you for the information! I thought that the "UTF-8" encoding (with
hyphen) is the strict and correct UTF-8 version on both ASCII & EBCDIC
platforms, as nothing in the Encode documentation says that it is
different on EBCDIC...
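
A minimal sketch of what "strict" means here, assuming an ASCII platform (on
EBCDIC the byte values would differ, as discussed above). It is not taken from
the patch or from the Encode test suite; the variable names are only
illustrative. It feeds an overlong sequence to the "UTF-8" encoding, once with
FB_CROAK and once with the default CHECK:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# "\xC0\x80" is an overlong two-byte form of U+0000 (ASCII platforms only).
my $overlong = "\xC0\x80";

# With FB_CROAK, strict "UTF-8" is expected to die on malformed input.
# LEAVE_SRC keeps decode() from modifying $overlong in place.
my $ok = eval { decode('UTF-8', $overlong, FB_CROAK | LEAVE_SRC); 1 };
print "strict UTF-8 rejected the overlong sequence\n" unless $ok;

# With the default CHECK, malformed bytes are replaced by U+FFFD instead.
my $decoded = decode('UTF-8', $overlong);
print "substitution character present\n" if $decoded =~ /\x{FFFD}/;
```

How many substitution characters are produced for one malformed sequence may
depend on the Encode version, so the sketch only checks that U+FFFD appears.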

Anyway, if you need some help with the Encode module or something else,
let me know, as I want to have UTF-8 support in Encode working
correctly...
