On Thu, Jan 17, 2013 at 10:03 PM, Mukenx <muk...@gmail.com> wrote:
> I have an oorexx script that receives text strings (json strings) encoded
> in utf8 (peppered with german diacritics) and would like to convert the
> strings into ansi format.
>
> I discovered the sysFromUnicode and sysToUnicode functions in the oorexx
> 4.1.1 manual but could not get any meaningful results.
>
> here is what I tried:
> 1) store the text "Tür" in a utf8.txt file in utf8 format
> 2) and read it back in with rexx in a variable str.
> fs = .Stream~new('utf8.txt')
> str = fs~linein
> fs~close
> say str
> say 'rc = 'sysFromUnicode(str, , , , 'outStem.')
> loop ix over outStem.
> say 'outstem.'ix' = <'outstem.ix'>'
> end
>
> outputs:
> Tür
> rc = 0
> outstem.!TEXT = <??>
> outstem.!USEDDEFAULTCHAR = <1>
>
> Can someone help me out here?
>
Madou, I can't help much here because I'm not very knowledgeable in this
area. But I have a few comments.
The interpreter is ANSI based, so you need the input to the conversion to
be a series of bytes where the bytes are in UTF8 format. I would start
off by not using linein(), but charin(), where you give the complete file
size as the length argument and read in the complete file at one time.
However, I'm not positive that will work because there may be some code
page translation done.
Second, you need to specify the codepage argument as UTF8 at some point. I
looked at the code for SysFromUnicode and SysToUnicode, and the
documentation for the Windows API they use. I think that the Windows API
converts to and from UTF16 *only*. You specify the codepage to use in the
translation.
To convert UTF8 to ANSI, it looks to me like you would have to first
convert UTF8 to UTF16 using SysToUnicode() and then take the output of that
conversion and use SysFromUnicode() to convert the UTF16 string to the ANSI
codepage you are running in on your computer.
The following simple example works for me:
/* Simple UTF8 to ANSI test */
-- UTF8 bytes for the cent, pound and currency signs (U+00A2, U+00A3, U+00A4)
inString = 'c2a2c2a3c2a4'x
say 'Using string:' inString
say
ret = SysToUnicode(inString, 'UTF8', , out.)
if ret == 0 then say 'Convert UTF8 to UTF16 succeeded'
else say 'Convert UTF8 to UTF16 failed. rc:' ret
ret = SysFromUnicode(out.!TEXT, '437', , , ansi.)
if ret == 0 then say 'Convert UTF16 to ANSI succeeded'
else say 'Convert UTF16 to ANSI failed. rc:' ret
say 'ANSI text:' ansi.!TEXT
say "Used conversion character:" boolean2str(ansi.!USEDDEFAULTCHAR)
say
say 'Code page in console:'
'chcp'
::routine boolean2str
use strict arg val
if val then return 'true'
else return 'false'
Note in the above, for the SysFromUnicode() call, I used the active code
page number in the console I am working in. Here is the display I get in
my console. How this will look in this e-mail on your system, I have no
idea:
Using string: ¢£¤
Convert UTF8 to UTF16 succeeded
Convert UTF16 to ANSI succeeded
ANSI text: ¢£☼
Used conversion character: false
Code page in console:
Active code page: 437
But in my console the 'Using string' line shows as gibberish, while the
cent and pound signs in the 'ANSI text' line show correctly. So, it looks
to me like this works fine. The character shown for the currency sign does
not make sense to me, but I don't know what it should be.
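For the utf8.txt case from the original post, the two steps might be combined as in the sketch below. It is untested and Windows only. It assumes that 'ACP' (the ANSI code page keyword listed in the rexxutil documentation) is the right codepage value when the target is ANSI rather than the console code page, and that a leading UTF8 byte order mark, if present, should be stripped before converting. The stem names utf16. and ansi. are just names picked for the illustration.

```rexx
/* Sketch: UTF8 file -> UTF16 -> ANSI code page */
fs = .Stream~new('utf8.txt')
if fs~open('read') \== 'READY:' then do
  say 'Could not open utf8.txt'
  exit 1
end
utf8 = fs~charin(1, fs~chars)      -- whole file as raw bytes
fs~close

-- Assumption: drop a UTF8 byte order mark if the editor wrote one
if utf8~left(3) == 'EFBBBF'x then utf8 = utf8~substr(4)

-- Step 1: UTF8 bytes to UTF16
ret = SysToUnicode(utf8, 'UTF8', , utf16.)
if ret \== 0 then do
  say 'Convert UTF8 to UTF16 failed. rc:' ret
  exit ret
end

-- Step 2: UTF16 to the ANSI code page (assumed keyword 'ACP')
ret = SysFromUnicode(utf16.!TEXT, 'ACP', , , ansi.)
if ret \== 0 then do
  say 'Convert UTF16 to ANSI failed. rc:' ret
  exit ret
end

say 'ANSI text:' ansi.!TEXT
say 'Default character used:' ansi.!USEDDEFAULTCHAR
```

If !USEDDEFAULTCHAR comes back as 1, at least one character had no equivalent in the target code page and was replaced.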
--
Mark Miesfeld