[PATCH] IBM z/OS + EBCDIC support
  is contiguous
* Relocated the comment about escape portability to sh.h
* Control-key mapping escape table implementation, including the bit of
  Perl I used to generate the actual mapping code. The control code for
  '?' is handled as a special case (but could be incorporated into the
  Perl if desired)
* Note the commented-out ebcdic_isctrl() function. This may or may not
  be preferable to the EBCDIC ISCTRL() macro currently in sh.h. Be aware
  that this function will return true for a lot fewer inputs than the
  macro

+++ var.c

* Check for upper/lowercase 'X' without resorting to ASCII trickery
* Use the ORD() macro so that these subtractions don't inadvertently
  become additions

I will be happy to provide further testing and answer any questions as
needed.


--Daniel


P.S.: Please Cc: me in any replies, as I am not subscribed to this list.


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.

Index: Build.sh
===================================================================
RCS file: /cvs/src/bin/mksh/Build.sh,v
retrieving revision 1.674
diff -u -r1.674 Build.sh
--- Build.sh	19 Apr 2015 18:50:59 -0000	1.674
+++ Build.sh	24 Apr 2015 02:12:22 -0000
@@ -419,7 +419,11 @@
 		na=0
 	fi
 	hf=$1; shift
-	hv=`echo "$hf" | tr -d '\012\015' | tr -c $alll$allu$alln $alls`
+	case $TARGET_OS in
+	OS/390)	lfcr='\n\r' ;; # EBCDIC goofiness
+	*)	lfcr='\012\015' ;;
+	esac
+	hv=`echo "$hf" | tr -d $lfcr | tr -c $alll$allu$alln $alls`
 	echo "/* NeXTstep bug workaround */" >x
 	for i
 	do
@@ -577,7 +581,7 @@
 	echo "$me: Error: ./$tfn is a directory!" >&2
 	exit 1
 fi
-rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.ll *.o *.gen \
+rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.dbg *.ll *.o *.gen \
     Rebuild.sh signames.inc test.sh x vv.out
 
 SRCS="lalloc.c eval.c exec.c expr.c funcs.c histrap.c jobs.c"
@@ -829,6 +833,12 @@
 OpenBSD)
 	: ${HAVE_SETLOCALE_CTYPE=0}
 	;;
+OS/390)
+	SIZE=: # not available
+	add_cppflags -DNSIG=32
+	add_cppflags -D_ALL_SOURCE
+	oswarn="; EBCDIC support is incomplete"
+	;;
 OSF1)
 	HAVE_SIG_T=0	# incompatible
 	add_cppflags -D_OSF_SOURCE
@@ -929,6 +939,7 @@
 
 : ${AWK=awk} ${CC=cc} ${NROFF=nroff} ${SIZE=size}
 test 0 = $r && echo | $NROFF -v 2>&1 | grep GNU >/dev/null 2>&1 && \
+    echo | $NROFF -c >/dev/null 2>&1 && \
     NROFF="$NROFF -c"
 
 # this aids me in tracing FTBFSen without access to the buildd
@@ -1327,8 +1338,16 @@
 		DOWARN=-Wc,-we
 		;;
 	xlc)
-		save_NOWARN=-qflag=i:e
-		DOWARN=-qflag=i:i
+		case $TARGET_OS in
+		OS/390)
+			save_NOWARN=-qflag=e
+			DOWARN=-qflag=i
+			;;
+		*)
+			save_NOWARN=-qflag=i:e
+			DOWARN=-qflag=i:i
+			;;
+		esac
 		;;
 	*)
 		test x"$save_NOWARN" = x"" && save_NOWARN=-Wno-error
@@ -1493,10 +1512,25 @@
 		ac_flags 1 extansi -Xa
 		;;
 	xlc)
-		ac_flags 1 rodata "-qro -qroconst -qroptr"
-		ac_flags 1 rtcheck -qcheck=all
-		#ac_flags 1 rtchkc -qextchk	# reported broken
-		ac_flags 1 wformat "-qformat=all -qformat=nozln"
+		case $TARGET_OS in
+		OS/390)
+			# On IBM z/OS, the following are warnings by default
+			# CCN3296: #include file <foo.h> not found.
+			# CCN3944: Attribute "__foo__" is not supported and is ignored.
+			# CCN3963: The attribute "foo" is not a valid variable
+			#   attribute and is ignored.
+			ac_flags 1 halton "-qhaltonmsg=CCN3296 -qhaltonmsg=CCN3944 -qhaltonmsg=CCN3963"
+			# CCN3290: Unknown macro name FOO on #undef directive.
+			# CCN4108: The use of keyword '__attribute__' is non-portable.
+			ac_flags 1 supprss "-qsuppress=CCN3290 -qsuppress=CCN4108"
+			;;
+		*)
+			ac_flags 1 rodata "-qro -qroconst -qroptr"
+			ac_flags 1 rtcheck -qcheck=all
+			#ac_flags 1 rtchkc -qextchk	# reported broken
+			ac_flags 1 wformat "-qformat=all -qformat=nozln"
+			;;
+		esac
 		#ac_flags 1 wp64 -qwarn64	# too verbose for now
 		;;
 	esac
@@ -2628,8 +2662,8 @@
 MKSH_ASSUME_UTF8		(0=disabled, 1=enabled; default: unset)
 MKSH_BINSHPOSIX			if */sh or */-sh, enable set -o posix
 MKSH_BINSHREDUCED		if */sh or */-sh, enable set -o sh
-MKSH_CLRTOEOL_STRING		"\033[K"
-MKSH_CLS_STRING			"\033[;H\033[J"
+MKSH_CLRTOEOL_STRING		"\033[K" (replace \033 with \047 on EBCDIC)
+MKSH_CLS_STRING			"\033[;H\033[J" (likewise)
 MKSH_CONSERVATIVE_FDS		fd 0-9 for scripts, shell only up to 31
 MKSH_DEFAULT_EXECSHELL		"/bin/sh" (do not change)
 MKSH_DEFAULT_PROFILEDIR		"/etc" (do not change)
Index: check.pl
===================================================================
RCS file: /cvs/src/bin/mksh/check.pl,v
retrieving revision 1.38
diff -u -r1.38 check.pl
--- check.pl	8 Mar 2015 22:54:55 -0000	1.38
+++ check.pl	24 Apr 2015 02:12:22 -0000
@@ -1165,7 +1165,7 @@
 		    print STDERR "$prog:$test{':long-name'}: expected-exit value $val not in 0..255\n";
 		    return undef;
 		}
-	    } elsif ($val !~ /^([\s+-=*%\/|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
+	    } elsif ($val !~ /^([\s\d+-=*%\/|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
 		print STDERR "$prog:$test{':long-name'}: bad expected-exit expression: $val\n";
 		return undef;
 	    }
Index: check.t
Re: [PATCH] IBM z/OS + EBCDIC support
On Fri, 2015 Apr 24 14:29+0000, Thorsten Glaser wrote:
> Woah! I’m amazed, flattered, impressed, puzzled, wondering, etc. at
> the same time. Please give me a bit to come up with a suitable reply,
> this deserves some time to think about, and I love your enthusiasm.

Oh, you're very kind! mksh already has a remarkable track record of
portability, and this would be yet another feather in its cap. The build
system, though unconventional, turned out to be a lot easier to work
with in the EBCDIC environment than GNU Bash's.

By the way, there's one addendum I'd like to put here: It turns out that
NSIG=32 isn't quite right for z/OS. The system has a few more signals
than that, some of which appear to be unique:

    $ kill -l
    NULL HUP INT ABRT ILL POLL URG STOP FPE KILL BUS SEGV SYS PIPE ALRM
    TERM USR1 USR2 ABND CONT CHLD TTIN TTOU IO QUIT TSTP TRAP IOERR
    WINCH XCPU XFSZ VTALRM PROF DANGER TRACE DCE DUMP
    $ kill -l | tr ' ' '\n' | grep . | wc -l
    37
    $ grep SIGDUMP /usr/include/signal.h
    #define SIGDUMP 39

(SIGDANGER, Will Robinson!)

Do take the time you need to chew through all those changes, of course.
I'll be happy to pick things up again at your convenience.

> > My ASCII-art .sig got a bad case of Times New Roman.
>
> My condolence! Rest assured I’m using FixedMisc [MirOS]¹ here.
>
> ① https://www.mirbsd.org/MirOS/dist/mir/Foundry/FixedMisc-20130517.tgz

Nice! If it weren't for the big Web-mail providers seeing fit to display
.signatures in variable-width fonts, there would still be a little ASCII
skunk down below ^_^


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
Re: [PATCH] IBM z/OS + EBCDIC support
On Wed, 2015 May 6 20:22+0000, Thorsten Glaser wrote:
> Daniel Richard G. dixit:
>
> Unless we convert EBCDIC to Unicode ourselves (as opposed to letting
> the system do it; I’m currently convinced that we really want to do
> this actually, since we don’t support them all anyway).

If you bundle a set of encoding tables with mksh (whether transformed
into C arrays or loaded as-is at runtime), you can easily support every
variant of EBCDIC that matters. Just look at iconv -l on Linux, for
example; it's not like this would be a hard problem.

> > If all of mksh's input/output is being filtered via conversion tables
>
> It isn’t, it never is. That would just be insane ;-) Only some.

Well, filtering everything would sure make some interesting things
possible. (Maybe it's feasible, if you plan it that way from the start.
Use strerror()+convert instead of perror(), and so on. Food for
thought ;)

> > I think that sounds right. Maybe call the binary emksh? As much as
>
> Okay, emksh and EBCDIC [ML]KSH it is.

Sounds good! Down the line, if mksh ever gains features that are
particular to z/OS--- like being able to interface with parts of the
system that are outside of the OMVS Unix environment---then this may be
something to revisit. But as long as it's just using the normal POSIX
interface, distinguishing it by the use of EBCDIC is the right way to
go, IMO.

So, everything seems in order here. I see you've merged in many of the
changes already. Is there anything more you need from me at this time?
I'll be happy to test a pristine tree on z/OS once all the necessary
tweaks are in.

I did want to pass one thing along for now, amending my original patch.
It turns out that xlc on z/OS does in fact support -qro and -qroconst;
it's only -qroptr that is unsupported. Small oversight on my part.

Also, while this xlc doesn't have -qcheck, it does have -qrtcheck:

    -qrtcheck[=option] | -qnortcheck
        Generates compare-and-trap instructions that perform certain
        types of runtime checking. The available options are:

        all      Automatically generates compare-and-trap instructions
                 for all possible runtime checks.
        bounds   Performs runtime checking of addresses when
                 subscripting within an object of known size.
        divzero  Performs runtime checking of integer division.
        nullptr  Performs runtime checking of addresses contained in
                 pointer variables used to reference storage.

        The default is -qnortcheck.

That seems to be in the same spirit, so I threw it in. Your call if
you'd like to use it, of course.

> > Very nice hack! I do prefer analog clocks myself. ;-)
>
> I don’t care as long as the clock also shows the day and month, and
> preferably also the day-of-week. I tend to not know them. Oh, and,
> i̲f̲ I have a clock, it better go right, so it should do NTP or DCF77
> (sadly, most DCF77 clocks only sync once a day or month or when you
> trigger it manually, not constantly, so they are off most of the
> time, especially when they are often in buildings that shield radio
> signals well).

It's certainly easy enough to get WWVB radio clocks here in the States,
though if you live in the edges of the continent, you'll have a hard
time getting the signal. I like date information too, but good luck
finding that in an analog model bigger than a wristwatch!


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.

Index: Build.sh
===================================================================
RCS file: /cvs/src/bin/mksh/Build.sh,v
retrieving revision 1.678
diff -u -r1.678 Build.sh
--- Build.sh	29 Apr 2015 20:44:55 -0000	1.678
+++ Build.sh	8 May 2015 02:34:44 -0000
@@ -1493,10 +1513,27 @@
 		ac_flags 1 extansi -Xa
 		;;
 	xlc)
-		ac_flags 1 rodata "-qro -qroconst -qroptr"
-		ac_flags 1 rtcheck -qcheck=all
-		#ac_flags 1 rtchkc -qextchk	# reported broken
-		ac_flags 1 wformat "-qformat=all -qformat=nozln"
+		case $TARGET_OS in
+		OS/390)
+			# On IBM z/OS, the following are warnings by default
+			# CCN3296: #include file <foo.h> not found.
+			# CCN3944: Attribute "__foo__" is not supported and is ignored.
+			# CCN3963: The attribute "foo" is not a valid variable
+			#   attribute and is ignored.
+			ac_flags 1 halton "-qhaltonmsg=CCN3296 -qhaltonmsg=CCN3944 -qhaltonmsg=CCN3963"
+			# CCN3290: Unknown macro name FOO on #undef directive.
+			# CCN4108: The use of keyword '__attribute__' is non-portable.
+			ac_flags 1 supprss "-qsuppress=CCN3290 -qsuppress=CCN4108"
+			ac_flags 1 rtcheck -qrtcheck=all
+			;;
+		*)
+			ac_flags 1 roptr -qroptr
+			ac_flags 1 rtcheck -qcheck=all
+			#ac_flags 1 rtchkc -qextchk	# reported broken
+			ac_flags 1 wformat "-qformat=all -qformat=nozln"
+			;;
+		esac
+		ac_flags 1 rodata "-qro -qroconst"
 		#ac_flags 1 wp64 -qwarn64	# too verbose
Re: [PATCH] IBM z/OS + EBCDIC support
Hi Thorsten, apologies for the delay.

On Thu, 2017 Apr 20 21:49+0000, Thorsten Glaser wrote:
>
> >Interesting! So POSIX assumes ASCII, to a certain extent.
>
> Yes, it does. I think EBCDIC as charset is actually nonconformant, but
> it probably pays off to stay close nevertheless. (This is actually
> about the POSIX/'C' locale; other locales can pretty much do whatever
> they want.)

Ah, okay, C locale; that makes sense. I did imagine POSIX was largely
agnostic about the character set.

> >Even if you really do need a table, you could populate it on startup
> >using these.
>
> Indeed… but we have the compile-time translated characters all over
> the source (I think we agreed earlier that not supporting changing it
> at runtime was okay).

Oh, so you mean like if(c=='[') and such? That is certainly reasonable.
The program would be tied to the compile-time codepage no worse than
most other programs.

(If you could do everything in terms of character literals, without
depending on constructs like if(c>='A'&&c<='Z'), your code would be
pretty much EBCDIC-proof.)

> >Anyway, if you need any z/OS testing, feel free to drop me a line ;)
>
> Thanks!
>
> I hope to be able to get back to that offer eventually. Glad to know
> you’re still interested after two years.

Mainframes are not a platform for the impatient... at least not if one
has to deal with IBM ^_^


On Fri, 2017 Apr 21 20:20+0000, Thorsten Glaser wrote:
> Daniel Richard G. dixit:
>
> >Anyway, if you need any z/OS testing, feel free to drop me a line ;)
>
> main() { printf("%02X\n", '\n'); return 0; }
>
> Out of curiosity, what does that print on your systems, 15 or 25?

    $ cat >test.c
    main() { printf("%02X\n", '\n'); return 0; }
    $ xlc -o test test.c
    $ ./test
    15

However...

    $ cat >test2.c
    #pragma convert("ISO8859-1")
    int c = '\n';
    #pragma convert(pop)
    main() { printf("%02X\n", c); return 0; }
    $ xlc -o test2 test2.c
    $ ./test2
    0A

That may or may not be useful. Of course, the pragma would need to be
protected by

    #if defined(__MVS__) && defined(__IBMC__)

Gnulib uses this in its test-iconv.c program, because the string
literals therein need to be in ASCII regardless of platform.

> Also, what line endings do the auto-converted source files, such
> as dot.mkshrc, have?

    $ head -2 dot.mkshrc
    # $Id$
    # $MirOS: src/bin/mksh/dot.mkshrc,v 1.101 2015/07/18 23:03:24 tg Exp $
    $ head -2 dot.mkshrc | od -t x1
    0000000000    7B 40 5B C9 84 5B 15 7B 40 5B D4 89 99 D6 E2 7A
    0000000020    40 A2 99 83 61 82 89 95 61 94 92 A2 88 61 84 96
    0000000040    A3 4B 94 92 A2 88 99 83 6B A5 40 F1 4B F1 F0 F1
    0000000060    40 F2 F0 F1 F5 61 F0 F7 61 F1 F8 40 F2 F3 7A F0
    0000000100    F3 7A F2 F4 40 A3 87 40 C5 A7 97 40 5B 15
    0000000116

(Yes, binary files do get messed up :-] On z/OS-native filesystems,
there is a per-file type flag that enables or disables encoding
auto-conversion. For NFS mounts, you have to mount it as either "binary"
or "text." The mksh source tree above is on the latter sort of mount.)

Let me know if I can help any more!


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
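
The literal-only style suggested in this message can be sketched as
follows. This is only an illustration under stated assumptions: the
function names are hypothetical and are not taken from mksh. The point is
that a range test such as c>='A'&&c<='Z' also accepts '}' and '\' on
EBCDIC code page 1047, because the letters there are not contiguous,
whereas a membership test against a string literal is translated by the
compiler along with the rest of the source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers for illustration; these names are not from mksh. */

/* True only for the 26 basic uppercase letters, in any execution charset. */
static int is_upper_portable(unsigned char c)
{
	/* strchr() also matches the terminating NUL, so reject 0 up front */
	return c != 0 && strchr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

/* Digits are the one range the C standard guarantees to be contiguous. */
static int digit_value(int c)
{
	return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Small self-check; every assertion holds on ASCII and EBCDIC alike. */
static void classify_selfcheck(void)
{
	assert(is_upper_portable('A') && is_upper_portable('Z'));
	assert(!is_upper_portable('a'));
	assert(!is_upper_portable('}'));	/* inside 'A'..'Z' on EBCDIC 1047 */
	assert(digit_value('7') == 7);
}
```

The string-literal lookup is slower than a range test, so a real shell
would more likely precompute a 256-entry class table once at startup; the
membership test is still a convenient way to fill such a table.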
Re: [PATCH] IBM z/OS + EBCDIC support
On Sat, 2017 Apr 22 23:26+0000, Thorsten Glaser wrote:
>
> >Oh, so you mean like if(c=='[') and such? That is certainly
> >reasonable. The program would be tied to the compile-time codepage no
> >worse than most other programs.
>
> Right. So either something like -DMKSH_EBCDIC_CP=1047 or limiting
> EBCDIC support to precisely one codepage.

I don't think the former sort of directive should be necessary. There is
enough auto-conversion magic going on that it should be possible to
piggyback on that... where it all "just works" when you compile the
code.

> >(If you could do everything in terms of character literals, without
> >depending on constructs like if(c>='A'&&c<='Z'), your code would be
> >pretty much EBCDIC-proof.)
>
> Yesss… but…
>
> ① not all characters are in every codepage, and

True, but ASCII should be a given. (There are some older EBCDIC
codepages that lack certain common characters, I forget which ones, but
no one will want to use those anyway.)

> ② I need strictly monotonous ordering for all 256 possible octets
>   for e.g. sorting strings in some cases and for [a-z] ranges

That sounds no worse than what is usually done for LC_COLLATE and
such...

> OK, I can live with that, so I just need to swap the conversion tables
> I got (which map 15 to NEL and 25 to LF).

Always thought it was funny that it's the weirdo mainframe platform that
has a proper "newline" character instead of pressing LF into service as
one ^_^

> >#pragma convert("ISO8859-1")
> […]
> >That may or may not be useful. Of course, the pragma would need to be
>
> Interesting, but I can’t think of where that would be useful at the
> moment. But good to know.
>
> Hmm. Can this be used to construct the table?
>
> Something like running this at configure time:
>
> main() {
> 	int i = 1;
>
> 	printf("#pragma convert(\"ISO8859-1\")\n");
> 	printf("static const unsigned char map[] = \"");
> 	while (i <= 255)
> 		printf("%c", i++);
> 	printf("\";\n");
> }
>
> And then feed its output into the compiling, and have
> some code generating the reverse map like:
>
> 	i = 0;
> 	while (i < 255) {
> 		revmap[map[i]] = i + 1;
> 		++i;
> 	}
>
> But this reeks of fragility compared with supporting a known-good
> hand-edited set of codepages.

Probably easier just to use etoa(), or atoe()? I don't think explicit
hand-edited tables should be needed for EBCDIC, unless you're already
doing those for other encodings.

> (Not to say we can’t do this manually once in order to actually _get_
> those mappings.)

Certainly the above code would either need some tweaking, or the output
some massaging, so the odd characters (especially '"') don't throw off
the compiler.

> >Let me know if I can help any more!
>
> Okay, sure, thanks. I must admit I’m not actively working on this
> still but I’m considering making a separate branch on which we can try
> things until they work, then merge it back.

I'm happy to test iterations of this, as long as it doesn't need much
diagnosing...

> But first, the character class changes themselves. That turned out to
> be quite a bit more effort than I had estimated and will keep me busy
> for another longish hacking session. Ugh. Oh well. But on the plus
> side, this will make support much nicer as *all* constructs like
> “(c >= '0' && c <= '9')” will go away and even the OS/2 TEXTMODE line
> endings (where CR+LF is also supported) need less cpp hackery.

Sounds great! That'll certainly make EBCDIC easier to deal with.

I might suggest looking at Gnulib, specifically lib/c-ctype.h, for
inspiration. I helped them get their ctype implementation in order on
z/OS (and at one point we were even trying to deal with *signed* EBCDIC
chars, where 'A' has a negative value!), and it works solidly now.
They've got a good design for dealing with non-ASCII weirdness; they
were clearly thinking of that from the start.


Happy hacking,


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
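
The reverse-map idea quoted in this message can be sketched a little more
defensively. The following is a minimal illustration, not mksh code: the
function name build_reverse_map is hypothetical, and it assumes the
forward table covers all 256 octets (including 0) rather than being a
string literal that starts at 1. It refuses tables that are not
one-to-one, which is the kind of fragility the generated-table approach
would need to guard against.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build rev[] such that rev[fwd[i]] == i for all 256 octets.
 * Returns 0 on success, or -1 if fwd[] is not a bijection (some octet
 * appears twice, so another one must be missing).
 * Hypothetical helper for illustration; not part of mksh.
 */
static int build_reverse_map(const unsigned char fwd[256],
    unsigned char rev[256])
{
	unsigned char seen[256];
	int i;

	assert(fwd != NULL && rev != NULL);
	memset(seen, 0, sizeof(seen));
	for (i = 0; i < 256; ++i) {
		if (seen[fwd[i]])
			return -1;	/* duplicate target octet */
		seen[fwd[i]] = 1;
		rev[fwd[i]] = (unsigned char)i;
	}
	return 0;
}
```

Run once at startup, this costs 256 iterations, and a failed check turns
a silently wrong conversion into an explicit error.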
Re: [PATCH] IBM z/OS + EBCDIC support
On Tue, 2017 Apr 25 10:59+0000, Thorsten Glaser wrote:
> Well, the hand-edited tables would be known to be stable and
> (somewhat) correct, but…
>
> >Even if you really do need a table, you could populate it on startup
> >using these.
>
> I guess I can probably work with that.
>
> So we’re up for testing again!
> [...]

I had to get rid of <err.h>, and replace err() with printf(). But
otherwise, here is the result:

    00 01 02 03 9C 09 86 7F 97 8D 8E 0B 0C 0D 0E 0F
    10 11 12 13 9D 0A 08 87 18 19 92 8F 1C 1D 1E 1F
    80 81 82 83 84 85 17 1B 88 89 8A 8B 8C 05 06 07
    90 91 16 93 94 95 96 04 98 99 9A 9B 14 15 9E 1A
    20 A0 E2 E4 E0 E1 E3 E5 E7 F1 A2 2E 3C 28 2B 7C
    26 E9 EA EB E8 ED EE EF EC DF 21 24 2A 29 3B 5E
    2D 2F C2 C4 C0 C1 C3 C5 C7 D1 A6 2C 25 5F 3E 3F
    F8 C9 CA CB C8 CD CE CF CC 60 3A 23 40 27 3D 22
    D8 61 62 63 64 65 66 67 68 69 AB BB F0 FD FE B1
    B0 6A 6B 6C 6D 6E 6F 70 71 72 AA BA E6 B8 C6 A4
    B5 7E 73 74 75 76 77 78 79 7A A1 BF D0 5B DE AE
    AC A3 A5 B7 A9 A7 B6 BC BD BE DD A8 AF 5D B4 D7
    7B 41 42 43 44 45 46 47 48 49 AD F4 F6 F2 F3 F5
    7D 4A 4B 4C 4D 4E 4F 50 51 52 B9 FB FC F9 FA FF
    5C F7 53 54 55 56 57 58 59 5A B2 D4 D6 D2 D3 D5
    30 31 32 33 34 35 36 37 38 39 B3 DB DC D9 DA 9F

> Can you run this in both codepages, and possibly their Euro
> equivalents?

I'm afraid I'm not able to switch the codepage. Some searching indicates
that this can be done in a shell with e.g.

    LANG=En_us.IBM-037
    LC_ALL=En_us.IBM-037

but that doesn't affect the output of your program. It's possible that
this needs to be set outside the z/OS Unix environment, in the actual
mainframe UI, and that eludes even me :>

You don't have enough confidence in etoa_l() to generate the table at
build time?

> There’s no EBCDIC to Unicode function (ideal would be something that
> gets a char and returns an int or something, not on buffers) though,
> is there? (If there is, runs of that would also be welcome.) I don’t
> find one in the IBM library reference, and I had a look at z/OS
> Unicode Services but… there’s CUNLCNV, but it looks extremely… IBM. So
> maybe we can or have to make do with etoa and its limitations…
> probably still enough at this point.

Don't forget that ISO 8859-1 is equivalent to the first 256 codepoints
of Unicode ;)


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
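
Since ISO 8859-1 coincides with U+0000..U+00FF, the table posted in this
message already doubles as a per-octet EBCDIC-to-Unicode map. The sketch
below simply transcribes that dump into a C array; the names ebc2latin1
and ebcdic_to_ucs are hypothetical, and the values reflect only the one
system and code page the dump came from, not EBCDIC in general.

```c
#include <assert.h>

/* EBCDIC -> ISO 8859-1, transcribed from the dump in this message. */
static const unsigned char ebc2latin1[256] = {
	0x00,0x01,0x02,0x03,0x9C,0x09,0x86,0x7F,0x97,0x8D,0x8E,0x0B,0x0C,0x0D,0x0E,0x0F,
	0x10,0x11,0x12,0x13,0x9D,0x0A,0x08,0x87,0x18,0x19,0x92,0x8F,0x1C,0x1D,0x1E,0x1F,
	0x80,0x81,0x82,0x83,0x84,0x85,0x17,0x1B,0x88,0x89,0x8A,0x8B,0x8C,0x05,0x06,0x07,
	0x90,0x91,0x16,0x93,0x94,0x95,0x96,0x04,0x98,0x99,0x9A,0x9B,0x14,0x15,0x9E,0x1A,
	0x20,0xA0,0xE2,0xE4,0xE0,0xE1,0xE3,0xE5,0xE7,0xF1,0xA2,0x2E,0x3C,0x28,0x2B,0x7C,
	0x26,0xE9,0xEA,0xEB,0xE8,0xED,0xEE,0xEF,0xEC,0xDF,0x21,0x24,0x2A,0x29,0x3B,0x5E,
	0x2D,0x2F,0xC2,0xC4,0xC0,0xC1,0xC3,0xC5,0xC7,0xD1,0xA6,0x2C,0x25,0x5F,0x3E,0x3F,
	0xF8,0xC9,0xCA,0xCB,0xC8,0xCD,0xCE,0xCF,0xCC,0x60,0x3A,0x23,0x40,0x27,0x3D,0x22,
	0xD8,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0xAB,0xBB,0xF0,0xFD,0xFE,0xB1,
	0xB0,0x6A,0x6B,0x6C,0x6D,0x6E,0x6F,0x70,0x71,0x72,0xAA,0xBA,0xE6,0xB8,0xC6,0xA4,
	0xB5,0x7E,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7A,0xA1,0xBF,0xD0,0x5B,0xDE,0xAE,
	0xAC,0xA3,0xA5,0xB7,0xA9,0xA7,0xB6,0xBC,0xBD,0xBE,0xDD,0xA8,0xAF,0x5D,0xB4,0xD7,
	0x7B,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0xAD,0xF4,0xF6,0xF2,0xF3,0xF5,
	0x7D,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,0x50,0x51,0x52,0xB9,0xFB,0xFC,0xF9,0xFA,0xFF,
	0x5C,0xF7,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0xB2,0xD4,0xD6,0xD2,0xD3,0xD5,
	0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0xB3,0xDB,0xDC,0xD9,0xDA,0x9F
};

/* Takes one EBCDIC octet, returns its Unicode code point (U+0000..U+00FF). */
static unsigned int ebcdic_to_ucs(unsigned int octet)
{
	assert(octet <= 0xFFu);
	return ebc2latin1[octet];
}
```

Note that this dump maps EBCDIC 15 to 0A (LF) and 25 to 85, i.e. the
swapped newline convention discussed earlier in the thread.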
Re: mksh on EBCDIC, testing
On Mon, 2017 May 1 16:37+0000, Thorsten Glaser wrote:
>
> >The credit is good, thank you! For changes like this I'd request mention
> >in an AUTHORS file or the like, but as mksh doesn't have one, that's a
> >moot point.
>
> Well the sum of all people listed in (c) plus some in the manpage, but…

Wherever is appropriate :)

> >Could you elaborate on which parts need updating? While I know more now
> >than I did then, the comment didn't discuss conversion to/from an "ASCII
> >codepage."
>
> I’m not too clear on that myself. I guess you could just check
> what’s still missing, patch-wise, and resubmit that, or something.
> Or not, if we’re good.

I would change "thankfully tend to agree on the code points..." to
"usually tend to agree..." in light of the square-bracket shenanigans,
but that's about it.

> >The first command shows
> >
> >==> which compiler seems to be used... xlc
> >
> >but uses "cc" for everything, which doesn't work. (On z/OS, "cc" is a
> >K&R compiler, only used for building ancient code.)
>
> This is expected, this is a detection of the *type* of compiler.
> I’ll change the verbose string to “which compiler type seems…”.

Note that "cc" isn't even the same type of compiler as xlc. The option
syntax is different/incompatible, for starters.

I saw that you put in a change to use xlc on this platform, however, so
this should no longer be an issue.

> It’s used later on, but it could conceivably be optional.
> My perl-foo is not very good either though… I’ll try something.

I remember seeing somewhere that it's possible to make a "use" fail
non-fatally. I think an eval was involved...

> Now on to the tests themselves:
> [...]
>
> Conclusion: I guess we’re not quite there yet, but I now have more
> input to work with once I have the more pressing issues out of my
> feet. Thanks!

Glad to help! Let me know if you want another go.

> As for ebcdic-special: the testsuite already contains a
> “duffs-device-faux-EBCDIC” check which is used in “faux EBCDIC” mode
> (basically ASCII system running with most of the EBCDIC codepaths
> enabled), just so you can see how they can differ; I’m fairly sure
> the “got” output for duffs-device in your testsuite log can be
> copy/pasted into a “duffs-device-EBCDIC” expected-stdout, as it looks
> correct. So you can play around with the tests if you want (otherwise
> I’ll just do them in batch, after fixing the more obvious bugs like
> why the hell there are ^L and ^G in the shell output).

The test suite is pretty much Greek to me :] But looking at the output
of that test, and the raw bytes therein, I would point out that the file
did go through an EBCDIC->ASCII conversion when I copied it out of the
system. If I view the file in less(1) in z/OS, I see e.g.

    \072\073\074\075\076\077 <41><42><43><44><45><46><47><48><49><4A>.<(+|&<51><52>

instead of the

    \072\073\074\075\076\077 .<(+|&

that you are probably seeing. (You may be taking this into account
already, but I wanted to make sure.)


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
Re: \uXXXX on EBCDIC systems (was Re: [PATCH] IBM z/OS + EBCDIC support)
Hi Thorsten,

On Wed, 2017 May 3 15:57+0000, Thorsten Glaser wrote:
> Dixi quod…
>
> >Use U+4DC0 HEXAGRAM FOR THE CREATIVE HEAVEN (䷀) then ☺
>
> I *do* have a follow-up question for that now.
>
> The utf8bug-1 test fails because its output is interpreted as UTF-8,
> but the UTF-8 string it should match was treated as “extended ASCII”
> and is thus converted…
>
> So, the situation as it is right now is:
>
> print -n '0\u4DC0' outputs the following octets:
> - on an ASCII system : 30 E4 B7 80
> - on an EBCDIC system: F0 E4 B7 80
>
> That is, “0” is output in the native codepage, and the Unicode
> value is output as real UTF-8 octets.

This kind of weirdness is but one reason why z/Linux (Linux on z
Systems) is eating Unix System Services alive :]

> Now you say UTF-8 is not really used on z/OS or EBCDIC systems
> in general, so I was considering the following heresy:
> - output: F0 43 B3 20
>
> That is, convert UTF-8 output, before actually outputting it,
> as if it were “extended ASCII”, to EBCDIC.
>
> Converting F0 43 B3 20 from EBCDIC(1047) to “extended ASCII”
> yields 30 E4 B7 80 by the way, see above. (Typos in the manual
> conversion notwithstanding.)
>
> This would allow more consistency doing all those conversions
> (which are done automatically). If it doesn’t diminish the
> usefulness of mksh on EBCDIC systems I’d say go for it.
>
> Comments?

While UTF-8 isn't a thing in the z/OS environment, I think there could
be value in printing something that will be converted by the existing
EBCDIC->ASCII terminal/NFS conversion into correctly-formed UTF-8
characters.

To wit: Say I have a UTF-8-encoded file in NFS, and I view it via a
text-mode NFS mount on z/OS. If I view it in less(1), then the high
characters are shown as arbitrary byte sequences (e.g. "DIVISION SIGN"
is "<66>"). But if I just "cat" the file, then it renders correctly in
the terminal. Effectively an ASCII->EBCDIC->ASCII round trip.

I don't know if there are use cases where this may yield unintuitive
results... perhaps if this "nega-UTF-8" were redirected to a file and
then processed further in z/OS, that may lead to some surprises. But in
terms of doing something sensible when using a "\u" escape in an
environment that shouldn't support it, it seems no worse than producing
actual UTF-8 bytes.


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
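
The octet sequences discussed in this message can be reproduced with a
short sketch. It is an illustration only: the function names are
hypothetical, the encoder handles just the Basic Multilingual Plane
(and does not reject surrogates), and the ASCII-to-EBCDIC step covers
only the four octets quoted above (30, E4, B7, 80), not a full code page.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a BMP code point as UTF-8; returns the number of octets written. */
static size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
	assert(cp <= 0xFFFFu);
	if (cp < 0x80u) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800u) {
		out[0] = (unsigned char)(0xC0u | (cp >> 6));
		out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
		return 2;
	}
	out[0] = (unsigned char)(0xE0u | (cp >> 12));
	out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
	out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
	return 3;
}

/* Excerpt of an "extended ASCII" -> EBCDIC 1047 map: only the octets
 * quoted in this message. Returns -1 for anything else. */
static int atoe_excerpt(unsigned char octet)
{
	switch (octet) {
	case 0x30: return 0xF0;
	case 0x80: return 0x20;
	case 0xB7: return 0xB3;
	case 0xE4: return 0x43;
	default:   return -1;	/* not covered by this excerpt */
	}
}

/* UTF-8 encode, then push every octet through the ASCII->EBCDIC map.
 * Returns the octet count, or 0 if an octet is outside the excerpt. */
static size_t nega_utf8(unsigned int cp, unsigned char out[3])
{
	size_t i, n = utf8_encode_bmp(cp, out);

	for (i = 0; i < n; ++i) {
		int e = atoe_excerpt(out[i]);

		if (e < 0)
			return 0;
		out[i] = (unsigned char)e;
	}
	return n;
}
```

For U+4DC0 the first function yields E4 B7 80 and the second 43 B3 20,
matching the sequences quoted above; the digit "0" becomes F0 either way.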
Re: mksh on EBCDIC, testing
Hi Thorsten,

On Wed, 2017 May 3 22:38+0000, Thorsten Glaser wrote:
>
> >Glad to help! Let me know if you want another go.
>
> OK, please do so… I’ll mail you a tarball preview version.

Terminal output files from the mksh-pre tree, with the changes you
requested, are attached.

> >of that test, and the raw bytes therein, I would point out that the
> >file did go through an EBCDIC->ASCII conversion when I copied it
> >out of the
>
> Yes, I expected that. I tried to fix a fair amount of the testcases
> already… well, let’s see what to make of it. Unexpectedly, I also
> managed to fix with not too much effort a case of where it tried to
> use 0x80 as flag, applied to things like '0', which of course went
> wrong (and was the likely cause for most of the ^G and ^L in the
> testsuite output).

On this May the 4th, a particular Yoda quote seems applicable: "You must
unlearn, what you have learned." :-)


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.

Attachments:
mksh-build.txt.gz (application/gzip)
mksh-test.txt.gz (application/gzip)