Isn't this a matter of GUI implementation? J602 uses Java libraries to implement its IDE. So isn't it a question for the vendors of Java?
They may well answer (but I'm only guessing) that when utf-8 hits a superascii that isn't part of a valid (utf-8) character, this is more likely to be a corruption than a valid use of an (obsolete) standard such as latin-1 (though that's not the only possibility). Therefore showing a standard glyph, �, to represent corrupted data is more appropriate than misusing some obsolescent writing system to display the actual content of corrupted data. As Bill says, some users prefer to see a glyph warning of corruption rather than random glyphs masquerading as valid data.

Is it not reasonable that if a string of bytes purporting to be utf-8 encoded cannot be decoded by the utf-8 algorithm, then that data is corrupt, not a throwback to an obsolete standard?

To demand that utf-8 be extended to handle corrupt data as though it were latin-1 in superascii form is not a position many IT people would support, quite apart from the fact that utf-8 algorithms are designed to be fast.

It's not as if J stops you displaying a string of possible superasciis, S, in latin-1. Just use (u: S).

On Sat, Mar 29, 2014 at 3:21 PM, Pascal Jasmin <[email protected]> wrote:

> Your explanation is very helpful, Ian, but if I may question its accuracy
> for a moment:
>
> utf8 displays "extended ascii" by using the 2 byte code '195 x' J will
> display 195 196 {a. (or any x other 196) as 2 "illegal box tiles". (ie the
> display does not escape the byte sequence as utf format). The 2 byte code
> is what screws up boxed display of u: (I assume though the '? tile'/glyph
> seems to cause probs too).
>
> The key point is that utf mode and binary mode display in J are already
> different. So why not make illegal box tiles/glyphs 128 and 129 look
> different from each other? To save work and thought, borrow the latin-1
> glyphs?
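A minimal sketch of the two display behaviours being contrasted here, written in Python rather than J purely for illustration. The function names are hypothetical, and Python's built-in codecs are only assumed to behave like a typical GUI toolkit's utf-8 decoder; nothing here is taken from the J or Java sources. It shows a lone superascii byte turning into the warning glyph under utf-8, the same byte shown as a latin-1 character, the 195 196 pair producing two warning glyphs, and the 195 146 pair (quoted further down the thread) decoding to a single character.

```python
# Illustrative sketch only: Python stands in for the display layer.
# show_as_utf8 / show_as_latin1 are hypothetical names, not J or Java APIs.

REPLACEMENT = "\ufffd"  # U+FFFD, the standard "could not decode" glyph


def show_as_utf8(data: bytes) -> str:
    """Decode as utf-8; each undecodable byte sequence becomes U+FFFD."""
    return data.decode("utf-8", errors="replace")


def show_as_latin1(data: bytes) -> str:
    """Treat every byte as exactly one latin-1 character (cannot fail)."""
    return data.decode("latin-1")


if __name__ == "__main__":
    lone = bytes([255])        # comparable to (255{a.) in J
    pair = bytes([195, 196])   # a lead byte followed by a non-continuation byte
    valid = bytes([195, 146])  # a well-formed 2-byte utf-8 sequence

    print(show_as_utf8(lone) == REPLACEMENT)      # the warning glyph
    print(show_as_latin1(lone))                   # y-umlaut
    print(show_as_utf8(pair) == REPLACEMENT * 2)  # two warning glyphs
    print(show_as_utf8(valid))                    # one character, O-grave
```

The latin-1 path never reports an error, which is exactly why it cannot tell valid data from corruption; the utf-8 path gives up the single-glyph display of bytes 128-255 in exchange for that check.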
>
> ----- Original Message -----
> From: Ian Clark <[email protected]>
> To: Programming forum <[email protected]>
> Cc:
> Sent: Saturday, March 29, 2014 9:23:52 AM
> Subject: Re: [Jprogramming] font with extended ascii? -display binary data
>
> Pascal says:
>
> J stores the full 8 bits of binary data. There very well may be a great
> reason not to provide a friendly display for binary data, I just don't see
> it yet.
>
> Yes there is, and it's to do with the nature of the utf-8 standard. Before
> unicode came along, J did indeed display (255 { a.) say as a single glyph:
> y-umlaut (ÿ -if that shows up on your screen!).
>
> There are many ways J could have "supported" unicode. In fact J uses two
> distinct ways: wide-characters and utf-8. The second way, utf-8 is used by
> the session window and edit window in j602 (I'm less familiar with JQt and
> JHS, but I guess they do likewise), plus a lot of other popular software,
> and -- most important! -- by inter-application copy/paste. It's a very
> popular standard, because you can keep all your old ascii-based code --
> most of it still works.
>
> But there's a cost to using utf-8. As soon as your display software meets a
> "superascii" byte (what you're calling "extended ascii") it needs to
> interpret it as the start of a multi-byte code representing a "unicode code
> point". It detects a "superascii" by the leading bit of its byte code being
> 1.
>
> In consequence, display software can't both use utf-8 and treat a
> superascii as a single glyph. So in order to use utf-8, J is forced to
> abandon the ability to display (128{a.) to (255{a.) in one of the old
> obsolete standards like latin-1.
>
> If you want to display a superascii such as (255{a.) as a single latin-1
> glyph, you can still do so (in j602 at least) like this:
>
>    u: 255{a.
> ÿ
>
> That works ok because the old latin-1 character set has been imported into
> unicode as a proper "codespace".
> Raul gave you the official reference to it:
> http://www.unicode.org/charts/PDF/U0080.pdf
>
> It's hard to understand the ins and outs of unicode unless you call
> everything by its right name, in particular using these words rigorously:
> glyph, grapheme, char, codespace, code point. So few people do!
>
> http://www.jsoftware.com/jwiki/Guides/UnicodeGettingStarted#Superasciis_and_utf8-encoding
> -tries to explain unicode and utf-8 in beginners' terms. But even there I
> see a mistake: it occasionally uses the word "glyph" when it should be
> using "grapheme".
>
> Still, it's a start.
>
>
> On Fri, Mar 28, 2014 at 4:11 PM, Pascal Jasmin <[email protected]> wrote:
>
> > J stores the full 8 bits of binary data. There very well may be a great
> > reason not to provide a friendly display for binary data, I just don't see
> > it yet.
> >
> > On another note, there seems to be a case for an extra dyad form for u: .
> > Say 9 u:
> >
> > It would behave as monad u: does for char and wchar, but for any other
> > argument type (including integers) would return ] y. I understand it not
> > being a priority since this can be implemented by users with 3!:0 checking.
> >
> >
> > ----- Original Message -----
> > From: Raul Miller <[email protected]>
> > To: Programming forum <[email protected]>
> > Cc:
> > Sent: Friday, March 28, 2014 10:22:18 AM
> > Subject: Re: [Jprogramming] font with extended ascii? -display binary data
> >
> > "Normal ascii" occupies only 7 bits, so it's 128{.a. (or u: i.128).
> >
> > The problems created by what to do with the other half of the byte (along
> > with our love/hate relationship with standards and professionalism) have a
> > lot to do with why we are using ascii instead of ebcdic.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> >
> > On Fri, Mar 28, 2014 at 11:04 AM, Pascal Jasmin <[email protected]> wrote:
> >
> > > thank you Raul,
> > >
> > > On further thought, it appears to be impractical to use larger than base
> > > 128 for binary encoding.
> > >
> > > A friendlier display of my numeric list compression routine is possible
> > > though u:
> > >
> > >    BASE128 =: BASE64 , a.{~ 192 + i.64
> > >
> > >
> > >    u: compresslistnum 1000000239482039420348x 2 248 +"1 i. 3 3
> > > bN5o8ÒÁDâïA BÀ ýA
> > > bN5o8ÒÁDâïà DA þÀ
> > > bN5o8ÒÁDâðÀ EÀ AÀA
> > >
> > >
> > > There is a formatting problem displaying boxed unicode data. Is there any
> > > chance that normal ascii could display as above for codes 192+? or boxed
> > > unicode could line up?
> > >
> > > and
> > >
> > >    BASE128 i. 'bN5o8ÒÁDâðÀ EÀ AÀA'
> > > 27 13 57 40 60 67 128 67 128 3 67 128 67 128 67 128 128 4 67 128 128 0 67 128 0
> > >
> > > basically show that all of the extended characters are not found in
> > > BASE128 but
> > >
> > >    a. i. 'bN5o8ÒÁDâðÀ EÀ AÀA'
> > > 98 78 53 111 56 195 146 195 129 68 195 162 195 176 195 128 32 69 195 128 32 65 195 128 65
> > >
> > > shows that 2 characters are embedded for extended chars (195 x), and
> > > intermixed with single codes.
> > >
> > > Worth noting is that the extended characters display in my html email
> > > client.
> > >
> > >
> > > ----- Original Message -----
> > > From: Raul Miller <[email protected]>
> > > To: Programming forum <[email protected]>
> > > Cc:
> > > Sent: Friday, March 28, 2014 9:07:15 AM
> > > Subject: Re: [Jprogramming] font with extended ascii? -display binary data
> > >
> > > There's http://www.unicode.org/charts/PDF/U0080.pdf
> > >
> > > But it's not an informal page.
> > >
> > > 240-248 corresponds to the rightmost column (the one with the caption 00F),
> > > and the top half of that column (00F0 through 00F8 in the small print at
> > > the bottom of each cell).
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > >
> > > On Fri, Mar 28, 2014 at 9:48 AM, Pascal Jasmin <[email protected]> wrote:
> > >
> > > > Jqt uses menlo as default font. Printing binary data over 127 all produce
> > > > identical "not found" glyphs. Is it a font issue? and is there a fixed
> > > > width font that would display extended ascii as this list (or as much of it
> > > > as possible)? iso-latin 1? Is there some informal code page that shows a
> > > > printable character for every (or 240-248) binary value(s)?
> > > >
> > > > http://www.danshort.com/ASCIImap/
> > > >
> > > >
> > > > A related question is wd edit will not display the prettier line drawing
> > > > (box character set) symbols even when the font is set to Menlo. Is there a
> > > > workaround for that?
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
