Isn't this a matter of GUI implementation? J602 uses Java libraries to
implement its IDE, so isn't it a question for the vendors of Java?

They may well answer (but I'm only guessing) that when a utf-8 decoder hits
a superascii that isn't part of a valid utf-8 character, this is more
likely to be corruption than a valid use of an obsolete standard such as
latin-1 (though that's not the only possibility). Therefore showing a
standard glyph, � (U+FFFD, the unicode replacement character), to represent
corrupted data is more appropriate than misusing some obsolescent encoding
to display the actual content of corrupted data. As Bill says, some users
prefer to see a glyph warning of corruption rather than random glyphs
masquerading as valid data.

Is it not reasonable that if a string of bytes purporting to be utf-8
encoded cannot be decoded by the utf-8 algorithm, then that data is
corrupt, not a throwback to an obsolete standard? To demand that utf-8 be
extended to handle corrupt data as though it were latin-1 in superascii
form is not a position many IT people would support, quite apart from the
fact that utf-8 algorithms are designed to be fast.
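
The point about undecodable bytes can be illustrated outside J. Here is a minimal sketch in Python (not J; the helper name decode_strict and the sample bytes are assumptions made up for illustration). A strict utf-8 decoder rejects a lone superascii byte outright, while a lenient one substitutes the replacement glyph rather than guessing at latin-1:

```python
# A lone byte 255 can never occur in valid utf-8.
data = bytes([98, 255, 99])          # 'b', a stray superascii, 'c'

def decode_strict(raw):
    """Return the decoded text, or None if raw is not valid utf-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

strict = decode_strict(data)                      # None: the data is treated as corrupt
lenient = data.decode("utf-8", errors="replace")  # stray byte shown as U+FFFD

print(strict is None)                 # True
print([hex(ord(c)) for c in lenient]) # ['0x62', '0xfffd', '0x63']
```

Neither mode silently reinterprets the stray byte as a latin-1 character; that has to be requested explicitly.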

It's not as if J stops you displaying a string of possible superasciis, S,
in latin-1. Just use (u: S).
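
For comparison, here is a rough Python analogue of that explicit request (a sketch only, assuming that u: maps each byte to the code point with the same number, which is also what Python's latin-1 codec does; the variable names are illustrative):

```python
s = bytes([255])                      # the byte J writes as 255{a.

as_latin1 = s.decode("latin-1")       # byte n becomes code point n
utf8_of_y_umlaut = list(as_latin1.encode("utf-8"))

# The same two-byte pattern appears in the thread for O-grave (U+00D2).
utf8_of_o_grave = list("\u00d2".encode("utf-8"))

print(ord(as_latin1))      # 255
print(utf8_of_y_umlaut)    # [195, 191]
print(utf8_of_o_grave)     # [195, 146]
```

The last result matches the 195 146 pair reported by a. i. further down the thread: once a latin-1 character is held as utf-8, it occupies two bytes beginning with 195.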




On Sat, Mar 29, 2014 at 3:21 PM, Pascal Jasmin <[email protected]> wrote:

> Your explanation is very helpful, Ian, but if I may question its accuracy
> for a moment:
>
> utf8 displays "extended ascii" by using the 2 byte code '195 x'  J will
> display 195 196 {a. (or any x other 196) as 2 "illegal box tiles". (ie the
> display does not escape the byte sequence as utf format). The 2 byte code
> is what screws up boxed display of u: (I assume though the '? tile'/glyph
> seems to cause probs too).
>
> The key point is that utf mode and binary mode display in J are already
> different.  So why not make illegal box tiles/glyphs 128 and 129 look
> different from each other?  To save work and thought, borrow the latin-1
> glyphs?
>
>
> ----- Original Message -----
> From: Ian Clark <[email protected]>
> To: Programming forum <[email protected]>
> Cc:
> Sent: Saturday, March 29, 2014 9:23:52 AM
> Subject: Re: [Jprogramming] font with extended ascii? -display binary data
>
> Pascal says:
> > J stores the full 8 bits of binary data.  There very well may be a great
> > reason not to provide a friendly display for binary data, I just don't see
> > it yet.
>
> Yes there is, and it's to do with the nature of the utf-8 standard. Before
> unicode came along, J did indeed display (255 { a.) say as a single glyph:
> y-umlaut (ÿ -if that shows up on your screen!).
>
> There are many ways J could have "supported" unicode. In fact J uses two
> distinct ways: wide-characters and utf-8. The second way, utf-8 is used by
> the session window and edit window in j602 (I'm less familiar with JQt and
> JHS, but I guess they do likewise), plus a lot of other popular software,
> and -- most important! -- by inter-application copy/paste. It's a very
> popular standard, because you can keep all your old ascii-based code --
> most of it still works.
>
> But there's a cost to using utf-8. As soon as your display software meets a
> "superascii" byte (what you're calling "extended ascii") it needs to
> interpret it as the start of a multi-byte code representing a "unicode code
> point". It detects a "superascii" by the leading bit of its byte code being
> 1.
>
> In consequence, display software can't both use utf-8 and treat a
> superascii as a single glyph. So in order to use utf-8, J is forced to
> abandon the ability to display (128{a.) to (255{a.) in one of the old
> obsolete standards like latin-1.
>
> If you want to display a superascii such as (255{a.) as a single latin-1
> glyph, you can still do so (in j602 at least) like this:
>
>    u: 255{a.
> ÿ
>
> That works ok because the old latin-1 character set has been imported into
> unicode as a proper "codespace". Raul gave you the official reference to
> it:
>   http://www.unicode.org/charts/PDF/U0080.pdf
>
> It's hard to understand the ins and outs of unicode unless you call
> everything by its right name, in particular using these words rigorously:
> glyph, grapheme, char, codespace, code point. So few people do!
>
>
>
> http://www.jsoftware.com/jwiki/Guides/UnicodeGettingStarted#Superasciis_and_utf8-encoding
> -tries to explain unicode and utf-8 in beginners' terms. But even there I
> see a mistake: it occasionally uses the word "glyph" when it should be
> using "grapheme".
>
> Still, it's a start.
>
>
> On Fri, Mar 28, 2014 at 4:11 PM, Pascal Jasmin <[email protected]> wrote:
>
> > J stores the full 8 bits of binary data.  There very well may be a great
> > reason not to provide a friendly display for binary data, I just don't see
> > it yet.
> >
> > On another note, there seems to be a case for an extra dyad form for u: .
> >  Say 9 u:
> >
> > It would behave as monad u: does for char and wchar, but for any other
> > argument type (including integers) would return ] y.  I understand it not
> > being a priority since this can be implemented by users with 3!:0 checking.
> >
> >
> > ----- Original Message -----
> > From: Raul Miller <[email protected]>
> > To: Programming forum <[email protected]>
> > Cc:
> > Sent: Friday, March 28, 2014 10:22:18 AM
> > Subject: Re: [Jprogramming] font with extended ascii? -display binary data
> >
> > "Normal ascii" occupies only 7 bits, so it's 128{.a. (or u: i.128).
> >
> > The problems created by what to do with the other half of the byte (along
> > with our love/hate relationship with standards and professionalism) have a
> > lot to do with why we are using ascii instead of ebcdic.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> >
> >
> > On Fri, Mar 28, 2014 at 11:04 AM, Pascal Jasmin <[email protected]> wrote:
> >
> > > thank you Raul,
> > >
> > > > On further thought, it appears to be impractical to use larger than
> > > > base 128 for binary encoding.
> > >
> > > > A friendlier display of my numeric list compression routine is possible
> > > > through u:
> > >
> > > BASE128 =: BASE64 , a.{~ 192 + i.64
> > >
> > >
> > >    u:  compresslistnum   1000000239482039420348x 2 248 +"1 i. 3 3
> > > bN5o8ÒÁDâïA BÀ ýA
> > > bN5o8ÒÁDâïà DA þÀ
> > > bN5o8ÒÁDâðÀ EÀ AÀA
> > >
> > >
> > >
> > > > There is a formatting problem displaying boxed unicode data.  Is there
> > > > any chance that normal ascii could display as above for codes 192+? or
> > > > boxed unicode could line up?
> > >
> > > and
> > >
> > >   BASE128 i. 'bN5o8ÒÁDâðÀ EÀ AÀA'
> > > 27 13 57 40 60 67 128 67 128 3 67 128 67 128 67 128 128 4 67 128 128 0
> 67
> > > 128 0
> > >
> > > basically show that all of the extended characters are not found in
> > > BASE128 but
> > >
> > >    a. i. 'bN5o8ÒÁDâðÀ EÀ AÀA'
> > > 98 78 53 111 56 195 146 195 129 68 195 162 195 176 195 128 32 69 195
> 128
> > > 32 65 195 128 65
> > >
> > > shows that 2 characters are embedded for extended chars (195 x), and
> > > intermixed with single codes.
> > >
> > > Worth noting is that the extended characters display in my html email
> > > client.
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Raul Miller <[email protected]>
> > > To: Programming forum <[email protected]>
> > > Cc:
> > > Sent: Friday, March 28, 2014 9:07:15 AM
> > > > Subject: Re: [Jprogramming] font with extended ascii? -display binary data
> > >
> > > There's http://www.unicode.org/charts/PDF/U0080.pdf
> > >
> > > But it's not an informal page.
> > >
> > > 240-248 corresponds to the rightmost column (the one with the caption
> > 00F),
> > > and the top half of that column (00F0 through 00F8 in the small print
> at
> > > the bottom of each cell).
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > >
> > >
> > > > On Fri, Mar 28, 2014 at 9:48 AM, Pascal Jasmin <[email protected]> wrote:
> > >
> > > > > Jqt uses menlo as default font.  Printing binary data over 127 all
> > > > > produce identical "not found" glyphs.  Is it a font issue? and is
> > > > > there a fixed width font that would display extended ascii as this
> > > > > list (or as much of it as possible)? iso-latin 1?  Is there some
> > > > > informal code page that shows a printable character for every (or
> > > > > 240-248) binary value(s)?
> > > >
> > > > http://www.danshort.com/ASCIImap/
> > > >
> > > >
> > > > > A related question is wd edit will not display the prettier line
> > > > > drawing (box character set) symbols even when the font is set to
> > > > > Menlo.  Is there a workaround for that?
> > > >
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
