I was reading your good essay
(http://www.jsoftware.com/jwiki/Guides/UnicodeGettingStarted), and it turns out
that I was relying on a wrong assumption:
> utf mode and binary mode display in J are already different. J will display
> 195 196 {a. (or any other x in place of 196) as 2 "illegal box tiles".
Some combinations of 195 x do display a unicode glyph even in "binary" mode.
> It's not as if J stops you displaying a string of possible superasciis, S,
> in latin-1. Just use (u: S).
That breaks boxes though (unicode in general does), which was the main
motivation for the proposal.
The best alternative for visualizing binary data that may be boxed is hex or
B64 encoding, though u: S without boxes looks great.
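A minimal sketch of the hex and B64 alternatives, written in Python rather than J and using made-up sample bytes. The point it tries to show is that both encodings produce plain ascii, so every character takes one column and box edges can stay aligned.

```python
import base64

# Sample bytes chosen for illustration: two ascii letters, one valid
# UTF-8 pair (195 146), and three lone "superascii" bytes.
data = bytes([98, 78, 195, 146, 255, 128, 129])

# Two hex digits per byte, separated by spaces (Python 3.8+).
hex_text = data.hex(" ")

# Base64: four ascii characters for every three bytes, padded with "=".
b64_text = base64.b64encode(data).decode("ascii")

print(hex_text)   # 62 4e c3 92 ff 80 81
print(b64_text)

# Both forms are reversible, so nothing is lost by displaying them.
assert bytes.fromhex(hex_text) == data
assert base64.b64decode(b64_text) == data
```

In hex, bytes 128 and 129 show up as 80 and 81, so they are distinguishable without relying on any font or code page.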
----- Original Message -----
From: Ian Clark <[email protected]>
To: Programming forum <[email protected]>
Cc:
Sent: Saturday, March 29, 2014 11:33:07 AM
Subject: Re: [Jprogramming] font with extended ascii? -display binary data
Isn't this a matter of gui implementation? J602 uses Java libraries to
implement its IDE. So isn't it a question for the vendors of Java?
They may well answer (but I'm only guessing) that when utf-8 hits a
superascii that isn't part of a valid (utf-8) character, this is more
likely to be a corruption than a valid use of an (obsolete) standard such
as latin-1 (though that's not the only possibility). Therefore showing a
standard glyph: � to represent corrupted data is more appropriate than
misusing some obsolescent writing system to display the actual content of
corrupted data. As Bill says, some users prefer to see a glyph warning of
corruption rather than random glyphs masquerading as valid data.
Is it not reasonable that if a string of bytes purporting to be utf-8
encoded cannot be decoded by the utf-8 algorithm, then that data is
corrupt, not a throwback to an obsolete standard? To demand utf-8 to be
extended to handle corrupt data as though it were latin-1 in superascii
form is not a position many IT people would support. Quite apart from the
fact that utf-8 algorithms are designed to be fast.
It's not as if J stops you displaying a string of possible superasciis, S,
in latin-1. Just use (u: S).
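The two readings of the same bytes can be sketched as follows. This is Python, not J, and it only mimics the general behaviour described above: a UTF-8 decoder substituting the replacement glyph for a byte it cannot decode, versus a deliberate latin-1 reading of the kind (u: S) gives in J. The sample bytes are invented for the example.

```python
# "Hi" followed by a lone byte 255, which can never appear in valid UTF-8.
raw = bytes([72, 105, 255])

# Reading 1: treat the bytes as UTF-8 and flag what cannot be decoded.
# The undecodable byte becomes U+FFFD, the standard replacement glyph.
as_utf8 = raw.decode("utf-8", errors="replace")

# Reading 2: treat every byte as a latin-1 character (code point = byte value).
as_latin1 = raw.decode("latin-1")

# Re-encoding the latin-1 reading as UTF-8 yields the two-byte form 195 191.
reencoded = list(as_latin1.encode("utf-8"))

print(as_utf8)
print(as_latin1)
print(reencoded)
```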
On Sat, Mar 29, 2014 at 3:21 PM, Pascal Jasmin <[email protected]> wrote:
> Your explanation is very helpful, Ian, but if I may question its accuracy
> for a moment:
>
> utf8 displays "extended ascii" by using the 2 byte code '195 x'. J will
> display 195 196 {a. (or any other x in place of 196) as 2 "illegal box
> tiles" (ie the display does not escape the byte sequence as utf format).
> The 2 byte code is what screws up boxed display of u: (I assume, though the
> '? tile'/glyph seems to cause problems too).
>
> The key point is that utf mode and binary mode display in J are already
> different. So why not make illegal box tiles/glyphs 128 and 129 look
> different from each other? To save work and thought, borrow the latin-1
> glyphs?
>
>
> ----- Original Message -----
> From: Ian Clark <[email protected]>
> To: Programming forum <[email protected]>
> Cc:
> Sent: Saturday, March 29, 2014 9:23:52 AM
> Subject: Re: [Jprogramming] font with extended ascii? -display binary data
>
> Pascal says:
> > J stores the full 8 bits of binary data. There very well may be a great
> > reason not to provide a friendly display for binary data, I just don't see
> > it yet.
>
> Yes there is, and it's to do with the nature of the utf-8 standard. Before
> unicode came along, J did indeed display (255 { a.) say as a single glyph:
> y-umlaut (ÿ -if that shows up on your screen!).
>
> There are many ways J could have "supported" unicode. In fact J uses two
> distinct ways: wide-characters and utf-8. The second way, utf-8 is used by
> the session window and edit window in j602 (I'm less familiar with JQt and
> JHS, but I guess they do likewise), plus a lot of other popular software,
> and -- most important! -- by inter-application copy/paste. It's a very
> popular standard, because you can keep all your old ascii-based code --
> most of it still works.
>
> But there's a cost to using utf-8. As soon as your display software meets a
> "superascii" byte (what you're calling "extended ascii") it needs to
> interpret it as the start of a multi-byte code representing a "unicode code
> point". It detects a "superascii" by the leading bit of its byte code being
> 1.
>
> In consequence, display software can't both use utf-8 and treat a
> superascii as a single glyph. So in order to use utf-8, J is forced to
> abandon the ability to display (128{a.) to (255{a.) in one of the old
> obsolete standards like latin-1.
>
> If you want to display a superascii such as (255{a.) as a single latin-1
> glyph, you can still do so (in j602 at least) like this:
>
> u: 255{a.
> ÿ
>
> That works ok because the old latin-1 character set has been imported into
> unicode as a proper "codespace". Raul gave you the official reference to
> it:
> http://www.unicode.org/charts/PDF/U0080.pdf
>
> It's hard to understand the ins and outs of unicode unless you call
> everything by its right name, in particular using these words rigorously:
> glyph, grapheme, char, codespace, code point. So few people do!
>
>
>
> http://www.jsoftware.com/jwiki/Guides/UnicodeGettingStarted#Superasciis_and_utf8-encoding
> -tries to explain unicode and utf-8 in beginners' terms. But even there I
> see a mistake: it occasionally uses the word "glyph" when it should be
> using "grapheme".
>
> Still, it's a start.
>
>
> On Fri, Mar 28, 2014 at 4:11 PM, Pascal Jasmin <[email protected]> wrote:
>
> > J stores the full 8 bits of binary data. There very well may be a great
> > reason not to provide a friendly display for binary data, I just don't see
> > it yet.
> >
> > On another note, there seems to be a case for an extra dyad form for u: .
> > Say 9 u:
> >
> > It would behave as monad u: does for char and wchar, but for any other
> > argument type (including integers) would return ] y. I understand it not
> > being a priority since this can be implemented by users with 3!:0 checking.
> >
> >
> > ----- Original Message -----
> > From: Raul Miller <[email protected]>
> > To: Programming forum <[email protected]>
> > Cc:
> > Sent: Friday, March 28, 2014 10:22:18 AM
> > Subject: Re: [Jprogramming] font with extended ascii? -display binary data
> >
> > "Normal ascii" occupies only 7 bits, so it's 128{.a. (or u: i.128).
> >
> > The problems created by what to do with the other half of the byte (along
> > with our love/hate relationship with standards and professionalism) have a
> > lot to do with why we are using ascii instead of ebcdic.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> >
> >
> > On Fri, Mar 28, 2014 at 11:04 AM, Pascal Jasmin <[email protected]> wrote:
> >
> > > thank you Raul,
> > >
> > > On further thought, it appears to be impractical to use larger than base
> > > 128 for binary encoding.
> > >
> > > A friendlier display of my numeric list compression routine is possible
> > > through u:
> > >
> > > BASE128 =: BASE64 , a.{~ 192 + i.64
> > >
> > >
> > > u: compresslistnum 1000000239482039420348x 2 248 +"1 i. 3 3
> > > bN5o8ÒÁDâïA BÀ ýA
> > > bN5o8ÒÁDâïà DA þÀ
> > > bN5o8ÒÁDâðÀ EÀ AÀA
> > >
> > >
> > >
> > > There is a formatting problem displaying boxed unicode data. Is there any
> > > chance that normal ascii could display as above for codes 192+? or boxed
> > > unicode could line up?
> > >
> > > and
> > >
> > > BASE128 i. 'bN5o8ÒÁDâðÀ EÀ AÀA'
> > > 27 13 57 40 60 67 128 67 128 3 67 128 67 128 67 128 128 4 67 128 128 0 67 128 0
> > >
> > > basically show that all of the extended characters are not found in
> > > BASE128 but
> > >
> > > a. i. 'bN5o8ÒÁDâðÀ EÀ AÀA'
> > > 98 78 53 111 56 195 146 195 129 68 195 162 195 176 195 128 32 69 195 128 32 65 195 128 65
> > >
> > > shows that 2 characters are embedded for extended chars (195 x), and
> > > intermixed with single codes.
> > >
> > > Worth noting is that the extended characters display in my html email
> > > client.
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Raul Miller <[email protected]>
> > > To: Programming forum <[email protected]>
> > > Cc:
> > > Sent: Friday, March 28, 2014 9:07:15 AM
> > > Subject: Re: [Jprogramming] font with extended ascii? -display binary data
> > >
> > > There's http://www.unicode.org/charts/PDF/U0080.pdf
> > >
> > > But it's not an informal page.
> > >
> > > 240-248 corresponds to the rightmost column (the one with the caption 00F),
> > > and the top half of that column (00F0 through 00F8 in the small print at
> > > the bottom of each cell).
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > >
> > >
> > > On Fri, Mar 28, 2014 at 9:48 AM, Pascal Jasmin <[email protected]> wrote:
> > >
> > > > Jqt uses menlo as default font. Binary data over 127 all prints as
> > > > identical "not found" glyphs. Is it a font issue? And is there a fixed
> > > > width font that would display extended ascii as this list (or as much of
> > > > it as possible)? iso-latin 1? Is there some informal code page that shows
> > > > a printable character for every (or 240-248) binary value(s)?
> > > >
> > > > http://www.danshort.com/ASCIImap/
> > > >
> > > >
> > > > A related question: wd edit will not display the prettier line drawing
> > > > (box character set) symbols even when the font is set to Menlo. Is there
> > > > a workaround for that?
> > > >
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm