Thinking about this a little more, I wonder if spacing should be done on the 
basis of the UTF-8 encoding, because that is really the source of the issue with 
boxes not lining up with their contents. 

   '电甶男甸甹'
电甶男甸甹
   $  '电甶男甸甹' NB. 5 characters represented by 15 bytes
15
   < '电甶男甸甹'
┌───────────────┐
│电甶男甸甹│
└───────────────┘
   ": < '电甶男甸甹'
┌───────────────┐
│电甶男甸甹│       NB. This line is narrower because the characters do not require the space allotted by the width of the UTF-8 encoding array
└───────────────┘
   $ ": < '电甶男甸甹'
3 17

   JVERSION
Engine: j804/j64/darwin
Release: commercial/2015-12-21 18:06:25
Library: 8.04.15
Qt IDE: 1.4.10/5.4.2
Platform: Darwin 64
Installer: J804 install
InstallPath: /applications/j64-804
Contact: www.jsoftware.com

The width of the box is being determined by the number of bytes in the UTF-8 
encoding (15), but each of these CJK characters takes up only two columns on 
screen, so the box narrows on the lines where the characters are actually 
displayed (10 columns of text inside a 15-column frame). If UTF-8 spacing were 
used, there would be enough room for the wider characters, and the boxes would 
line up not only for unicode characters but also for literals expressed in 
UTF-8. Having the CJK characters take up only two columns instead of the 
expected three reduces the meaningfulness of the display of UTF-8 literals. 
Using UTF-8 spacing would leave extra space within the box, but the boxes would 
still line up with each other because they would all be using the same spacing 
based on UTF-8.
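To make the three different "lengths" of the same string concrete outside of J, here is a small Python sketch. It is not J's actual formatting code. It assumes a terminal where East Asian Wide and Fullwidth characters take two columns and everything else takes one (ignoring combining marks and ambiguous-width characters). The function `box_utf8_spacing` is hypothetical: it draws a frame as wide as the UTF-8 byte count and pads the contents with spaces, which is roughly the "UTF-8 spacing" idea described above.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate screen columns (assumption): East Asian Wide/Fullwidth
    characters count as 2, everything else as 1. Ignores combining marks
    and treats ambiguous-width characters (incl. box drawing) as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)


def utf8_width(s: str) -> int:
    """Number of bytes in the UTF-8 encoding, which is what J 8.04
    counts as the length of a literal."""
    return len(s.encode("utf-8"))


def box_utf8_spacing(s: str) -> list[str]:
    """Hypothetical layout: frame width = UTF-8 byte count, contents
    padded with spaces so every row has the same display width."""
    width = utf8_width(s)
    # Never negative: any character counted as 2 columns needs at
    # least 3 bytes in UTF-8.
    pad = width - display_width(s)
    top = "┌" + "─" * width + "┐"
    mid = "│" + s + " " * pad + "│"
    bot = "└" + "─" * width + "┘"
    return [top, mid, bot]


s = "电甶男甸甹"
print(len(s), utf8_width(s), display_width(s))  # 5 15 10
for row in box_utf8_spacing(s):
    print(row)
```

With this scheme the frame and the content row are all 17 columns wide, at the cost of five trailing spaces inside the box. Note that some East Asian terminal settings render box-drawing characters two columns wide, which this sketch does not model.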
   

My 2 bits. 

Cheers, bob 

> On Jul 7, 2016, at 7:57 AM, robert therriault <bobtherria...@mac.com> wrote:
> 
> Yes, unicode is a bit of the wild west, isn't it? 
> 
> The thing I am a bit concerned about is whether having some wide characters 
> line up and others that don't creates even more wildness. In terms of the 
> things that I have been doing with svg and html, this spacing does throw my 
> viewer out of whack compared to 804, but I think that I can compensate as 
> long as I know which characters will be affected and which ones won't. I am 
> guessing that the CJK characters are being identified through their range of 
> encoding values? 
> 
> Cheers, bob
> 
> 
>> On Jul 7, 2016, at 7:11 AM, Eric Iverson <eric.b.iver...@gmail.com> wrote:
>> 
>> Unicode is a wonderful beast, but it is a wild one.
> 
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
