As you and Eric said, Unicode is a bit of the wild west.  A character's
UTF-8 encoding is 1 to 4 bytes long, but its display width can be larger
than 4, and its display height can be larger than 1.  I think there will
always be some cases that a simplistic approach cannot handle.
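
To make the mismatch concrete, here is a rough sketch in Python (not J) that compares the character count, the UTF-8 byte count, and an estimated column width for Bob's sample string. The helper names `utf8_length` and `display_width` are made up for this illustration, and the width estimate assumes a renderer that follows the Unicode East Asian Width property (2 columns for wide and fullwidth characters, 0 for combining marks, 1 otherwise). Real terminals and fonts may disagree, which is rather the point.

```python
import unicodedata


def utf8_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def display_width(text: str) -> int:
    """Estimated column width (an assumption, not what any J front end does)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are assumed to take no column of their own.
            continue
        # 'W' (wide) and 'F' (fullwidth) are assumed to take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


sample = "电甶男甸甹"
print(len(sample), utf8_length(sample), display_width(sample))  # prints: 5 15 10
```

Under these assumptions the five characters occupy 15 bytes but only 10 columns, so a box sized by byte count comes out 5 columns too wide for its contents.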
On Jul 7, 2016 11:31 PM, "robert therriault" <bobtherria...@mac.com> wrote:

> Thinking about this a little more I wonder if spacing should be done on
> the basis of UTF-8 encoding because this is really the source of the issue
> with boxes not lining up with contents.
>
>     '电甶男甸甹'
> 电甶男甸甹
>    $  '电甶男甸甹' NB. 5 characters represented by 15 integers
> 15
>    < '电甶男甸甹'
> ┌───────────────┐
> │电甶男甸甹│
> └───────────────┘
>    ": < '电甶男甸甹'
> ┌───────────────┐
> │电甶男甸甹│       NB. This line is narrower because the characters do not require the space allotted by the width of the UTF-8 encoding array
> └───────────────┘
>    $ ": < '电甶男甸甹'
> 3 17
>
>    JVERSION
> Engine: j804/j64/darwin
> Release: commercial/2015-12-21 18:06:25
> Library: 8.04.15
> Qt IDE: 1.4.10/5.4.2
> Platform: Darwin 64
> Installer: J804 install
> InstallPath: /applications/j64-804
> Contact: www.jsoftware.com
>
> The width of the box is being determined by the number of integers in the
> UTF-8 encoding, but the actual width of the characters results in the
> narrowing of the box on the lines where characters are actually displayed.
> If UTF-8 spacing were used then there would be enough room for wider
> characters and the boxes would line up not only for unicode characters, but
> also for literals expressed in UTF-8. Having the CJK characters only taking
> up two spaces instead of the expected three reduces the meaningfulness of
> the display of UTF-8 literals. Using UTF-8 spacing would mean that there
> would be extra space within the box, but the boxes would still line up with
> each other based on the fact that they are all using the spacing created by
> UTF-8.
>
>
> My 2 bits.
>
> Cheers, bob
>
> > On Jul 7, 2016, at 7:57 AM, robert therriault <bobtherria...@mac.com> wrote:
> >
> > Yes, unicode is a bit of the wild west, isn't it?
> >
> > The thing I am a bit concerned about is whether having some wide
> > characters line up while others do not creates even more wildness. In
> > terms of the things that I have been doing with svg and html, this spacing
> > does throw my viewer out of whack compared to 804, but I think that I can
> > compensate as long as I know which characters will be affected and which
> > ones won't. I am guessing that the CJK characters are being identified
> > through their range of encoding values?
> >
> > Cheers, bob
> >
> >
> >> On Jul 7, 2016, at 7:11 AM, Eric Iverson <eric.b.iver...@gmail.com> wrote:
> >>
> >> Unicode is a wonderful beast, but it is a wild one.
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm