As you and Eric had said, Unicode is a bit of the wild west. A character's UTF-8 encoding is 1 to 4 bytes long, but its display width can be larger than 4, and its display height can be larger than 1. I think there will always be some cases that a simplistic approach cannot handle.

On Jul 7, 2016 11:31 PM, "robert therriault" <bobtherria...@mac.com> wrote:
> Thinking about this a little more I wonder if spacing should be done on
> the basis of UTF-8 encoding because this is really the source of the issue
> with boxes not lining up with contents.
>
>    '电甶男甸甹'
> 电甶男甸甹
>    $ '电甶男甸甹' NB. 5 characters represented by 15 integers
> 15
>    < '电甶男甸甹'
> ┌───────────────┐
> │电甶男甸甹│
> └───────────────┘
>    ": < '电甶男甸甹'
> ┌───────────────┐
> │电甶男甸甹│ NB. This line is narrower because the characters do not require the space allotted by the width of the UTF-8 encoding array
> └───────────────┘
>    $ ": < '电甶男甸甹'
> 3 17
>
>    JVERSION
> Engine: j804/j64/darwin
> Release: commercial/2015-12-21 18:06:25
> Library: 8.04.15
> Qt IDE: 1.4.10/5.4.2
> Platform: Darwin 64
> Installer: J804 install
> InstallPath: /applications/j64-804
> Contact: www.jsoftware.com
>
> The width of the box is being determined by the number of integers in the
> UTF-8 encoding, but the actual width of the characters results in the
> narrowing of the box on the lines where characters are actually displayed.
> If UTF-8 spacing were used then there would be enough room for wider
> characters and the boxes would line up not only for unicode characters, but
> also for literals expressed in UTF-8. Having the CJK characters only taking
> up two spaces instead of the expected three reduces the meaningfulness of
> the display of UTF-8 literals. Using UTF-8 spacing would mean that there
> would be extra space within the box, but the boxes would still line up with
> each other based on the fact that they are all using the spacing created by
> UTF-8.
>
> My 2 bits.
>
> Cheers, bob
>
> On Jul 7, 2016, at 7:57 AM, robert therriault <bobtherria...@mac.com> wrote:
> >
> > Yes, unicode is a bit of the wild west, isn't it?
> >
> > The thing I am a bit concerned about is whether having some wide
> > characters line up and others that don't creates even more wildness.
> > In terms of the things that I have been doing with svg and html, this spacing
> > does throw my viewer out of whack compared to 804, but I think that I can
> > compensate as long as I know which characters will be affected and which
> > ones won't. I am guessing that the CJK characters are being identified
> > through their range of encoding values?
> >
> > Cheers, bob
> >
> >> On Jul 7, 2016, at 7:11 AM, Eric Iverson <eric.b.iver...@gmail.com> wrote:
> >>
> >> Unicode is a wonderful beast, but it is a wild one.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
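
For anyone who wants to see the mismatch discussed in this thread outside of J, here is a minimal sketch in Python (not J, and not how the J engine computes box widths). It compares the UTF-8 byte count of the example string with an estimated column count. The helper names `utf8_len` and `display_width` are made up for illustration, and the width rule (2 columns for East Asian Wide/Fullwidth characters, 0 for combining marks, 1 otherwise) is only a common heuristic. Real terminals and fonts can disagree with it.

```python
import unicodedata


def utf8_len(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def display_width(text: str) -> int:
    """Estimated number of monospace columns needed to display text.

    Heuristic only: Wide ("W") and Fullwidth ("F") characters count as
    2 columns, combining marks as 0, everything else as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks overlay the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


s = "电甶男甸甹"
# characters, UTF-8 bytes, estimated columns
print(len(s), utf8_len(s), display_width(s))  # prints: 5 15 10
```

Under these assumptions the five characters need 15 bytes but only about 10 columns, which is consistent with the box above being drawn 15 wide while its contents come up short.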