This is unrelated to CJK; it is about the behavior change in the beta, which assumes one-byte literals are always UTF-8 encoded. I think that assumption is wrong.
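The point about one-byte literals can be illustrated outside J. A J literal is an array of bytes, and nothing in the bytes themselves says whether they are UTF-8. The short Python sketch here is for illustration only. It does not model the J engine, and `describe` is a hypothetical helper. It shows one byte string that is valid UTF-8 and another, Latin-1 text, that is not UTF-8 at all.

```python
# Illustration only (Python, not J): a string of one-byte characters is just
# bytes, and nothing in the bytes themselves guarantees they are UTF-8.

def describe(data: bytes) -> str:
    """Hypothetical helper: report how many characters `data` holds when
    decoded as UTF-8, or note that it is not valid UTF-8 at all."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return f"{len(data)} bytes, not valid UTF-8"
    return f"{len(data)} bytes, {len(text)} characters as UTF-8"

utf8_bytes = "电甶男甸甹".encode("utf-8")    # u: 30005..30009, 3 bytes each
latin1_bytes = "café".encode("latin-1")     # b'caf\xe9', valid Latin-1 only

print(describe(utf8_bytes))    # 15 bytes, 5 characters as UTF-8
print(describe(latin1_bytes))  # 4 bytes, not valid UTF-8
```

A display routine that assumes UTF-8 has to do something arbitrary with the second string, even though it is a perfectly ordinary one-byte literal.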
On Thu, 07 Jul 2016, robert therriault wrote:
> Howdy Bill :-)
>
> I agree that there will be cases that will not be handled by the approach
> that I am recommending. When the display width is larger than 3 (my proposal
> would have a max width of 3 because I believe that was what you had said was
> the maximum UTF-8 encoding length, in bytes, for J in an earlier post) or the
> display height is larger than 1, we will be back in the same situation that
> we were in before, with boxing being erratic, but this will happen far less
> often than it does currently. If we vary the width of the characters for any
> other reason, such as treating CJK characters differently from other
> characters with a 3-byte UTF-8 encoding, we run the risk of really
> complicating the rules of how the display is created. Since the mail clients
> will probably mangle the results, try the following examples with 805 beta-9.
>
>    3 5 $ ": (u: 30005 30006 30007 30008 30009)
>
>    < 3 5 $ ": (u: 30005 30006 30007 30008 30009)
>
>    3 5 $ ": (u: 30005 30006 3101 30008 30009)
>
>    < 3 5 $ ": (u: 30005 30006 3101 30008 30009)
>
> Cheers, bob
>
>
> On Jul 7, 2016, at 3:54 PM, bill lam <bbill....@gmail.com> wrote:
> >
> > As you and Eric had said, Unicode is a bit of the wild west. A UTF-8
> > encoding is 1 to 4 bytes long, but the display width can be larger than 4,
> > and the display height can be larger than 1. I think there are always some
> > cases which cannot be handled by a simplistic approach.
> > On Jul 7, 2016 11:31 PM, "robert therriault" <bobtherria...@mac.com> wrote:
> >
> >> Thinking about this a little more, I wonder if spacing should be done on
> >> the basis of UTF-8 encoding, because this is really the source of the
> >> issue with boxes not lining up with contents.
> >>
> >>    '电甶男甸甹'
> >> 电甶男甸甹
> >>    $ '电甶男甸甹'  NB. 5 characters represented by 15 bytes
> >> 15
> >>    < '电甶男甸甹'
> >> ┌───────────────┐
> >> │电甶男甸甹│
> >> └───────────────┘
> >>    ": < '电甶男甸甹'
> >> ┌───────────────┐
> >> │电甶男甸甹│ NB. This line is narrower because the characters do not require the space allotted by the width of the UTF-8 encoding array
> >> └───────────────┘
> >>    $ ": < '电甶男甸甹'
> >> 3 17
> >>
> >>    JVERSION
> >> Engine: j804/j64/darwin
> >> Release: commercial/2015-12-21 18:06:25
> >> Library: 8.04.15
> >> Qt IDE: 1.4.10/5.4.2
> >> Platform: Darwin 64
> >> Installer: J804 install
> >> InstallPath: /applications/j64-804
> >> Contact: www.jsoftware.com
> >>
> >> The width of the box is being determined by the number of bytes in the
> >> UTF-8 encoding, but the actual width of the characters results in the
> >> narrowing of the box on the lines where characters are actually displayed.
> >> If UTF-8 spacing were used, there would be enough room for wider
> >> characters, and the boxes would line up not only for Unicode characters
> >> but also for literals expressed in UTF-8. Having the CJK characters take up
> >> only two spaces instead of the expected three reduces the meaningfulness
> >> of the display of UTF-8 literals. Using UTF-8 spacing would mean that there
> >> would be extra space within the box, but the boxes would still line up
> >> with each other, because they would all use the spacing created by UTF-8.
> >>
> >>
> >> My 2 bits.
> >>
> >> Cheers, bob
> >>
> >>> On Jul 7, 2016, at 7:57 AM, robert therriault <bobtherria...@mac.com> wrote:
> >>>
> >>> Yes, Unicode is a bit of the wild west, isn't it?
> >>>
> >>> The thing I am a bit concerned about is whether having some wide
> >>> characters line up and others not creates even more wildness. In terms
> >>> of the things that I have been doing with SVG and HTML, this spacing
> >>> does throw my viewer out of whack compared to 804, but I think that I
> >>> can compensate as long as I know which characters will be affected and
> >>> which ones won't. I am guessing that the CJK characters are being
> >>> identified through their range of encoding values?
> >>>
> >>> Cheers, bob
> >>>
> >>>
> >>>> On Jul 7, 2016, at 7:11 AM, Eric Iverson <eric.b.iver...@gmail.com> wrote:
> >>>>
> >>>> Unicode is a wonderful beast, but it is a wild one.
> >>>
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm

--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
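bob's J804 session separates two widths. One is the number of bytes in the UTF-8 encoding, which sets the width of the box frame: 15 bytes plus 2 border columns gives the `3 17` shape. The other is the number of columns the glyphs actually occupy on screen. Here is a rough Python sketch of that mismatch and of bob's padding proposal, under stated assumptions. It is not how the J engine formats boxes. `pad_to_utf8_width` and `box` are hypothetical helpers. `unicodedata.east_asian_width` is only a stand-in for real terminal width, since fonts and terminals vary, which is bill's point about the wild west.

```python
import unicodedata

def utf8_width(text: str) -> int:
    """One column per UTF-8 byte: the width the J804 session in this thread
    appears to use for the box frame."""
    return len(text.encode("utf-8"))

def display_width(text: str) -> int:
    """Rough on-screen width: East Asian Wide (W) and Fullwidth (F) characters
    take two columns, everything else one. Combining marks, emoji sequences
    and multi-line glyphs are not handled."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def pad_to_utf8_width(text: str) -> str:
    """Hypothetical helper for bob's proposal: give each character as many
    columns as its UTF-8 encoding has bytes by appending spaces after it."""
    out = []
    for ch in text:
        extra = utf8_width(ch) - display_width(ch)
        out.append(ch + " " * max(extra, 0))
    return "".join(out)

def box(row: str, frame_width: int) -> str:
    """Hypothetical helper: draw a one-row box whose border is frame_width wide."""
    return "\n".join([
        "┌" + "─" * frame_width + "┐",
        "│" + row + "│",
        "└" + "─" * frame_width + "┘",
    ])

s = "电甶男甸甹"
print(utf8_width(s), display_width(s))           # 15 10
print(box(s, utf8_width(s)))                     # middle row renders 5 columns short
print(box(pad_to_utf8_width(s), utf8_width(s)))  # padded row matches the frame
```

Padding each character out to its UTF-8 length closes the 5-column gap for this string at the cost of extra space inside the box, which is the trade-off bob describes. Characters whose display width exceeds their byte count would still break it.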