This is unrelated to CJK, but the behaviour change in the beta is that it
assumes one-byte literals are always UTF-8 encoded, which I think is
wrong.
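
To make the bytes-versus-characters distinction concrete, here is a small
sketch. It assumes J 8.04 behaviour of `7 u:` (UTF-8 to unicode) and
`3 u:` (integer code points). Display width has no primitive, so it is
not shown.

   s =: '电甶男甸甹'    NB. a literal holds the UTF-8 bytes
   # s                  NB. byte count: what sets the box width
15
   # 7 u: s             NB. 7 u: decodes UTF-8 to unicode characters
5
   3 u: 7 u: s          NB. code points, as passed to u: in bob's examples
30005 30006 30007 30008 30009

Each of these code points takes 3 bytes in UTF-8, but as a CJK ideograph
it is normally displayed 2 columns wide. That is the 15 versus 10
mismatch seen in the boxes.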

Thu, 07 Jul 2016, robert therriault wrote:
> Howdy Bill :-)
> 
> I agree that there will be cases that the approach I am recommending will
> not handle. When the display width is larger than 3 (my proposal would have
> a max width of 3 because I believe that is what you said, in an earlier
> post, was the maximum UTF-8 encoding length for J) or the display height is
> larger than 1, we will be back in the same situation as before, with boxing
> being erratic, but this will happen far less often than it does currently.
> If we vary the width of characters for any other reason, such as treating
> CJK characters differently from other characters with a 3-byte UTF-8
> encoding, we run the risk of greatly complicating the rules for how the
> display is created. Since mail clients will probably mangle the results,
> try the following examples with 805 beta-9.
> 
>    3 5 $ ": (u: 30005 30006 30007 30008 30009)
> 
>    < 3 5 $ ": (u: 30005 30006 30007 30008 30009)
> 
>    3 5 $ ": (u: 30005 30006 3101 30008 30009)
> 
>    < 3 5 $ ": (u: 30005 30006 3101 30008 30009)
> 
> Cheers, bob
> 
> > On Jul 7, 2016, at 3:54 PM, bill lam <bbill....@gmail.com> wrote:
> > 
> > As you and Eric had said, Unicode is a bit of the wild west.  The UTF-8
> > encoding of a character is 1 to 4 bytes long, but its display width can be
> > larger than 4 columns and its display height larger than 1.  I think there
> > will always be some cases that cannot be handled by a simplistic approach.
> > On Jul 7, 2016 11:31 PM, "robert therriault" <bobtherria...@mac.com> wrote:
> > 
> >> Thinking about this a little more, I wonder whether spacing should be
> >> based on the UTF-8 encoding, because that is really the source of the
> >> boxes not lining up with their contents.
> >> 
> >>    '电甶男甸甹'
> >> 电甶男甸甹
> >>    $ '电甶男甸甹'   NB. 5 characters represented by 15 integers
> >> 15
> >>    < '电甶男甸甹'
> >> ┌───────────────┐
> >> │电甶男甸甹│
> >> └───────────────┘
> >>    ": < '电甶男甸甹'
> >> ┌───────────────┐
> >> │电甶男甸甹│       NB. This line is narrower because the characters do not require the space allotted by the width of the UTF-8 encoding array
> >> └───────────────┘
> >>    $ ": < '电甶男甸甹'
> >> 3 17
> >> 
> >>    JVERSION
> >> Engine: j804/j64/darwin
> >> Release: commercial/2015-12-21 18:06:25
> >> Library: 8.04.15
> >> Qt IDE: 1.4.10/5.4.2
> >> Platform: Darwin 64
> >> Installer: J804 install
> >> InstallPath: /applications/j64-804
> >> Contact: www.jsoftware.com
> >> 
> >> The width of the box is determined by the number of bytes in the UTF-8
> >> encoding (15), but the characters actually occupy fewer columns (two each,
> >> so 10), which narrows the box on the lines where they are displayed. If
> >> UTF-8 spacing were used, there would be enough room for wider characters,
> >> and the boxes would line up not only for Unicode characters but also for
> >> literals expressed in UTF-8. Having CJK characters take up only two spaces
> >> instead of the expected three makes the display of UTF-8 literals less
> >> meaningful. Using UTF-8 spacing would leave extra space within the box,
> >> but the boxes would still line up with each other, because they would all
> >> be using the spacing created by UTF-8.
> >> 
> >> 
> >> My 2 bits.
> >> 
> >> Cheers, bob
> >> 
> >>> On Jul 7, 2016, at 7:57 AM, robert therriault <bobtherria...@mac.com> wrote:
> >>> 
> >>> Yes, unicode is a bit of the wild west, isn't it?
> >>> 
> >>> The thing I am a bit concerned about is whether having some wide
> >>> characters line up and others not creates even more wildness. In terms
> >>> of the things I have been doing with svg and html, this spacing does
> >>> throw my viewer out of whack compared to 804, but I think I can
> >>> compensate as long as I know which characters will be affected and which
> >>> won't. I am guessing that the CJK characters are being identified
> >>> through their range of encoding values?
> >>> 
> >>> Cheers, bob
> >>> 
> >>> 
> >>>> On Jul 7, 2016, at 7:11 AM, Eric Iverson <eric.b.iver...@gmail.com> wrote:
> >>>> 
> >>>> Unicode is a wonderful beast, but it is a wild one.
> >>> 
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm
> >> 

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3