And, of course, the rows of 3 5 $ ": (u: 30005 30006 3101 30008 30009)
are not valid Unicode, so the Unicode standards aren't going to be a
lot of help.

Just saying...

-- 
Raul


On Thu, Jul 7, 2016 at 7:20 PM, robert therriault <bobtherria...@mac.com> wrote:
> Howdy Bill :-)
>
> I agree that there will be cases that the approach I am recommending will
> not handle. When the display width is larger than 3 (my proposal would have
> a maximum width of 3, because I believe that is what you said in an earlier
> post was the maximum UTF-8 encoding length for J) or the display height is
> larger than 1, we will be back in the same situation as before, with boxing
> being erratic, but this will happen far less often than it does currently.
> If we vary the width of the characters for any other reason, such as
> treating CJK characters differently from other characters with a UTF-8
> encoding length of 3, we run the risk of really complicating the rules for
> how the display is created. Since mail clients will probably mangle the
> results, try the following examples with 805 beta-9.
>
>    3 5 $ ": (u: 30005 30006 30007 30008 30009)
>
>    < 3 5 $ ": (u: 30005 30006 30007 30008 30009)
>
>    3 5 $ ": (u: 30005 30006 3101 30008 30009)
>
>    < 3 5 $ ": (u: 30005 30006 3101 30008 30009)
>
> Cheers, bob
>
>> On Jul 7, 2016, at 3:54 PM, bill lam <bbill....@gmail.com> wrote:
>>
>> As you and Eric had said, unicode is a bit of the wild west.  The length of
>> utf8 is 1 to 4, but the display width can be larger than 4, also display
>> height larger than 1.  I think there are always some cases which cannot be
>> handled by a simplistic approach.
>>
>> On Jul 7, 2016 11:31 PM, "robert therriault" <bobtherria...@mac.com> wrote:
>>
>>> Thinking about this a little more I wonder if spacing should be done on
>>> the basis of UTF-8 encoding because this is really the source of the issue
>>> with boxes not lining up with contents.
>>>
>>>    '电甶男甸甹'
>>> 电甶男甸甹
>>>    $ '电甶男甸甹'   NB. 5 characters represented by 15 integers
>>> 15
>>>    < '电甶男甸甹'
>>> ┌───────────────┐
>>> │电甶男甸甹│
>>> └───────────────┘
>>>    ": < '电甶男甸甹'
>>> ┌───────────────┐
>>> │电甶男甸甹│
>>> └───────────────┘
>>> NB. The middle line is narrower because the characters do not require
>>> NB. the space allotted by the width of the UTF-8 encoding array.
>>>    $ ": < '电甶男甸甹'
>>> 3 17
>>>
>>>    JVERSION
>>> Engine: j804/j64/darwin
>>> Release: commercial/2015-12-21 18:06:25
>>> Library: 8.04.15
>>> Qt IDE: 1.4.10/5.4.2
>>> Platform: Darwin 64
>>> Installer: J804 install
>>> InstallPath: /applications/j64-804
>>> Contact: www.jsoftware.com
>>>
>>> The width of the box is being determined by the number of integers in the
>>> UTF-8 encoding, but the actual width of the characters results in the
>>> narrowing of the box on the lines where characters are actually displayed.
>>> If UTF-8 spacing were used then there would be enough room for wider
>>> characters and the boxes would line up not only for unicode characters, but
>>> also for literals expressed in UTF-8. Having the CJK characters take up
>>> only two spaces instead of the expected three reduces the meaningfulness of
>>> the display of UTF-8 literals. Using UTF-8 spacing would mean that there
>>> would be extra space within the box, but the boxes would still line up with
>>> each other, because they would all be using the spacing created by UTF-8.
>>>
>>>
>>> My 2 bits.
>>>
>>> Cheers, bob
>>>
>>>> On Jul 7, 2016, at 7:57 AM, robert therriault <bobtherria...@mac.com> wrote:
>>>>
>>>> Yes, unicode is a bit of the wild west, isn't it?
>>>>
>>>> The thing I am a bit concerned about is whether having some wide
>>>> characters that line up and others that don't creates even more wildness.
>>>> In terms of the things that I have been doing with svg and html, this
>>>> spacing does throw my viewer out of whack compared to 804, but I think that
>>>> I can compensate as long as I know which characters will be affected and
>>>> which ones won't. I am guessing that the CJK characters are being
>>>> identified through their range of encoding values?
>>>>
>>>> Cheers, bob
>>>>
>>>>
>>>>> On Jul 7, 2016, at 7:11 AM, Eric Iverson <eric.b.iver...@gmail.com> wrote:
>>>>>
>>>>> Unicode is a wonderful beast, but it is a wild one.
>>>>
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
