Re: [Jsource] Non displayable characters display changed in penultimate beta

bill lam Tue, 05 Jul 2016 09:53:02 -0700

I think bit boolean (BT) is not currently implemented, so it could be
moved to a higher position, leaving its slot free for C4T (Linux
wchar_t). Technically this is not that difficult, but every primitive
(and special code) that currently operates on LIT and C2T would have
to be extended to C4T, and this requires considerable effort.
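To give a feel for what one such extension involves, here is a minimal C sketch of promoting LIT and C2T data to a 4-byte C4T element, so that a dyadic primitive such as append can accept mixed arguments. The type names and function names are hypothetical and are not the jsource internals. Two choices are assumptions: each LIT byte is zero-extended, and C2T is treated as UCS-2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element types for the three character widths:
   LIT (1 byte), C2T (2 bytes), proposed C4T (4 bytes, like Linux wchar_t). */
typedef unsigned char lit_t;
typedef uint16_t      c2t_t;
typedef uint32_t      c4t_t;

/* Zero-extend n LIT bytes into C4T elements (byte value = code point). */
static void widen_lit_to_c4t(const lit_t *src, c4t_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
}

/* Zero-extend n C2T elements into C4T. Assumes C2T holds UCS-2
   (no surrogate pairs); a surrogate pair would need merging into one
   code point, which this sketch does not do. */
static void widen_c2t_to_c4t(const c2t_t *src, c4t_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(src[i] < 0xD800 || src[i] > 0xDFFF);
        dst[i] = src[i];
    }
}

/* Hypothetical C4T case for dyadic append (x , y) with a LIT left
   argument and a C2T right argument: promote both, then concatenate.
   out must have room for nx + ny elements; returns the result length. */
static size_t append_lit_c2t_to_c4t(const lit_t *x, size_t nx,
                                    const c2t_t *y, size_t ny, c4t_t *out) {
    widen_lit_to_c4t(x, out, nx);
    widen_c2t_to_c4t(y, out + nx, ny);
    return nx + ny;
}
```

Each primitive and special code would need analogous cases for every pairing of LIT, C2T and C4T. That multiplication of cases is where most of the effort goes.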


IIRC, Dyalog APL already supports both 2-byte and 4-byte wide Unicode
characters.

Tue, 05 Jul 2016, Raul Miller wrote:
> On Tue, Jul 5, 2016 at 10:56 AM, robert therriault
> <bobtherria...@mac.com> wrote:
> >  If you haven't already, take a look at the video and see what you think.
> >
> > https://youtu.be/eN9H-rMk1No
> 
> So, ok...
> 
> In the video, you are working with your display verb (v) and the
> result of this expression:
> 
>    'a',(u:600+i.5),u:30000+/i.5
> 
> But there were several things you said in the video which I disagree
> with. I think you could use a bit more background on these issues:
> 
> First, using the J Dictionary terminology, both of these sentences
> have literal results:
> 
>    'a',(u:600+i.5),u:30000+/i.5
> 
>    ": 'a',(u:600+i.5),u:30000+/i.5
> 
> I expect the mail system to mutilate those results, so I have removed
> them from my message here.
> 
> Second, using the Unicode Consortium's terminology, both of those
> results are Unicode. More specifically, UTF-8 is a Unicode encoding.
> 
> If we look at the J type numbers for these two sentences, we get:
> 
>    3!:0 'a',(u:600+i.5),u:30000+/i.5
> 131072
>    3!:0 ": 'a',(u:600+i.5),u:30000+/i.5
> 2
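Both results are single powers of two: 2 = 2^1 and 131072 = 2^17. That is consistent with J type numbers being one-bit flags. Below is a small C sketch that recovers the bit index of such a type number. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return k if t == 2^k, or -1 if t is zero or has more than one bit set.
   (type_bit_index is a hypothetical helper, not a jsource function.) */
static int type_bit_index(uint64_t t) {
    if (t == 0 || (t & (t - 1)) != 0) return -1;
    int k = 0;
    while ((t >>= 1) != 0) k++;   /* count shifts until the single bit falls off */
    return k;
}
```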
> 
> The first sentence has type number 131072, which roughly corresponds
> to UTF-16 (one 16-bit code unit per element). The second sentence has
> type number 2, which roughly corresponds to UTF-8 (one byte per
> element). J should perhaps also have a third kind of character
> literal, roughly corresponding to UTF-32, which has not yet been
> implemented.
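One concrete way to see the difference between the two types is to count storage. The C sketch below encodes code points as UTF-8 and totals the bytes; the helper names are hypothetical. The example has 11 characters: 'a', five code points starting at 600, and five starting at 30000. UTF-8 needs 1 + 5*2 + 5*3 = 26 bytes for them. So if ": produces UTF-8, its type-2 result would be 26 elements long, while the type-131072 array has 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs for one code point (RFC 3629 ranges).
   Surrogate code points are rejected because UTF-8 cannot encode them. */
static size_t utf8_len(uint32_t cp) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Encode one code point into buf (room for 4 bytes); returns bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *buf) {
    size_t n = utf8_len(cp);
    switch (n) {
    case 1:
        buf[0] = (unsigned char)cp;
        break;
    case 2:
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return n;
}

/* Total UTF-8 bytes for n code points (versus n elements at one code
   unit per element). */
static size_t utf8_total(const uint32_t *cps, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) total += utf8_len(cps[i]);
    return total;
}
```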
> 
> So, taking a few steps back...
> 
> Unicode currently has enough characters defined that they need 21 bits
> to enumerate them all. But they also have licenses on the standard way
> of representing the characters, which has been important for some
> people and which has slowed the adoption of that aspect of the
> standard. This translates to limited font support (but there's also
> limited keyboard support as well as dubious recommendations within the
> Unicode standards themselves).
> 
> In other words, if we were to represent the full set of Unicode
> characters using type 131072, some of them would still need two
> elements each (a UTF-16 surrogate pair). But there is little or no
> font support for most of them.
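For reference, this is how UTF-16 splits a code point above U+FFFF into two 16-bit units. In a type-131072 array, each unit would be a separate element. Below is a minimal C sketch; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16; returns the number of 16-bit code
   units written to out (1 inside U+0000..U+FFFF, 2 for a surrogate pair). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```

Reshaping or indexing such an array can separate the two halves of a pair.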
> 
> Also, 21 bits is about 2 million characters. So that's megabytes of
> information that would be needed mostly to support characters which
> the Unicode people have defined but which almost nobody uses (in part
> because of licensing concerns).
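The arithmetic behind the 21-bit figure is shown in the short C sketch below; the helper names are hypothetical. 21 bits can address 2,097,152 values. Unicode's code space actually stops at U+10FFFF: 17 planes of 65,536 code points each, or 1,114,112 code points. 20 bits would not quite be enough.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits b such that max_value < 2^b. */
static int bits_for(uint32_t max_value) {
    int b = 0;
    while (b < 32 && (max_value >> b) != 0) b++;
    return b;
}

/* Number of distinct values addressable with b bits (0 <= b < 32). */
static uint32_t addressable(int b) {
    assert(b >= 0 && b < 32);
    return UINT32_C(1) << b;
}

/* Unicode code space: planes 0 through 16, 65,536 code points each. */
static uint32_t unicode_codespace(void) {
    return UINT32_C(17) * UINT32_C(65536);
}
```

At one byte per code point, a flat lookup table over the whole code space would be about 1.1 MB, versus 64 KiB for the 16-bit range.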
> 
> Hopefully that is enough background...
> 
> To round out this discussion, though, let us imagine that J had been
> extended to fully support the full 21 bit space of the Unicode
> character set.
> 
> First off, as near as I can tell, there is no font support in any
> operating system for characters which are "Unicode code points which
> cannot be represented as a single code unit using UTF-16". In other
> words, there's nothing for J to tie into to represent most of those
> characters. So I guess they would mostly wind up looking like any
> other unrecognized character. So if we supported them, this would be a
> "temporary hack" (which would likely survive for decades, and then
> become busy work for someone trying to keep up with some hypothetical
> changing and conflicting implementations).
> 
> Second, we would need to use one of the very few available type
> numbers to represent these characters which nobody supports. (No font
> support, no keyboard support - just compliance with some obscure
> standards which have not yet been proven useful.) And, J has only two
> unused type numbers which fit in a 32 bit implementation:
> 
> https://github.com/jsoftware/jsource/blob/master/jsrc/jtype.h#L160
> 
>    2^30x
> 1073741824
>    2^31x
> 2147483648
> 
> Though, actually, since the type is stored in a *A as an I (which is
> long on 32 bit J and long long on 64 bit J), which is a signed type on
> every major platform, there is really only one remaining unambiguous
> type number available for 32 bit J. (2147483648 would really be
> -2147483648 for current 32 bit J implementations.)
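Below is a small C sketch of the signed-field arithmetic; the helper names are hypothetical. The bit pattern 2^30 stays positive in a signed 32-bit integer. 2^31 is the sign bit and reads as -2147483648. A type number such as 2^33 does not fit at all.

```c
#include <assert.h>
#include <stdint.h>

/* Value a 32-bit two's-complement signed field shows for the bit
   pattern `bits`, computed in 64-bit arithmetic so it does not rely on
   an implementation-defined unsigned-to-signed conversion. */
static int64_t as_signed32(uint32_t bits) {
    return bits < UINT32_C(0x80000000) ? (int64_t)bits
                                       : (int64_t)bits - (INT64_C(1) << 32);
}

/* A type number fits unambiguously in a signed 32-bit I only if it is
   nonzero and at most INT32_MAX, i.e. its bit index is 30 or less. */
static int fits_signed32(uint64_t type) {
    return type != 0 && type <= INT32_MAX;
}
```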
> 
> Anyways, we probably do not want to use our one remaining type number
> which can be used in 32 bit J for a character literal type which will
> probably never be supported on a 32 bit operating system.
> 
> So, let's pretend that we want to use type number 8589934592 (2^33)
> to represent UTF-32 character literals, and that all of the characters
> not supported by the OS get displayed using the "question mark in a
> diamond" replacement character (U+FFFD).
> 
> That would, at least, give us a way of representing all Unicode
> characters in which every character the Unicode Consortium defines
> keeps its integrity (is never split across elements) when the array
> is reshaped.
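The integrity point can be illustrated with UTF-8, where one character may span several elements. The C sketch below checks whether keeping the first n bytes, as a reshape or take to length n would, splits a character. The helper names are hypothetical, and the sketch assumes the input was valid UTF-8 before the cut. With one 32-bit element per code point, every such cut is clean.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by lead byte b, or 0 if b is
   a continuation byte (10xxxxxx). */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Return 1 if the first n bytes of valid UTF-8 text end on a character
   boundary, 0 if the last character would be cut in half. */
static int utf8_cut_is_clean(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t len = utf8_seq_len(s[i]);
        assert(len != 0);   /* valid text never starts a character with a continuation byte */
        i += len;
    }
    return i == n;          /* overshooting n means the final character was split */
}
```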
> 
> But that would not solve the boxed display issue. To solve the boxed
> display issue, we need to know the character width for each character.
> J assumes a "fixed width" font, but Unicode has defined some
> characters to have double width and we don't have any fonts which make
> them the same width as anything else. The Unicode Consortium has also
> defined some characters which are zero width and some which do other
> strange things which conflict with the idea of fixed width characters.
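To make the width problem concrete, here is a toy C sketch of per-character column counting. The ranges are a tiny hand-picked subset: combining diacritical marks count as zero width, and CJK Unified Ideographs and Hangul syllables count as double width. They are not the Unicode East Asian Width data a real implementation would need, and the function names are hypothetical. For the example expression it gives 1 + 5 + 5*2 = 16 columns for 11 characters. In a font that renders the CJK characters double width, padding a box by element count would therefore misalign it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy display width: an illustrative subset of ranges only, NOT the
   full Unicode width data. */
static int toy_width(uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp >= 0x0300 && cp <= 0x036F) return 0;   /* combining diacritical marks */
    if (cp >= 0x4E00 && cp <= 0x9FFF) return 2;   /* CJK Unified Ideographs */
    if (cp >= 0xAC00 && cp <= 0xD7A3) return 2;   /* Hangul syllables */
    return 1;                                     /* everything else: assume one column */
}

/* Columns needed to display n code points, which is what a boxed
   display would have to pad to, rather than the element count n. */
static size_t display_columns(const uint32_t *cps, size_t n) {
    size_t cols = 0;
    for (size_t i = 0; i < n; i++) cols += (size_t)toy_width(cps[i]);
    return cols;
}
```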
> 
> So basically, the Unicode standard itself is the source of confusion.
> And it's approximately a bottomless pit of confusion (if you try to
> read the full set of standards available at unicode.org). But all of
> it is for good reasons, of course.
> 
> ...
> 
> Which brings me back to questions like "What do you want the user to
> understand?"
> 
> Thanks,
> 
> -- 
> Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3