Thanks Bill,
You are right, I was confusing U16 with literal2. Part of the reason is that
datatype reports unicode in all three cases:
datatype 7 u: 3101
unicode
datatype 4 u: 3101
unicode
datatype u: 3101
unicode
I guess there is no real way to show that 7 u: 3101 returns U16 rather than
literal2 without inventing a separate J datatype. It is nice that this lets
7 u: deal with unicode4 arguments rather seamlessly.
datatype 9 u: 128512
unicode4
7 u: 128512
😀
datatype 7 u: 128512
unicode
3 u: 7 u: 128512
55357 56832
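As a sanity check, the pair above can be reproduced by hand. This is only a
sketch, assuming the standard UTF-16 rule (subtract 65536, then split what is
left into a high and a low 10-bit half added to 55296 and 56320); the names cp
and v are mine, purely for illustration:

```j
   NB. hand computation of the UTF-16 surrogate pair for code point 128512
   cp =: 128512
   v =: cp - 65536          NB. offset above the 16-bit range
   (55296 + <. v % 1024) , 56320 + 1024 | v
55357 56832
```

3101 is below 65536, so it needs no pair and fits in a single 16-bit unit.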
But I do wonder, since the head of the result is an atom:
7 u: 3101
ఝ
{. 7 u: 3101
ఝ
$ {. 7 u: 3101

# $ {. 7 u: 3101
0
Could a single non-surrogate U16 act a bit more like the ASCII case does and
come back as an atom, or would that break U16 by being non-standard?
7 u: 'a'
a
$ 7 u: 'a'

# $ 7 u: 'a'
0
Cheers, bob
> On Apr 1, 2017, at 5:54 PM, bill lam <[email protected]> wrote:
>
> the right argument 30101 is an integer, not literal2.
>
> 7 u: returns utf16 not literal2. utf16 has surrogate pairs so that result
> must be rank-1. utf16 is not a J data type.
>
> 4 u: returns literal2 (a J data type) in which the concept of surrogate
> pairs does not apply. literal2 has atoms.
>
> try 7 u: 128512 to confirm the result is a surrogate pair. Also 9 u: 128512
> is a literal4 atom.
>
> pre-j805, 7 u: integer is a domain error, so the behavior of j805 is
> incompatible. there will be a global parameter to restore the domain error
> so that it becomes compatible again. the same applies to 8 u: integer.
>
> Pre-j805 only supports literal2.
> Utf16 was first introduced in j805. Your confusion might come from mixing
> up literal2 and utf16.
>
> On 2 Apr, 2017 12:55 am, "robert therriault" <[email protected]> wrote:
>
> u: 30101
> 疕
> datatype u: 30101
> unicode
> $ u: 30101
>
> #$ u: 30101
> 0 NB. unicode (literal2) atom as expected
>
> 4 u: 30101
> 疕
> datatype 4 u: 30101
> unicode
> $ 4 u: 30101
>
> #$ 4 u: 30101
> 0 NB. unicode (literal2) atom as expected
>
> 7 u: 30101
> 疕
> datatype 7 u: 30101
> unicode
> $ 7 u: 30101
> 1 NB. unicode (literal2) list of length 1 is unexpected
> #$ 7 u: 30101
> 1 NB. rank 1 is unexpected
>
>
> The dictionary suggests that with a right argument of literal2, then if all
> values <128, convert to ASCII, otherwise as is. [0]
> I believe that since the argument is > 128 the 'as is' case would apply and
> that no change in shape should occur, but Unicode is a tricky beast and I
> welcome enlightenment.
>
> Cheers, bob
>
> [0] http://www.jsoftware.com/help/dictionary/duco.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm