Re: [Jsource] Unicode identifiers

'Pascal Jasmin' via Source Sat, 09 Jul 2016 10:00:24 -0700

Bug fix:

u2j =: ;@:(('U',~ 'U', 3 ":@u: ])`(])@.((128&> *. e.&(95 , (48 +i.10) , 65 97 ,@:(+/) i.26)  +. 64&<)@:(3&u:))each)


   u2j u: 'asdf_wer123' , '_234fds' ,~ u: 744
asdf_wer123U744U_234fds

   j2u u2j u: 'asdf_wer123' , '_234fds' ,~ u: 744
asdf_wer123˨_234fds

   j2u u2j u: 'asdf_wer123UUU12x' , '_234fds' ,~ u: 744
asdf_wer123UUU12x˨_234fds

Another use for this is to implement a binary-key dictionary within a locale
(where key =: value).  Every byte of a binary key gets turned into characters
that are legal in a J name, though there is collision potential when the text
U[digits]U already occurs in the binary key space.
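
A rough illustration of that dictionary idea and of the collision, written in Python rather than J so it can be run anywhere. `key_to_name` and `store` are made-up names, and the sketch assumes the intent is to leave only ASCII letters, digits and underscore unescaped; it is not the u2j verb itself.

```python
import string

# Bytes that may appear unescaped in the generated name (assumed intent:
# ASCII letters, digits and underscore only).
ALLOWED = set((string.ascii_letters + string.digits + "_").encode("ascii"))

def key_to_name(key: bytes) -> str:
    """Hypothetical helper: escape every other byte as U<decimal>U."""
    return "".join(chr(b) if b in ALLOWED else f"U{b}U" for b in key)

store = {}                                   # stands in for the locale
store[key_to_name(b"\x00\xff")] = "binary"   # stored under U0UU255U
store[key_to_name(b"*")] = "star"            # stored under U42U
store[key_to_name(b"U42U")] = "clobbers"     # also U42U: overwrites "star"
```

Escaping every literal U as well (as U85U) would be one way to make the mapping unambiguous, at the cost of less readable names.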


----- Original Message -----
From: 'Pascal Jasmin' via Source <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, July 8, 2016 1:26 PM
Subject: Re: [Jsource] Unicode identifiers

j2u =:(4 u: 0 -.~ (3 u: ]) amV reduce~ ] (  ] ,~"1  ((0 #~ #@]) ,@:{:@:(,: ,.) ".@:}:@:}.@:{~) leaf)  'U[0123456789]+U' <@(+ i.)/"1@:rxmatches ])


u2j =: ;@:(('U',~ 'U', 3 ":@u: ])`(])@.((128&> *. e.&(48 +i.10)  +. 64&<)@:(3&u:))each)

Note a small problem with both j804 and j805: these functions only work with
what u: understands the string to be.  In the session below, u: treats each
byte of the UTF-8 literal as a separate character.

   u: '*12+ÃÅ├'
*12+ÃÃ…â


   j2u u2j  '*12+ÃÅ├'
*12+ÃÃ…â
   u2j j2u u2j j2u u2j  '*12+ÃÅ├'
U42U12U43UU195UU131UU195UU133UU226UU148UU156U
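
The difference between the two readings can be seen in a small Python sketch (Python, not J; `escape` is a hypothetical helper that assumes only ASCII letters, digits and underscore stay unescaped). Escaping the UTF-8 bytes one by one gives the result shown in the session above, while decoding to code points first gives one escape per character.

```python
def escape(codes):
    """Hypothetical helper: U<decimal>U for anything but ASCII [A-Za-z0-9_]."""
    out = []
    for c in codes:
        ch = chr(c)
        keep = c < 128 and (ch.isalnum() or ch == "_")
        out.append(ch if keep else f"U{c}U")
    return "".join(out)

text = "*12+\u00c3\u00c5\u251c"            # the characters * 1 2 + Ã Å ├

per_byte = escape(text.encode("utf-8"))    # what a byte-wise reading sees
per_char = escape(ord(ch) for ch in text)  # what a code-point reading sees

print(per_byte)  # U42U12U43UU195UU131UU195UU133UU226UU148UU156U
print(per_char)  # U42U12U43UU195UU197UU9500U
```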



   j2u 'U980U3sdfU3fU44dU3122UU433U'
ϔ3sdfU3fU44dలƱ
   u2j j2u 'U980U3sdfU3fU44dU3122UU433U'
U980U3sdfU3fU44dU3122UU433U


The technique encodes each character that is not legal in a name as
U[decimal code point]U, which allows punctuation and control characters in
convertible variable names.
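
As a cross-check of the scheme, here is a minimal Python sketch of the same round trip (not the J verbs above; the function names are reused only for readability, and leaving just ASCII letters, digits and underscore unescaped is an assumption).

```python
import re

def u2j(text: str) -> str:
    """Escape every character outside ASCII [A-Za-z0-9_] as U<decimal>U."""
    return "".join(
        ch if ch.isascii() and (ch.isalnum() or ch == "_") else f"U{ord(ch)}U"
        for ch in text
    )

def _decode(match):
    code = int(match.group(1))
    # Leave the text alone if the number is not a valid code point.
    return chr(code) if code <= 0x10FFFF else match.group(0)

def j2u(name: str) -> str:
    """Replace each U<digits>U run, scanning left to right, by its character."""
    return re.sub(r"U([0-9]+)U", _decode, name)

original = "asdf_wer123" + chr(744) + "_234fds"
name = u2j(original)
print(name)                    # asdf_wer123U744U_234fds
print(j2u(name) == original)   # True
```

Runs such as U3fU are left untouched by `j2u` because the text between the two U's is not all digits.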


----- Original Message -----
From: Raul Miller <[email protected]>
To: Source forum <[email protected]>
Sent: Friday, July 8, 2016 10:37 AM
Subject: Re: [Jsource] Unicode identifiers

Does this mean that you are interested in writing that display tool?

If so, do you need anything documented better, to get it done?

Thanks,

-- 
Raul


On Fri, Jul 8, 2016 at 2:41 AM, Björn Helgason <[email protected]> wrote:
> I agree with you completely.
>
> What could be done is to have one script with the names as used in J.
>
> Then have a rule script that maps those names to unicode through a table,
> so the two forms can be toggled.
>
> So in J the rule is: use names.
>
> For display purposes, one version of the script has the names and another
> has those names replaced, for those who want the names displayed as
> unicode.
>
> J would not have to bother with the unicode and the user can read it in
> unicode or names as wanted/needed.
>
> Maybe only translate from names to unicode for display or toggle between
> the two displays.
>
> Just create a separate display tool for scripts.
>
> The unicode script would not be valid as ijx, only as ijs (possibly named
> iju or ijsu).
> On 5 Jul 2016 21:50, "Eric Iverson" <[email protected]> wrote:
>
>> We made the decision well more than a decade ago that unicode identifiers
>> would be a mistake. That decision was unanimous within Jsoftware at that
>> time.
>>
>> It would have been just as easy to add the support then as it is now. Has
>> anything changed that would make us reconsider?
>>
>> I can only comment for myself.
>>
>> There are 3 main reasons I am against it:
>>
>> 1. It is a fringe area and does not warrant the effort it would take - very
>> little bang for buck.
>>
>> 2. It is deceptively easy at first, but is a slippery slope. As is pretty
>> much everything with unicode. European accented letters seem like a
>> no-brainer. Then CJK. Then lots of others. Then lots of special guys. APL
>> symbols. ETC. Glyphs that look exactly the same on paper, but that are
>> different code points. This takes thought and (see 1) it just isn't
>> warranted.
>>
>> 3. Ken left us with many fundamental ideas. One was that notation is a tool
>> of thought. The corollary is that notation is a way of communication. If
>> we limit J identifiers as they currently stand then algorithms can be
>> easily and effectively communicated around the entire world. Let in
>> unicode identifiers and this would suffer enormously.
>>
>> Unicode is for data and the support there is pretty good. It serves no
>> useful purpose in identifiers and would be a serious impediment to
>> communication.
>>
>> For historical reasons the English alphabet has a privileged position in J
>> identifiers. Perhaps this is wrong in some senses, but it is enormously
>> practical when it comes to international communications.
>>
>> I'd be very happy to not talk about this again for another 10 years.
>>
>> On Mon, Jul 4, 2016 at 3:53 PM, Jose Mario Quintana <
>> [email protected]> wrote:
>>
>> > On the one hand, Marshall asserts that Unbox allows the use of UTF-8 based
>> > identifiers in a way that "is completely backwards-compatible with existing
>> > J." which I find very appealing.
>> >
>> > On the other hand, you (Jsoftware) decided strongly against it because "the
>> > disadvantages strongly outweighed the advantages."
>> >
>> > Would you mind elaborating on what the disadvantages are?
>> >
>> >
>> > On Sun, Jun 12, 2016 at 4:38 PM, Eric Iverson <[email protected]>
>> > wrote:
>> >
>> > > We (Jsoftware) talked about unicode identifiers quite a bit years ago when
>> > > we added utf8 and utf16 support to J. We finally decided we were strongly
>> > > against it. The disadvantages strongly outweighed the advantages. I don't
>> > > think anything has changed in the interim.
>> > >
>> > > I doubt unicode names will be in official Jsoftware releases for a long
>> > > time, if ever.
>> > >
>> > > On Sun, Jun 12, 2016 at 4:30 PM, Marshall Lochbaum <[email protected]>
>> > > wrote:
>> > >
>> > > > Unbox has code to allow unicode identifiers in J, with the following
>> > > > rules:
>> > > >
>> > > > - All code must be UTF-8. Invalid UTF-8 causes a spelling error.
>> > > > - Any non-ASCII character is treated as alphabetic. Identifiers can use
>> > > >   these characters freely.
>> > > >
>> > > > This is completely backwards-compatible with existing J, and allows us
>> > > > to use things like Greek characters and code in other languages:
>> > > >
>> > > >    π
>> > > > |value error: π
>> > > >    π =: 1p1
>> > > >    π
>> > > > 3.14159
>> > > >    π_1
>> > > > |value error: π_1
>> > > >
>> > > > What do people think about this? Should it be added to jsource? Should
>> > > > the rules be changed for some characters?
>> > > >
>> > > > Marshall
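
The two rules can be read as a character-class test. The following Python sketch is only an illustration of that reading, not Unbox's actual code: `is_name` is a hypothetical function, and it ignores J's locale suffixes and other naming details.

```python
def is_name(source: bytes) -> bool:
    """Hypothetical check: is `source` a single identifier under the two rules?"""
    try:
        text = source.decode("utf-8")        # rule 1: code must be valid UTF-8
    except UnicodeDecodeError:
        raise ValueError("spelling error") from None

    def alphabetic(ch):
        # rule 2: every non-ASCII character counts as a letter
        return not ch.isascii() or ch.isalpha()

    def name_char(ch):
        return alphabetic(ch) or ch.isdigit() or ch == "_"

    return bool(text) and alphabetic(text[0]) and all(map(name_char, text))

print(is_name("π".encode("utf-8")))     # True
print(is_name("π_1".encode("utf-8")))   # True
print(is_name(b"1x"))                   # False
```

Pure-ASCII names pass or fail exactly as before under this reading, which is one way to see the backwards-compatibility claim.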
>> > > >
>> > > > ----------------------------------------------------------------------
>> > > > For information about J forums see http://www.jsoftware.com/forums.htm


