I thought it might be an fs issue, rather than an os issue. But I get the
same behaviour as you on both hfs and apfs, so apparently not.
On Wed, 23 Mar 2022, bill lam wrote:
Wait. I tried this on a Mac and got the following. Perhaps the newest macOS has
changed its behaviour.
   c=: 195 164{a.
   d=: 97 204 136{a.
   c-:d
0
   fread c
abc
   'abc'fwrite c
3
   fread c
abc
   fread d
abc
   'def'fwrite d
3
   fread c
def
   fread d
def
   1!:0 <c
+-+------------------+-+---+------+----------+
|ä|2022 3 23 11 46 37|3|rw-|------|-rw-r--r--|
+-+------------------+-+---+------+----------+
   1!:0 <d
   a.i.>{.{.1!:0 <c
195 164
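
For anyone following along outside J: the two byte sequences used for c and d are the UTF-8 encodings of the composed (NFC) and decomposed (NFD) forms of the same character. The Python sketch below only illustrates that relationship using the standard unicodedata module. The helper name same_name is made up for illustration, and comparing after NFD normalisation is an assumption about how a normalising file system might treat names, not a description of what macOS or J actually do internally.

```python
import unicodedata

# The same byte values as c and d in the J session above.
composed = bytes([195, 164]).decode("utf-8")        # U+00E4
decomposed = bytes([97, 204, 136]).decode("utf-8")  # U+0061 followed by U+0308


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The raw strings differ, which matches c-:d giving 0.
print(composed == decomposed)
# After normalisation they compare equal.
print(same_name(composed, decomposed))
# Normalising one form yields the byte values of the other.
print(list(unicodedata.normalize("NFD", composed).encode("utf-8")))
print(list(unicodedata.normalize("NFC", decomposed).encode("utf-8")))
```

If the file system treats the two spellings as the same name, this is consistent with fread c and fread d returning the same contents above.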
On Wed, Mar 23, 2022 at 11:22 AM bill lam wrote:
This is a bug in J. J should follow the macOS file system's name normalization
rules. I'll take a look.
On Wed, 23 Mar 2022 at 11:01 AM Eric Iverson wrote:
Not sure. And not sure I want to know.
But to continue the example:
   fread c NB. fails on macos as the system has the decomposed form as the name it looks for
   fread d
abc
On Tue, Mar 22, 2022 at 9:39 PM Elijah Stone wrote:
> I wonder what happens when you create two files with distinct names, and
> then unicode changes such that they are the same when
> normalised/casefolded/..
>
> Probably nothing good.
>
> On Tue, 22 Mar 2022, Eric Iverson wrote:
>
> > My favorite unicode story is from macos filenames.
> >
> > They decompose filenames and only track the decomposed form (letter
> > separate from the overstrike).
> >
> > The following accented chars look the same, but have different values.
> >
> > c=: 195 164{a. NB. composed
> > c
> > ä
> > d=: 97 204 136{a. NB. decomposed
> > d
> > ä
> > c-:d
> > 0
> > 'abc'fwrite c
> > fread c NB. fails on macos as the system has the decomposed form as the name it looks for
> >
> > Torvalds has a wonderful rant about this that is a fun read.
> >
> > On Tue, Mar 22, 2022 at 7:02 PM Raul Miller wrote:
> >
> >> I ran into a situation today (dealing with files) where most of the
> >> files were utf-8 encoded but some represented the latin-1 "code plane"
> >> with 8 bit characters.
> >>
> >> To cope with this issue, I coded up a mechanism to test whether the
> >> file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
> >> for the files which failed this test.
> >>
> >> In other words:
> >>
> >> cclass=: (i.9) (48+i.9)} 256#9
> >> cstates=: 0 10#:10* ".;._2{{)n
> >>   0 7.3   2   3   4   5   6 7.3 7.3 7.1 NB. 0: start char sequence
> >>   0 7.3   2   3   4   5   6 7.3 7.3 7.1 NB. 1: finish char sequence, start next
> >> 7.3   1 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 2: need one more character
> >> 7.3   2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 3: need two more characters
> >> 7.3   3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 4: need three more characters
> >> 7.3   4 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 5: need four more characters
> >> 7.3   5 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 NB. 6: need five more characters
> >> 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.2 NB. 7: end
> >> }}
> >>
> >> utf8lenb=: <:2#.>1 #each~1+i.8
> >> utf8ok=: {{
> >> try.
> >> (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
> >> 1
> >> catch.
> >> 0
> >> end.
> >> }}
> >>
> >> NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
> >> latin2utf8=: {{
> >> if.utf8ok y do. y else. ":10 u: y end.
> >> }}
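
For readers who do not use J, the same idea (accept the bytes as-is when they are valid UTF-8, otherwise treat them as Latin-1 and re-encode) can be sketched in Python. This is not a translation of the state machine above: it relies on Python's strict UTF-8 decoder, which is stricter than the table appears to be (for example, it rejects 5- and 6-byte forms and overlong encodings). The function names utf8_ok and latin_to_utf8 merely mirror the J names and are otherwise hypothetical.

```python
def utf8_ok(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin_to_utf8(data: bytes) -> bytes:
    """Keep valid UTF-8 unchanged; otherwise assume ASCII plus Latin-1."""
    if utf8_ok(data):
        return data
    # Every byte value is a valid Latin-1 character, so this cannot fail.
    return data.decode("latin-1").encode("utf-8")


# A lone 0xE4 byte is Latin-1 "ä" but not valid UTF-8.
print(latin_to_utf8(b"\xe4"))
# Bytes that are already valid UTF-8 pass through untouched.
print(latin_to_utf8(b"\xc3\xa4"))
```

As with the J version, the fallback is a guess: a Latin-1 file whose high bytes happen to form valid UTF-8 sequences would be passed through unchanged.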
> >>
> >> I don't know if this approach would be useful to anyone else here,
> >> but... just in case...
> >>
> >> FYI,
> >>
> >> --
> >> Raul
> >>
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm