Nice!

I wish I had thought of that.

Thanks,

-- 
Raul

On Tue, Mar 22, 2022 at 7:10 PM Elijah Stone <[email protected]> wrote:
>
> FWIW here is a one-liner hack which accomplishes the same thing:
>
> latin2utf8=: (9&u: ] ]) :: (8 u: 10 u: ])
>
>   -E
>
> On Tue, 22 Mar 2022, Raul Miller wrote:
>
> > I ran into a situation, today (dealing with files), where most of the
> > files were utf-8 encoded but some represented the latin-1 "code plane"
> > with 8 bit characters.
> >
> > To cope with this issue, I coded up a mechanism to test whether the
> > file contained only valid utf-8 sequences, and used {{ ": 10 u: y }}
> > for the files which failed this test.
> >
> > In other words:
> >
> > cclass=: (i.9) (48+i.9)} 256#9
> > cstates=: 0 10#:10* ".;._2{{)n
> >  0    7.3  2    3    4    5    6    7.3  7.3  7.1 NB. 0: start char sequence
> >  0    7.3  2    3    4    5    6    7.3  7.3  7.1 NB. 1: finish char sequence, start next
> >  7.3  1    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 2: need one more character
> >  7.3  2    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 3: need two more characters
> >  7.3  3    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 4: need three more characters
> >  7.3  4    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 5: need four more characters
> >  7.3  5    7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3 NB. 6: need five more characters
> >  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.3  7.2 NB. 7: end
> > }}
> >
> > utf8lenb=: <:2#.>1 #each~1+i.8
> > utf8ok=: {{
> >  try.
> >    (1;cstates;cclass) ;: '.',~'012345678_'{~ utf8lenb I. 3 u: y
> >    1
> >  catch.
> >    0
> >  end.
> > }}
> >
> > NB. most content is utf-8 -- assume non-utf-8 sequences are ascii+latin-1
> > latin2utf8=: {{
> >  if.utf8ok y do. y else. ":10 u: y end.
> > }}
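
For readers who do not use J, the fallback idea in latin2utf8 above (keep
the bytes if they are already valid UTF-8, otherwise treat them as
ASCII + Latin-1 and re-encode) can be sketched in Python. This is only an
illustration of the same idea, not a translation of the J code: the
function name latin1_to_utf8 is hypothetical, and Python's strict UTF-8
decoder rejects some sequences (overlong forms, 5- and 6-byte forms) that
the state machine above may accept.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Return data unchanged if it is valid UTF-8, else re-encode it from Latin-1."""
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Assumption (same as in the thread): anything that is not valid
        # UTF-8 is ASCII + Latin-1, where every byte maps to one code point.
        return data.decode("latin-1").encode("utf-8")
```

As with the J version, a Latin-1 file whose high bytes happen to form
valid UTF-8 sequences would be passed through unchanged.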
> >
> > I don't know if this approach would be useful to anyone else here,
> > but... just in case...
> >
> > FYI,
> >
> > --
> > Raul
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
