>  text of Finnegan's Wake.

The actual name of the book by James Joyce is "Finnegans Wake", and one
could argue that it is not written in English ;)

On Wed, Apr 4, 2018 at 12:30 PM, Raul Miller <rauldmil...@gmail.com> wrote:

> There are some unmentioned issues that may trip you up eventually with
> this approach, for example, if you try to apply these routines to the
> text of Finnegan's Wake.
>
> To hint at those issues, here's an approach that takes you directly to
> the final result:
>
>    ex1=: <'This is Skip''s test. Testing one, two, three. Count 3, 2, 1.'
>
> DELIM=:'.?!'
> toss=:a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.
> separateclean=:3 :0
>   a:-.~(e.&DELIM <@deb;._2 tolower) '.',~(;y) -. toss
> )
>
>    separateclean ex1
> ┌──────────────────┬─────────────────────┬───────────┐
> │this is skips test│testing one two three│count 3 2 1│
> └──────────────────┴─────────────────────┴───────────┘
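The toss line is the dense part. As I read it, toss is every byte except letters, digits, the DELIM characters and the blank. The blank is kept only because ":i.10 formats as '0 1 2 3 4 5 6 7 8 9', which contains blanks. Here is a small check, repeating Raul's two definitions so it loads on its own. The names blankkept and sample, and the test string, are made up for illustration:

```j
NB. Raul's definitions, repeated so this script loads on its own.
DELIM=:'.?!'
toss=:a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.

NB. ":i.10 is '0 1 2 3 4 5 6 7 8 9', so the blank lands in the keep-list.
blankkept=: -. ' ' e. toss           NB. 1: blank is not tossed
NB. Apostrophe and comma are in toss; letters, digits and '!' are not.
sample=: 'Skip''s test, 3!' -. toss  NB. 'Skips test 3!'
```

That is why "Skip's" becomes "skips" rather than "skip s".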
>
>
> And here's a longer approach which takes you there in two steps where
> the result of the first step will be the same length as the result of
> the second step:
>
> separatedirty=:3 :0
>   (;:'.')-.~(e.&DELIM <@deb;.2 ]) '.',~;y
> )
> clean=: tolower@-.&(toss,DELIM) L:0
>
>    separatedirty ex1
> ┌────────────────────┬────────────────────────┬──────────────┐
> │This is Skip's test.│Testing one, two, three.│Count 3, 2, 1.│
> └────────────────────┴────────────────────────┴──────────────┘
>    clean separatedirty ex1
> ┌──────────────────┬─────────────────────┬───────────┐
> │this is skips test│testing one two three│count 3 2 1│
> └──────────────────┴─────────────────────┴───────────┘
>
>
> But with ill-conditioned text (Finnegan's Wake being an example of
> that), I expect cases where separateclean gives a different result
> from clean@separatedirty.
>
> But that's what makes text fun...
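For what it's worth, ordinary punctuation seems to be enough to show the difference. separateclean removes tossed characters before deb collapses blanks, while clean removes them after deb has already run. Also, a "sentence" made only of punctuation is dropped by one route but kept as a lone blank by the other. This is a sketch that repeats Raul's definitions so it loads on its own. The inputs ex4 and ex5 and the result names r1, r2, n1 and n2 are hypothetical:

```j
NB. Raul's definitions, repeated so this script loads on its own.
DELIM=:'.?!'
toss=:a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.
separateclean=:3 :0
  a:-.~(e.&DELIM <@deb;._2 tolower) '.',~(;y) -. toss
)
separatedirty=:3 :0
  (;:'.')-.~(e.&DELIM <@deb;.2 ]) '.',~;y
)
clean=: tolower@-.&(toss,DELIM) L:0

NB. Hypothetical input 1: a dash run between blanks.
ex4=: <'Wait -- what? Ok.'
r1=: separateclean ex4          NB. 'wait what';'ok'  (deb ran after the dashes went)
r2=: clean separatedirty ex4    NB. 'wait  what';'ok' (both blanks survive)

NB. Hypothetical input 2: a "sentence" that is only punctuation.
ex5=: <'Hi. -- . Bye.'
n1=: # separateclean ex5        NB. 2: the all-blank piece is emptied by deb, then dropped by a:-.~
n2=: # clean separatedirty ex5  NB. 3: the middle box ends up holding a single blank
```

So the two routes can disagree both in content and in the number of boxes.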
>
> --
> Raul
>
>
> On Wed, Apr 4, 2018 at 12:02 PM, Skip Cave <s...@caveconsulting.com> wrote:
> > I have the following boxed data:
> >
> > ex1=. <'This is Skip''s test. Testing one, two, three. Count 3, 2, 1.'
> >
> >
> > ex1
> > ┌────────────────────────────────────────────────────────────┐
> > │This is Skip's test. Testing one, two, three. Count 3, 2, 1.│
> > └────────────────────────────────────────────────────────────┘
> >
> > I want to build a verb that will separate this boxed text data into
> > sentences.
> >
> >
> > ex2=. (<'This is Skip''s test.'),(<'Testing one, two, three.'),(<'Count 3, 2, 1.')
> >
> > ex2
> > ┌────────────────────┬────────────────────────┬──────────────┐
> > │This is Skip's test.│Testing one, two, three.│Count 3, 2, 1.│
> > └────────────────────┴────────────────────────┴──────────────┘
> >
> > I also want to get rid of all punctuation and caps:
> >
> > ex3=. (<'this is skips test'),(<'testing one two three'),(<'count 3 2 1')
> >
> > ex3
> > ┌──────────────────┬─────────────────────┬───────────┐
> > │this is skips test│testing one two three│count 3 2 1│
> > └──────────────────┴─────────────────────┴───────────┘
> >
> > What is a reasonable J verb to do this separation and cleanup?
> >
> > Skip
> >
> > Cave Consulting LLC
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm