> text of Finnegan's Wake.

The actual name of the book by James Joyce is "Finnegans Wake", and one could argue that it is not written in English ;)
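To make the point about ill-conditioned text concrete, here is a small sketch that reuses the definitions from the quoted message below, unchanged. The input ex4 is a made-up example (it does not appear in the thread); it simply puts two delimiters next to each other, which seems to be enough to make the two approaches disagree.

```j
NB. Definitions copied from the quoted message below.
DELIM=: '.?!'
toss=: a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.
separateclean=: 3 : 0
a:-.~(e.&DELIM <@deb;._2 tolower) '.',~(;y) -. toss
)
separatedirty=: 3 : 0
(;:'.')-.~(e.&DELIM <@deb;.2 ]) '.',~;y
)
clean=: tolower@-.&(toss,DELIM) L:0

NB. ex4 is a hypothetical input, not from the thread: note the adjacent ? and !
ex4=: <'Wait?! Really.'

# separateclean ex4                            NB. 2: the empty piece between ? and ! is dropped
# clean separatedirty ex4                      NB. 3: the lone '!' piece survives the split, then cleans to an empty box
(separateclean -: clean@separatedirty) ex4     NB. 0: the results differ
```

The difference comes from when empty pieces are discarded: separateclean drops empty boxes after the delimiters are already gone, while separatedirty only drops boxes that are exactly '.', so a piece consisting of just '!' is kept and becomes an empty box once clean is applied.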
On Wed, Apr 4, 2018 at 12:30 PM, Raul Miller <[email protected]> wrote:
> There are some unmentioned issues that may trip you up eventually with
> this approach, for example, if you try to apply these routines to the
> text of Finnegan's Wake.
>
> To hint at those issues, here's an approach that takes you directly to
> the final result:
>
>    ex1=: <'This is Skip''s test. Testing one, two, three. Count 3, 2, 1.'
>
>    DELIM=:'.?!'
>    toss=:a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.
>    separateclean=:3 :0
>   a:-.~(e.&DELIM <@deb;._2 tolower) '.',~(;y) -. toss
> )
>
>    separateclean ex1
> ┌──────────────────┬─────────────────────┬───────────┐
> │this is skips test│testing one two three│count 3 2 1│
> └──────────────────┴─────────────────────┴───────────┘
>
> And here's a longer approach which takes you there in two steps where
> the result of the first step will be the same length as the result of
> the second step:
>
>    separatedirty=:3 :0
>   (;:'.')-.~(e.&DELIM <@deb;.2 ]) '.',~;y
> )
>    clean=: tolower@-.&(toss,DELIM) L:0
>
>    separatedirty ex1
> ┌────────────────────┬────────────────────────┬──────────────┐
> │This is Skip's test.│Testing one, two, three.│Count 3, 2, 1.│
> └────────────────────┴────────────────────────┴──────────────┘
>    clean separatedirty ex1
> ┌──────────────────┬─────────────────────┬───────────┐
> │this is skips test│testing one two three│count 3 2 1│
> └──────────────────┴─────────────────────┴───────────┘
>
> But with ill conditioned text (Finnegan's Wake being an example of
> that), I expect cases where separateclean gives a different result
> from clean@separatedirty
>
> But that's what makes text fun...
>
> --
> Raul
>
> On Wed, Apr 4, 2018 at 12:02 PM, Skip Cave <[email protected]> wrote:
> > I have the following boxed data:
> >
> >    ex1=. <'This is Skip''s test. Testing one, two, three. Count 3, 2, 1.'
> >
> >    ex1
> > ┌────────────────────────────────────────────────────────────┐
> > │This is Skip's test. Testing one, two, three. Count 3, 2, 1.│
> > └────────────────────────────────────────────────────────────┘
> >
> > I want to build a verb that will separate this boxed text data into
> > sentences.
> >
> >    ex2=. (<'This is Skip''s test.'),(<'Testing one, two, three.'),(<'Count 3, 2, 1.')
> >
> >    ex2
> > ┌────────────────────┬────────────────────────┬──────────────┐
> > │This is Skip's test.│Testing one, two, three.│Count 3, 2, 1.│
> > └────────────────────┴────────────────────────┴──────────────┘
> >
> > I also want to get rid of all punctuation and caps:
> >
> >    ex3=. (<'this is skips test'),(<'testing one two three'),(<'count 3 2 1')
> >
> >    ex3
> > ┌──────────────────┬─────────────────────┬───────────┐
> > │this is skips test│testing one two three│count 3 2 1│
> > └──────────────────┴─────────────────────┴───────────┘
> >
> > What is a reasonable J verb to do this separation and cleanup?
> >
> > Skip
> >
> > Cave Consulting LLC
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
