SOLAR BONES by Mike McCormack has no periods "."

-----Original Message-----
From: Programming <[email protected]> On Behalf Of Jose Mario Quintana
Sent: Wednesday, April 4, 2018 12:09
To: Programming forum <[email protected]>
Subject: Re: [Jprogramming] Separating sentences

>  text of Finnegan's Wake.

The actual name of the book by James Joyce is "Finnegans Wake" and one could 
argue that it is not written in English  ;)

On Wed, Apr 4, 2018 at 12:30 PM, Raul Miller <[email protected]> wrote:

> There are some unmentioned issues that may trip you up eventually with 
> this approach, for example, if you try to apply these routines to the 
> text of Finnegan's Wake.
>
> To hint at those issues, here's an approach that takes you directly to 
> the final result:
>
>    ex1=: <'This is Skip''s test. Testing one, two, three. Count 3, 2, 1.'
>
> DELIM=:'.?!'
> toss=:a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.
> separateclean=:3 :0
>   a:-.~(e.&DELIM <@deb;._2 tolower) '.',~(;y) -. toss
> )
>
>    separateclean ex1
> ┌──────────────────┬─────────────────────┬───────────┐
> │this is skips test│testing one two three│count 3 2 1│ 
> └──────────────────┴─────────────────────┴───────────┘
>
>
> And here's a longer approach which takes you there in two steps where 
> the result of the first step will be the same length as the result of 
> the second step:
>
> separatedirty=:3 :0
>   (;:'.')-.~(e.&DELIM <@deb;.2 ]) '.',~;y
> )
> clean=: tolower@-.&(toss,DELIM) L:0
>
>    separatedirty ex1
> ┌────────────────────┬────────────────────────┬──────────────┐
> │This is Skip's test.│Testing one, two, three.│Count 3, 2, 1.│ 
> └────────────────────┴────────────────────────┴──────────────┘
>    clean separatedirty ex1
> ┌──────────────────┬─────────────────────┬───────────┐
> │this is skips test│testing one two three│count 3 2 1│ 
> └──────────────────┴─────────────────────┴───────────┘
>
>
> But with ill-conditioned text (Finnegan's Wake being an example of 
> that), I expect there will be cases where separateclean gives a 
> different result from clean@separatedirty.
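
One way to see such a difference, as a minimal sketch rather than anything posted in the thread: the definitions below are copied from Raul's message so the block runs on its own, and ex4 is a hypothetical input with two delimiters in a row. It assumes the standard J profile, which supplies deb, tolower, toupper and echo.

```j
NB. Definitions copied from Raul's message.
DELIM=: '.?!'
toss=: a.#~1-(a.e.DELIM,":i.10)+.(tolower~:toupper) a.
separateclean=: 3 :0
  a:-.~(e.&DELIM <@deb;._2 tolower) '.',~(;y) -. toss
)
separatedirty=: 3 :0
  (;:'.')-.~(e.&DELIM <@deb;.2 ]) '.',~;y
)
clean=: tolower@-.&(toss,DELIM) L:0

NB. Hypothetical input (not from the thread): '?' immediately followed by '!'.
ex4=: <'What?! No.'

NB. One step: the piece between ? and ! is empty, and a:-.~ drops it.
echo # separateclean ex4          NB. 2

NB. Two steps: separatedirty keeps the lone '!' (only a lone '.' is dropped),
NB. and clean then reduces that piece to an empty box.
echo # clean separatedirty ex4    NB. 3
```

The two routes disagree here because the one-step version discards empty pieces after cleaning, while the two-step version filters before cleaning and removes only pieces that are exactly '.'.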
>
> But that's what makes text fun...
>
> --
> Raul
>
>
> On Wed, Apr 4, 2018 at 12:02 PM, Skip Cave <[email protected]> wrote:
> > I have the following boxed data:
> >
> > ex1=. <'This is Skip''s test. Testing one, two, three. Count 3, 2, 1.'
> >
> >
> > ex1
> > ┌────────────────────────────────────────────────────────────┐
> > │This is Skip's test. Testing one, two, three. Count 3, 2, 1.│
> > └────────────────────────────────────────────────────────────┘
> >
> > I want to build a verb that will separate this boxed text data into 
> > sentences.
> >
> >
> > ex2=. (<'This is Skip''s test.'),(<'Testing one, two, three.'),(<'Count 3, 2, 1.')
> >
> > ex2
> > ┌────────────────────┬────────────────────────┬──────────────┐
> > │This is Skip's test.│Testing one, two, three.│Count 3, 2, 1.│
> > └────────────────────┴────────────────────────┴──────────────┘
> >
> > I also want to get rid of all punctuation and caps:
> >
> > ex3=. (<'this is skips test'),(<'testing one two three'),(<'count 3 2 1')
> >
> > ex3
> > ┌──────────────────┬─────────────────────┬───────────┐
> > │this is skips test│testing one two three│count 3 2 1│
> > └──────────────────┴─────────────────────┴───────────┘
> >
> > What is a reasonable J verb to do this separation and cleanup?
> >
> > Skip
> >
> > Cave Consulting LLC
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
