If you search for the string big.txt on the page http://norvig.com/spell-correct.html, the second instance of that string is a link to http://norvig.com/big.txt, and that is the file I used.
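This is my reading of why the corpus matters. `correct` never computes a probability explicitly. `best` picks the candidate with the largest entry in `count`, and `count` is built from big.txt. That is the same choice as ranking by a relative-frequency estimate of P(c), because dividing every count by the same total N (the number of word tokens, `#they`) cannot change which candidate wins. E1 and E2 below are my own shorthand, not names from the code. They stand for the lists produced by `edits y` and `;edits each e`:

```latex
\hat{c}(w) = \operatorname*{arg\,max}_{c \in C(w)} P(c),
\qquad P(c) \approx \frac{\mathrm{count}(c)}{N},
\qquad
C(w) =
\begin{cases}
\{w\} & \text{if } w \in \mathit{words},\\
E_1(w) \cap \mathit{words} & \text{else, if nonempty},\\
E_2(w) \cap \mathit{words} & \text{else, if nonempty},\\
\{w\} & \text{otherwise.}
\end{cases}
```

`best` is handed the whole candidate list, not just the known words. Unknown candidates index past the end of `words` and pick up the 0 appended to `count`, so they can never beat a known word. On ties, `(i. >./)` takes the first maximum in the order that `edits` produced.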
Thanks,

--
Raul

On Mon, Nov 23, 2015 at 12:21 PM, Dan Bron <[email protected]> wrote:
> Oh, interesting; I’m going to have to study this in more detail. Thank you.
>
> What did you use for your corpus? (big.txt)
>
> -Dan
>
>
>> On Nov 20, 2015, at 2:55 PM, Raul Miller <[email protected]> wrote:
>>
>> If I have read his implementation properly, it works something like this:
>>
>> require'regex'
>> RX_OPTIONS_UTF8=: 0
>>
>> alphabet=: (#~ ] ~: toupper) a.
>>
>> countbigwords=:3 :0
>> NB. handle persistent data explicitly
>> big=. tolower fread '~user/temp/big.txt'
>> they=. ;:(' ' (I.-.big e. alphabet)} big)
>> words=: ~.they
>> count=: (#/.~ they),0
>> i.0 0
>> )
>>
>> alt=:4 :0
>> c=. x{y
>> (alphabet-.c) x}each<y
>> )
>> dubin=:4 :0
>> (x{.y)&,each ,&(x}.y)each alphabet
>> )
>>
>> edits=:3 :0
>> del=. 1 <\. y
>> trn=. ((<-1 2)&C.each }.<\y),each 2}.(<\.y),a:
>> rpl=. ;alt&y each i.#y
>> ins=. ~.;dubin&y each i.1+#y
>> del,trn,rpl,ins
>> )
>>
>> best=:3 :0
>> n=. words i. y
>> y{~(i. >./)n{count
>> )
>>
>> correct=:3 :0
>> w=. <y
>> if. w e. words do. w return. end.
>> e=. edits y
>> if. 1 e. e e. words do. best e return. end.
>> e2=. ;edits each e
>> if. 1 e. e2 e. words do. best e2 return. end.
>> w
>> )
>>
>> countbigwords''
>>
>> Seems plausible enough on a few simple tests.
>>
>> Example use:
>>
>>    correct 'thatl'
>> +----+
>> |that|
>> +----+
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Thu, Nov 19, 2015 at 12:06 PM, Dan Bron <[email protected]> wrote:
>>> Peter Norvig has a blog entry on how to write a fairly effective spelling
>>> corrector (75-90%) in very little code, using some Bayesian analysis:
>>>
>>> http://norvig.com/spell-correct.html
>>>
>>> A worthwhile read.
>>>
>>> I’m using this program as an exercise in learning Perl6 (which, believe it
>>> or not, now has an official release date). I wonder though, how would it
>>> look in J?
>>>
>>> -Dan
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
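This is a rough count of how many candidates the `edits` verb quoted above hands to `correct` for a word of length n (n ≥ 1). It is my own tally and rests on two assumptions. First, `alphabet` comes out as the 26 lowercase letters a to z. Second, every letter of the word is in `alphabet`.

- `alt` excludes the letter already in place, so replacements contribute 25 per position rather than 26.
- The `~.` in `ins` can only remove candidates, so the insertion term is an upper bound.
- The final `del,trn,rpl,ins` is not run through `~.`. Deletions can therefore repeat, for example when deleting either letter of a doubled pair.

```latex
\underbrace{n}_{\texttt{del}}
+ \underbrace{(n-1)}_{\texttt{trn}}
+ \underbrace{25n}_{\texttt{rpl}}
+ \underbrace{|\texttt{ins}|}_{\le\, 26(n+1)}
\;\le\; 53n + 25,
\qquad n = 5\ (\texttt{'thatl'}):\quad 5 + 4 + 125 + |\texttt{ins}| \le 290.
```

The second round, `;edits each e`, runs `edits` again on every one of those O(n) candidates, so its size grows roughly as O(n^2). That is why `correct` only builds it when the first round finds no known word.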
