If you search the page http://norvig.com/spell-correct.html for the
string big.txt, the second instance of that string is a link to
http://norvig.com/big.txt. That is the file I used.
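
For context, this is my reading of Norvig's article, not a quote from it. The corpus matters because its word counts stand in for the prior P(c) in his Bayesian formulation. Here w is the word typed and C(w) is its set of candidate corrections:

$$\hat{c} = \arg\max_{c \in C(w)} P(c \mid w) = \arg\max_{c \in C(w)} \frac{P(c)\,P(w \mid c)}{P(w)} = \arg\max_{c \in C(w)} P(c)\,P(w \mid c)$$

P(w) is the same for every candidate, so it drops out. In the J code quoted below, P(c) is taken as proportional to count, which is a word's frequency in big.txt. P(w | c) enters only as a preference order: a known word wins outright, then the candidates one edit away, then those two edits away.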

Thanks,

-- 
Raul


On Mon, Nov 23, 2015 at 12:21 PM, Dan Bron <[email protected]> wrote:
> Oh, interesting; I’m going to have to study this in more detail.  Thank you.
>
> What did you use for your corpus? (big.txt)
>
> -Dan
>
>
>> On Nov 20, 2015, at 2:55 PM, Raul Miller <[email protected]> wrote:
>>
>> If I have read his implementation properly, it works something like this:
>>
>> require'regex'
>> RX_OPTIONS_UTF8=: 0
>>
>> NB. alphabet: the lowercase letters a-z (the bytes that toupper changes)
>> alphabet=: (#~ ] ~: toupper) a.
>>
>> countbigwords=:3 :0
>>  NB. handle persistent data explicitly
>>  big=. tolower fread '~user/temp/big.txt'
>>  they=. ;:(' ' (I.-.big e. alphabet)} big)
>>  words=: ~.they
>>  count=: (#/.~ they),0  NB. extra 0 is the count for a word not in words
>>  i.0 0
>> )
>>
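>> NB. x alt y: y with letter x replaced by each other letter of the alphabet
>> alt=:4 :0
>>  c=. x{y
>>  (alphabet-.c) x}each<y
>> )
>> NB. x dubin y: y with each letter of the alphabet inserted at position x
>> dubin=:4 :0
>>  (x{.y)&,each ,&(x}.y)each alphabet
>> )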
>>
>> NB. edits y: deletions, transpositions, replacements and insertions of y
>> edits=:3 :0
>>  del=. 1 <\. y
>>  trn=. ((<-1 2)&C.each }.<\y),each 2}.(<\.y),a:
>>  rpl=. ;alt&y each i.#y
>>  ins=. ~.;dubin&y each i.1+#y
>>  del,trn,rpl,ins
>> )
>>
>> NB. best y: the boxed candidate with the highest count (unknown words count 0)
>> best=:3 :0
>>  n=. words i. y
>>  y{~(i. >./)n{count
>> )
>>
>> NB. correct y: y if known, else the best known word 1 edit away, then 2 away, else y
>> correct=:3 :0
>>  w=. <y
>>  if. w e. words do. w return. end.
>>  e=. edits y
>>  if. 1 e. e e. words do. best e return. end.
>>  e2=. ;edits each e
>>  if. 1 e. e2 e. words do. best e2 return. end.
>>  w
>> )
>>
>> countbigwords''
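>>
>> For scale, here is a sketch of the arithmetic, assuming the 26-letter
>> alphabet above. For a word of length n, edits returns n deletions, n-1
>> transpositions, 25n replacements (alt skips the original letter) and at
>> most 26(n+1) insertions (ins removes duplicates with ~.). That gives at most
>>
>> $$n + (n-1) + 25n + 26(n+1) = 53n + 25$$
>>
>> candidates. For 'thatl' (n = 5) the bound is 290. Inserting a letter
>> just before or just after an identical letter gives the same word, which
>> yields 5 duplicate insertions here, so edits 'thatl' returns 285 boxes.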
>>
>> Seems plausible enough on a few simple tests.
>>
>> Example use:
>>
>>   correct 'thatl'
>> +----+
>> |that|
>> +----+
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Thu, Nov 19, 2015 at 12:06 PM, Dan Bron <[email protected]> wrote:
>>> Peter Norvig has a blog entry on how to write a fairly effective spelling
>>> corrector (75-90% accurate) in very little code, using some Bayesian analysis:
>>>
>>>     http://norvig.com/spell-correct.html 
>>>
>>> A worthwhile read.
>>>
>>> I’m using this program as an exercise in learning Perl6 (which, believe it
>>> or not, now has an official release date). I wonder, though: how would it
>>> look in J?
>>>
>>> -Dan
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm