Re: [Jprogramming] Simple and effective spelling corrector

Raul Miller Mon, 23 Nov 2015 10:04:38 -0800

Note that that's really just a draft.

For example, I think you should be able to get a slight speedup in
countbigwords by replacing:


  they=. ;:(' ' (I.-.big e. alphabet)} big)

with

  they=. ;:(' ' (big I.@:e. a.-.alphabet)} big)

This would take advantage of the special code for (I.@:e.) and would
also move the negation from a result the size of (big) to a result the
size of (a.).
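
As a quick sanity check, here is a minimal sketch comparing the two
forms on a small input. The sample string and the names old and new
are made up for illustration; they stand in for the contents of
big.txt and are not part of the draft.

```j
NB. lowercase letters, defined the same way as in the draft
alphabet=: (#~ ] ~: toupper) a.

NB. hypothetical sample text standing in for big.txt
big=: 'the cat, the hat; that is all.'

NB. original form: negates a boolean list as long as big
old=: ' ' (I. -. big e. alphabet)} big

NB. proposed form: removes alphabet from a. first, then finds
NB. the positions in big that hold one of the remaining characters
new=: ' ' (big I.@:e. a. -. alphabet)} big

NB. both forms should blank out the same positions
old -: new
```

Both expressions select the positions in big that hold a
non-alphabet character, so the final match should be 1. This only
checks that the results agree; it says nothing about timing, which
would need to be measured against the real corpus.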

Thanks,

-- 
Raul


On Mon, Nov 23, 2015 at 12:48 PM, Dan Bron <[email protected]> wrote:
> Thanks, I had missed that.  Appreciate the pointer.
>
> -Dan
>
>
>> On Nov 23, 2015, at 12:45 PM, Raul Miller <[email protected]> wrote:
>>
>> If you search for the string big.txt on the page
>> http://norvig.com/spell-correct.html the second instance of that
>> string is a link to http://norvig.com/big.txt and that is what I used.
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>> On Mon, Nov 23, 2015 at 12:21 PM, Dan Bron <[email protected]> wrote:
>>> Oh, interesting; I’m going to have to study this in more detail.  Thank you.
>>>
>>> What did you use for your corpus? (big.txt)
>>>
>>> -Dan
>>>
>>>
>>>> On Nov 20, 2015, at 2:55 PM, Raul Miller <[email protected]> wrote:
>>>>
>>>> If I have read his implementation properly, it works something like this:
>>>>
>>>> require'regex'
>>>> RX_OPTIONS_UTF8=: 0
>>>>
>>>> alphabet=: (#~ ] ~: toupper) a.
>>>>
>>>> countbigwords=:3 :0
>>>> NB. handle persistent data explicitly
>>>> big=. tolower fread '~user/temp/big.txt'
>>>> they=. ;:(' ' (I.-.big e. alphabet)} big)
>>>> words=: ~.they
>>>> count=: (#/.~ they),0
>>>> i.0 0
>>>> )
>>>>
>>>> alt=:4 :0
>>>> c=. x{y
>>>> (alphabet-.c) x}each<y
>>>> )
>>>> dubin=:4 :0
>>>> (x{.y)&,each ,&(x}.y)each alphabet
>>>> )
>>>>
>>>> edits=:3 :0
>>>> del=. 1 <\. y
>>>> trn=. ((<-1 2)&C.each }.<\y),each 2}.(<\.y),a:
>>>> rpl=. ;alt&y each i.#y
>>>> ins=. ~.;dubin&y each i.1+#y
>>>> del,trn,rpl,ins
>>>> )
>>>>
>>>> best=:3 :0
>>>> n=. words i. y
>>>> y{~(i. >./)n{count
>>>> )
>>>>
>>>> correct=:3 :0
>>>> w=. <y
>>>> if. w e. words do. w return. end.
>>>> e=. edits y
>>>> if. 1 e. e e. words do. best e return. end.
>>>> e2=. ;edits each e
>>>> if. 1 e. e2 e. words do. best e2 return. end.
>>>> w
>>>> )
>>>>
>>>> countbigwords''
>>>>
>>>> Seems plausible enough on a few simple tests.
>>>>
>>>> Example use:
>>>>
>>>>  correct 'thatl'
>>>> +----+
>>>> |that|
>>>> +----+
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Raul
>>>>
>>>> On Thu, Nov 19, 2015 at 12:06 PM, Dan Bron <[email protected]> wrote:
>>>>> Peter Norvig has a blog entry on how to write a fairly effective spelling 
>>>>> corrector (75-90%) in very little code, using some Bayesian analysis:
>>>>>
>>>>>    http://norvig.com/spell-correct.html
>>>>>
>>>>> A worthwhile read.
>>>>>
>>>>> I’m using this program as an exercise in learning Perl6 (which, believe 
>>>>> it or not, now has an official release date). I wonder though, how would 
>>>>> it look in J?
>>>>>
>>>>> -Dan
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
