[EMAIL PROTECTED] wrote:

Thank you to Andre, Jonathan, Alex, and Jim!!!

Your great suggestions and word lists have provided me with the necessary ingredients to achieve my goal. I will probably post further questions about optimizing the speed of word comparisons. If you have ideas or a script that works well, please post it if you don't mind. I appreciate your help!


Roger,
this was a very useful trigger for me. I've had a half-completed project to write some word-game programs that got left behind a couple of months ago.


One of the things I needed then was a spellchecker. I looked briefly at the Mozilla spellchecker but didn't like its dictionary, which seemed to have a lot of junk in it for my purposes. That was when I found the OpenOffice dictionaries, and looked at them enough to figure it would take some work. In particular, expanding their downloadable dictionary to a simple word list would take a C compiler, which set off my allergy to using C :-)

I looked at it again, decided I could dirty my hands for 5 minutes, downloaded the MySpell package, and compiled the unmunch program, which converts a dictionary plus affix file into a simple word list.

Given that word list (162K words, 1.75 MB) loaded into gWords, I tried the simple brute-force method, namely:

  put the millisecs into tStart
  put tWords into field "inField"
  put 0 into t
  repeat for each word w in tWords
    add 1 to t
    replace "." with empty in w  -- probably more of these should be done
    replace "," with empty in w
    replace "!" with empty in w
    if w is not among the words of gWords then
      set the textstyle of word t of field "inField" to "bold"
    end if
  end repeat


This took on average 8 millisecs per word in tWords. Perfectly adequate for small input "documents".

Then I tried a slightly more complex way, using an array keyed by word. Setup:

  put url ("file:" & tFile) into gWords
  repeat for each word w in gWords
    put 1 into gArray[w]
  end repeat

and then

  put tWords into field "inField"
  put 0 into t
  repeat for each word w in tWords
    add 1 to t
    replace "." with empty in w
    replace "," with empty in w
    replace "!" with empty in w
    if gArray[w] <> 1 then
      set the textstyle of word t of field "inField" to "bold"
    end if
  end repeat

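This took 2 millisecs in total for 50 words (compared with roughly 8 millisecs per word for the brute-force version), because each array lookup avoids rescanning the whole word list. That would be reasonable for even large-ish documents.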
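
For anyone who wants to see the idea outside Revolution, here is a minimal Python sketch of the same two approaches. It is only an illustration, not the script in the stack: the names normalize, misspelled_scan and misspelled_lookup and the tiny word list are made up for the example, a Python set stands in for the Revolution array, and the lowercasing and end-of-word punctuation stripping are my assumptions about what a fuller clean-up step might do. The first version searches the whole word list for every input word; the second builds a lookup table once and then does a single hashed check per word.

```python
import string


def normalize(word):
    """Strip leading/trailing punctuation and lowercase the word (assumed clean-up)."""
    return word.strip(string.punctuation).lower()


def misspelled_scan(text, word_list):
    """Brute force: search the whole list for every input word."""
    return [w for w in text.split() if normalize(w) not in word_list]


def misspelled_lookup(text, known):
    """Hashed lookup: `known` is a set built once from the word list."""
    return [w for w in text.split() if normalize(w) not in known]


# Hypothetical stand-in for the 162K-word dictionary.
word_list = ["the", "cat", "sat", "on", "mat"]
known = set(word_list)  # one-time setup, analogous to filling gArray

sample = "The cat zat on the mat!"
print(misspelled_scan(sample, word_list))   # ['zat']
print(misspelled_lookup(sample, known))     # ['zat']
```

Both functions return the same answer; only the cost per word differs, and that gap grows with the size of the word list.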


I tried to put this sample stack onto RevOnline - but I'm having some problem connecting to the server, so you can find it instead at
www.tweedly.net/RunRev/SpellCheck.rev
www.tweedly.net/RunRev/allwords.dic
(remember the .dic file is 1.75 MB, so don't download it unless you really want it!)


-- Alex.



_______________________________________________
use-revolution mailing list
[email protected]
http://lists.runrev.com/mailman/listinfo/use-revolution
