Re: Searching for a word when it's more than one word

Richmond Mathewson via use-livecode Sat, 01 Sep 2018 03:37:44 -0700

That's because you lot tend to use a silver teaspoon while I tend to use a great big shovel:


https://www.dropbox.com/s/00t8oftb1ydm8ni/Text%20analyzer%20X.livecode.zip?dl=0


Richmond.

On 1/9/2018 1:29 pm, Mark Waddingham via use-livecode wrote:

On 2018-09-01 12:05, Richmond Mathewson via use-livecode wrote:
Obviously, when considering names of places such as Colchester,
Rochester and Chester one has
to search for the longer names first and exclude them from later searches.
The 'substring' problem (i.e. Chester being 'in' Rochester) isn't relevant in the above algorithm because we are 'tokenising' input and phrases - essentially changing the alphabet.
i.e. "Rochester Chester Colchester" is turned into ABC, and we match A, B or C as atomic units.
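A minimal sketch of that idea, in Python rather than LiveCode and purely for illustration: the names tokenize and find_phrases are hypothetical, not taken from the algorithm discussed earlier in the thread. Text and phrases are both split into word tokens, and only whole tokens are compared, so "Chester" can never match inside "Rochester". Longer phrases are tried first, which is one way to get the "longer names first" behaviour.

```python
import re

def tokenize(text):
    # Split on anything that is not a letter, digit or apostrophe.
    return [w for w in re.split(r"[^A-Za-z0-9']+", text) if w]

def find_phrases(text, phrases):
    """Return (token_index, phrase) pairs; whole tokens are the atomic units."""
    tokens = tokenize(text)
    # Tokenise the phrases too, drop empty ones, and try longer phrases first.
    phrase_tokens = [tokenize(p) for p in phrases]
    phrase_tokens = sorted((p for p in phrase_tokens if p), key=len, reverse=True)
    found = []
    i = 0
    while i < len(tokens):
        for pt in phrase_tokens:
            if tokens[i:i + len(pt)] == pt:
                found.append((i, " ".join(pt)))
                i += len(pt)  # consume the matched tokens
                break
        else:
            i += 1  # no phrase starts here
    return found
```

For example, find_phrases("Rochester Chester Colchester", ["Chester"]) reports a single match at token index 1, because the other two words are different tokens.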
I should perhaps point out that the 'processText' operation probably needs to be a little better in practice - to at least include a 'stop' token for punctuation. For example:
"The man walked starting from East Hartford, West Hartford could be seen in the distance."
In the case where 'Hartford West' and 'Hartford' are the 'known' towns (and not 'East Hartford') - the proposed tokenization would result in:
The,man,walked,starting,from,East,Hartford,West,Hartford,could,be,seen,in,the,distance
Which means you'd get "Hartford West" and "Hartford" - when you should only get "Hartford" (assuming you care about the linguistic structure of the text, at least).
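One way the 'stop' token could look, sketched in Python for illustration only. The function names process_text and matches, the STOP value and the choice of punctuation characters are all assumptions, not the actual 'processText' from this thread. Each punctuation mark becomes its own token, so a phrase can no longer match across a comma.

```python
import re

STOP = "."                 # hypothetical stop token
PUNCT = set(".,;:!?")      # assumed set of 'important' punctuation

def process_text(text):
    # Emit words as tokens and replace each punctuation mark with STOP.
    tokens = []
    for piece in re.findall(r"[A-Za-z0-9']+|[.,;:!?]", text):
        tokens.append(STOP if piece in PUNCT else piece)
    return tokens

def matches(tokens, phrase):
    # Token indexes at which the whole phrase occurs as consecutive tokens.
    words = phrase.split()
    if not words:
        return []
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]
```

With the example sentence above, the comma after "East Hartford" becomes a stop token sitting between "Hartford" and "West", so matches(tokens, "Hartford West") finds nothing while "Hartford" is still found.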
Indeed, the above means that in preprocessing the text you can vastly reduce the number of words to search - any sequence of words which aren't in any phrase (or important punctuation) can be replaced by "*", say. So the above would become:
  *,East,Hartford,*,West,Hartford,*

The "*" tokens block matching multi-word phrases.
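A rough Python sketch of that preprocessing step, for illustration only; the name compress and the punctuation set are assumptions. Any token that does not occur in some known phrase, punctuation included, becomes "*", and consecutive "*" tokens are merged into one. Note that under this sketch a word such as "East" survives only if some phrase in the list contains it.

```python
import re

def compress(text, phrases):
    # Every word that appears in at least one known phrase is worth keeping.
    vocab = {w for p in phrases for w in p.split()}
    out = []
    for piece in re.findall(r"[A-Za-z0-9']+|[.,;:!?]", text):
        token = piece if piece in vocab else "*"
        # Merge runs of irrelevant words and punctuation into a single "*".
        if token == "*" and out and out[-1] == "*":
            continue
        out.append(token)
    return out
```

Matching multi-word phrases against the compressed list then works on far fewer tokens, and a phrase cannot match across a "*".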

Warmest Regards,

Mark.


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
