Chris, The Boyer-Moore algorithm is only language specific in that it works better in languages that have larger alphabets, and consequently less chance of a single character matching a searched for character.
It is a general purpose search algorithm. It does not look at the end of each word, that is a misunderstanding of the algorithm. It is not based on word suffixes. Basically, if it is looking for a 10 character long sequence, it checks the 10th character, the 20th character, the 30th character, the 40th character, etc, until it finds a character that is in the 10 character long sequence it is looking for. If it isn't in there, then there was no point checking any of the previous 9 characters. Stephen Denne. -- Datacute - Acute Information Revelation Tools http://www.datacute.co.nz/ > -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Chris Little > Sent: Thursday, March 03, 2005 8:54 AM > To: SWORD Developers' Collaboration Forum > Subject: Re: [sword-devel] Searching and Lucene thoughts > > > No. Standard Sword searches just start at the beginning and search to > the end, byte by byte. > > Just on the basis of the abstract you link to, I don't see how this > would be of any benefit. The Boyer-Moore algorithm is very > language-specific. It benefits from the fact that English is a > predominantly suffixing language, as are most European languages, I > would say. Personally, I have difficulty imagining how this actually > speeds search times, but I assume they've done testing and that their > claims are accurate. > > The standard linear search is the most general purpose search algorithm, > and I think general purpose is what we need to maintain. For people who > want faster searches, there is indexed searching available. > > --Chris > > > Lynn Allan wrote: > > <alert comment="iwnacsmndipootv ... i was not a computer science major > > ... "> > > > > Just curious ... does non-indexed sword-api searching use c.s. > > algorithms like Boyer-Moore searching? > > > http://portal.acm.org/citation.cfm?id=359859&coll=ACM&dl=ACM&CFID= 13545783&CFTOKEN=93236524 > > Something I tried to read once (and it was waaaaaay over my head) > concerned very smart "state machine" searching when there is more than > one word being searched for. Seems like it involved Bell Lab > researchers? From one of the A or W or K dudes? > http://portal.acm.org/citation.cfm?id=360855&coll=ACM&dl=ACM&CFID=13626066&C FTOKEN=93658335 > > Does D. Knuth discuss string matching optimizations? > > Would that be applicable to the sword-api? > > </alert> > > > _______________________________________________ > sword-devel mailing list > sword-devel@crosswire.org > http://www.crosswire.org/mailman/listinfo/sword-devel _______________________________________________ sword-devel mailing list sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel _______________________________________________ sword-devel mailing list sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel