On Tue, 08 Mar 2005, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:
>> Does anyone know which search algorithm is being used in RR? Searching seems sort of slow, leading me to believe that it isn't Boyer-Moore-Sunday.
Well, there is searching and there is searching. The built-in "find" command does a reasonable job. It isn't lightning fast, but it is quick enough. I don't know what algorithm it uses. The old docs (before version 2.5) were all housed in stacks and used the native "find" command to search. It was fast.
To implement a more flexible system, in version 2.5 the stack-based documentation was removed to XML files on disk, which are now loaded into a one-card shell. There is no built-in Transcript command to search files on disk, so it is being done by scripts. While Revolution is very quick at reading files from disk, the searching itself has to be done by marching through all the text and parsing the XML, and it is very slow.
I think the team is aware of this.
-- Jacqueline Landman Gay
Searching through XML files can actually be satisfactorily quick; compare some benchmarks below.
On Feb 27 I announced an update of my "Topsearch" tool on this list (ANN: Update of "Topsearch" for XML files), which is at least an attempt in the direction you are describing.
Screenshots can be viewed on my website <http://www.sanke.org/MetaMedia/Screenshots.htm>
The update is already usable, but still needs some fine-tuning before release.
The "Dictionary", "Faq", "Topics" and "Glossary" folders can be searched. The results are displayed in the field on the right, with the searchstring colored and the XML file addresses inserted as links. Clicking on such a file link displays the complete article in the left field, again with the searchstring colored. If the article itself contains links, these are displayed for further reference.
As you can see from the screenshots, listing all of the complete lines that contain the searchstring gives you a better idea of the context, and more information about what the full article contains, than just listing the filenames.
The full texts of the XML files are searched. Search times for the 1496 files of the Transcript Dictionary, including display of the found lines (plus their addresses and the colored searchstring) and of the first complete relevant article, vary between 1 and 2 seconds. During the search, progress is indicated by the scrollbar and by the accumulated number of found lines and files.
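For readers who want to try the general idea outside Revolution, here is a minimal sketch in Python. It is not Topsearch's code (Topsearch is written in Transcript); the function name `search_folder`, the `*.xml` file pattern and the UTF-8 encoding are all assumptions made for illustration. It reads every XML file in one folder as plain text, without parsing the XML, and collects each line that contains the searchstring together with the file it came from.

```python
from pathlib import Path


def search_folder(folder, search):
    """Return (filename, line number, line) for every line containing `search`.

    Hypothetical helper: files are read as plain text, not parsed as XML,
    and the comparison is case-insensitive.
    """
    needle = search.lower()
    hits = []
    for path in sorted(Path(folder).glob("*.xml")):
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

Each returned tuple carries the file name, so a front end could turn it into a clickable link and color the searchstring inside the line.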
Several modes for searching are available:
- basic search: all words that equal or contain the searchstring are found, e.g. the searchstring "background" finds all instances of "background" and additionally words such as "openbackground" or "backgroundbehavior"
- searchstring + strings to the right: "background" and "backgroundbehavior", but not "openbackground"
- searchstring + strings to the left
- only whole matches for the search: only lines and files containing "background" as a whole word are found
- searching for phrases like "date and time" etc.
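The modes listed above can be approximated with regular expressions. The following Python sketch is only one way to express them and is not how Topsearch does it; the function names and the mode names "basic", "right", "left" and "whole" are made up for this example. It also assumes that the "left" mode mirrors the "right" mode, i.e. it finds "openbackground" but not "backgroundbehavior".

```python
import re


def build_pattern(search, mode="basic"):
    """Compile a case-insensitive regex for one of the (hypothetical) modes."""
    s = re.escape(search)  # treat the searchstring literally
    if mode == "basic":    # searchstring with optional text on either side
        pattern = r"\w*" + s + r"\w*"
    elif mode == "right":  # searchstring may be extended to the right only
        pattern = r"\b" + s + r"\w*"
    elif mode == "left":   # searchstring may be extended to the left only
        pattern = r"\w*" + s + r"\b"
    elif mode == "whole":  # whole matches only; also works for phrases
        pattern = r"\b" + s + r"\b"
    else:
        raise ValueError("unknown mode: " + mode)
    return re.compile(pattern, re.IGNORECASE)


def matching_words(text, search, mode="basic"):
    """Return every match of `search` in `text` for the given mode."""
    return build_pattern(search, mode).findall(text)
```

For example, `matching_words("background openbackground backgroundbehavior", "background", "right")` returns "background" and "backgroundbehavior" but not "openbackground". A phrase such as "date and time" can be passed with the "whole" mode.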
Examples (search times on a Windows XP computer with a 2 GHz processor), each time searching the complete Dictionary with 1496 XML files:
- basic search for "background" including additional strings on right and left: 288 found lines in 135 found XML files - 1.7 seconds
- only "background" as whole matches: 46 lines in 24 XML files - 1 second
- "backgroundbehavior": 19 lines in 18 XML files - 1.1 seconds
- "custom properties": 49 lines in 22 XML files - 1.7 seconds.
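Anyone who wants to collect comparable figures for their own search routine could use something like the Python sketch below. It is an illustration only and does not reproduce the timings above. The function name `benchmark_search` is hypothetical, and the documents are assumed to be already in memory as a dictionary of file name to text, so disk reading is not included in the measured time.

```python
import time


def benchmark_search(documents, search):
    """Count found lines and found files for a basic search and time it.

    `documents` maps a file name to its full text (an assumption of this
    sketch). Returns (found_lines, found_files, elapsed_seconds).
    """
    needle = search.lower()
    start = time.perf_counter()
    found_lines = 0
    found_files = 0
    for text in documents.values():
        # Number of lines in this document that contain the searchstring.
        hits = sum(1 for line in text.splitlines() if needle in line.lower())
        if hits:
            found_lines += hits
            found_files += 1
    elapsed = time.perf_counter() - start
    return found_lines, found_files, elapsed
```

The three returned values correspond to the "found lines in found XML files - seconds" form used in the list above.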
Regards,
Wilhelm Sanke <www.sanke.org/MetaMedia>
_______________________________________________ use-revolution mailing list [email protected] http://lists.runrev.com/mailman/listinfo/use-revolution
