When all you have is a hammer ....

Take a look at Swish-e (www.swish-e.org).  It is a superb open source 
indexer with tremendous capabilities.  It can index PDF in addition to 
HTML files (and other formats too), and is ideally suited to your application.

You run Swish-e in spider mode to build an index of your help directory.  
Then you run it in search mode to return relevant pages.  There are a lot 
of CGI scripts available to help you do this and to format your results 
page, including highlighting search terms in the results!
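
If you would rather call it from your own script than use one of the CGI 
scripts, the index-then-search cycle can be sketched roughly as below.  This 
is only a sketch under assumptions: the file names help.conf and help.index 
are made up, the helper functions are hypothetical, and the result line 
layout (rank, path, quoted title, size) is what I expect from the default 
output, so check it against what your installed version actually prints.

```python
import shlex
import subprocess


def index_command(config_path):
    """Command line that builds an index from a config file (-c)."""
    return ["swish-e", "-c", config_path]


def search_command(index_path, query):
    """Command line that searches an index file (-f) for a query (-w)."""
    return ["swish-e", "-f", index_path, "-w", query]


def parse_results(output):
    """Parse result lines assumed to look like: rank path "title" size."""
    hits = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, '#' header lines, the closing '.', and 'err:' lines.
        if not line or line == "." or line.startswith(("#", "err:")):
            continue
        try:
            rank, path, title, size = shlex.split(line)
            hits.append({"rank": int(rank), "path": path,
                         "title": title, "size": int(size)})
        except ValueError:
            # The line does not match the assumed layout; ignore it.
            continue
    return hits


def search(index_path, query):
    """Run a search and return the parsed hits (requires swish-e on PATH)."""
    proc = subprocess.run(search_command(index_path, query),
                          capture_output=True, text=True, check=False)
    return parse_results(proc.stdout)
```

You would run index_command("help.conf") whenever a new help file is dropped 
in, and call search("help.index", "printing") from the results page.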

A config file lets you exclude certain words, characters, etc.  You can 
also define custom meta tags to build into the index.  In your case, the 
category/sub-category might be defined in meta tags.  Alternatively, put 
them in <H1> and <H2> tags, and have Swish-e include <Hn> tags in the 
index.  It will actually index all the text, but your query can restrict 
the search to particular meta or HTML tags so as to increase the relevance 
of the returned results.
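
Moving the title, category and summary into the <head> also gets rid of 
the visible ^ markers, because browsers do not display meta tags.  To show 
what such a file would carry, here is a small sketch that pulls those three 
fields back out.  The meta names "category" and "summary" and the function 
read_help_meta are my own invention for illustration; use whatever names 
you end up declaring in your config file.

```python
from html.parser import HTMLParser


class HelpFileMeta(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Store meta names lowercased so lookups are case-insensitive.
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_help_meta(html_text):
    """Return the title, category and summary of one help file."""
    parser = HelpFileMeta()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "category": parser.meta.get("category", ""),
        "summary": parser.meta.get("summary", ""),
    }
```

A help file would then start with something like 
<meta name="category" content="Reports"> in its head section, and nothing 
extra shows up in the page body.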

Anyway, they've already done the work for you.

>This is not urgent, and the taf does work, but I am 
>wondering if there is a more elegant way to accomplish 
>the task.
>
>We are setting up a web folder with help files.  What 
>we'd like to do is drop a new help file into the folder, 
>and then have it indexed along with the existing files. 
>
>Here is the sequence of steps.
>1. Each help file has a title, category and summary that 
>are preceded by a special character (I use the ^) and a 
>double ^ to end the summary.  These are standard html 
>files, with explanations that follow the summary.
>2. I have a taf that reads the directory and returns all 
>the files ending in .htm
>3. a for loop that operates on each, placing the file in 
>a variable.
>4. I use <@locate> to find the positions of the ^ 
>character in the string, so I can extract just the 
>string that includes the title, category and summary, 
>and then <@calc> to calculate the length.
>5.  I use atomize to turn the returned string into a 3 
>element array.
>6.  I use <@addarray> to populate the table of all the 
>help files, showing the viewer the title (as a hyperlink 
>to the actual file), category and summary of each of the 
>help files in the directory.
>
>As I said, the taf works, but when the viewer sees the 
>actual help file, there are these unsightly ^ characters.
>
>I tried using comment tags to hide them but there was no 
>way to easily get rid of the comment tag characters when 
>I built the array.  I couldn't seem to find a way to use 
>atomize with a word instead of a character.
>
>I hope this was clear enough.  I learned a lot about 
>arrays and string manipulation in the process.  The main 
>point of this app is to have people who write the help 
>files just drop them in the directory, without needing 
>to update a database.
>
>John Newsom
>
>________________________________________________________________________
>TO UNSUBSCRIBE: send a plain text/US ASCII email to [EMAIL PROTECTED]
>                with unsubscribe witango-talk in the message body
>


Bill Conlon

To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306

office: 650.327.2175
fax:    650.329.8335
mobile: 650.906.9929
e-mail: mailto:bill@;tothept.com
web:    http://www.tothept.com

