I've looked a bit at the Xapian API.  Here's my preliminary route to changing
the search plugin to use Xapian.

*** Build the Index ***
* Read the file for each doc.
* regsub out all the "#X foo number number" stuff, since it won't help the search.
* Optional: prefix all object names with a Xapian prefix so that the user can
search for instances of objects if they want.  Additionally, include the object
names unprefixed so they count toward a score when the user isn't searching
just for objects.
* Prefix all the pd META stuff so that users can search by category, author,
etc., and also include it unprefixed so that, again, it counts toward a general
score when not searching for a particular field.
* Include the following as the document data: base directory, filename, and pd
META key/value pairs.  I include the pd META stuff in the doc data since we
want to display some of it (keywords, maybe other stuff in the future) in the
search results.
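
To make the term-generation steps above concrete, here is a rough sketch in
Python (the plugin itself would do this in Tcl with regsub).  It makes no
Xapian calls; it only shows which terms would be produced.  The "K" prefix
comes from the category examples in this mail; the "XO" object prefix, the "A"
author prefix, the function names, and the assumption that each patch
statement sits on one line are all mine and purely illustrative.

```python
import re

# Matches the leading "#X <type> <x> <y>" of a patch line, e.g. "#X obj 10 20".
PD_HEADER = re.compile(r"^#X\s+(\w+)\s+-?\d+\s+-?\d+\s*")

OBJECT_PREFIX = "XO"                               # hypothetical object prefix
META_PREFIXES = {"KEYWORDS": "K", "AUTHOR": "A"}   # "A" is an assumption


def terms_for_line(line):
    """Return index terms for one patch line, with the layout data stripped."""
    match = PD_HEADER.match(line)
    if match is None:
        return []  # not an "#X ..." line, so nothing to index here
    kind = match.group(1)
    words = line[match.end():].rstrip(";\n ").lower().split()
    if kind == "obj" and words:
        # Index the object name twice: prefixed for object-only searches,
        # unprefixed so it still counts toward a general score.
        return [OBJECT_PREFIX + words[0]] + words
    return words


def meta_terms(key, value):
    """Return prefixed plus unprefixed terms for one pd META key/value pair."""
    words = value.lower().split()
    prefix = META_PREFIXES.get(key)
    prefixed = [prefix + word for word in words] if prefix else []
    return prefixed + words
```

For example, terms_for_line("#X obj 10 20 osc~ 440;") gives "XOosc~" plus the
plain words, and meta_terms("KEYWORDS", "signal bandlimited") gives "Ksignal"
and "Kbandlimited" alongside the unprefixed words.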

Then it's trivial to check for database existence, and only build it if it's
not there.  (Maybe just have the last link on the homepage be "Rebuild Index".)
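
The build-only-if-missing check could look something like this sketch (Python
again, just for illustration).  The names ensure_index, db_path and build are
hypothetical, and the force flag stands in for a "Rebuild Index" link.

```python
import os


def ensure_index(db_path, build, force=False):
    """Build the index at db_path unless it already exists.

    build is a callable that takes the path and creates the index there.
    Returns True if a build was triggered, False if the index was reused.
    """
    if os.path.exists(db_path) and not force:
        return False
    build(db_path)
    return True
```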

Now we have an index, so:

*** Search ***
Run the search.  Depending on speed, I might just keep the results display the
way it is, showing ALL results instead of the Google way of 10 per page or
whatever.

*** Search by Category ***
This will be nicer than it is currently: instead of cryptic regexp text showing
up in the search bar, it will just be the prefixed keyword, like "Kbandlimited"
or "Ksignal".  That's easy enough to grasp that I don't think we'll need some
special syntax for category searches; newbies can just depend on the home page
links.  Plus, if they want to search for several categories at once, they can
quickly figure out it's just a matter of putting a "K" in front of each
category, and they are way less likely to generate a Tcl error than they would
be screwing around inside a regexp.  (I could even make a mouse binding, so
that <ctrl-click> adds a category to the search bar without triggering a
search, and they can use that to gang several together.)
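
The <ctrl-click> idea amounts to appending a prefixed term to whatever is
already in the search bar.  A minimal sketch (Python for illustration only;
add_category is a hypothetical name, and how the resulting text is handed to
Xapian is left out):

```python
def add_category(search_text, category, prefix="K"):
    """Append a prefixed category term to the search text, skipping duplicates."""
    term = prefix + category.lower()
    terms = search_text.split()
    if term not in terms:
        terms.append(term)
    return " ".join(terms)
```

Clicking "signal" and then ctrl-clicking "bandlimited" would leave
"Ksignal Kbandlimited" in the search bar, ready for the user to run.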

Also, if I understand the tclxapian interface correctly, I can just hand off a
Tcl string to Xapian, so the search plugin can get out of Tcl "quoting hell".
(Thus, much less chance of generating errors because of malformed lists.)

Any comments or suggestions?

-Jonathan


_______________________________________________
[email protected] mailing list
UNSUBSCRIBE and account-management -> 
http://lists.puredata.info/listinfo/pd-list
