On 01/24/2013 04:59 PM, Jonathan Wilkes wrote:
> I've looked a bit at the Xapian API. Here's my preliminary route to
> changing the search plugin to use Xapian.
>
> ***Build the Index***
> * Read the file for each doc. regsub out all "#X foo number number"
>   stuff since it won't help the search
> * Optional: prefix all object names with a XAPIAN prefix so that the
>   user can search for instances of objects if they want. Additionally
>   include the object names unprefixed so they count toward a score when
>   the user isn't searching just for objects

This sounds quite interesting. What do you mean by searching for
instances of objects?

> * Prefix all the pd META stuff so that users can search by category,
>   author, etc., and also include it unprefixed so that again it counts
>   toward a general score when not searching for a particular field
> * Include the following as the document data: base directory, filename,
>   pd META KEY/values pairs. I include the pd META stuff in the doc data
>   since we want to display some of it (keywords, maybe other stuff in
>   the future) in the search results.
>
> Then it's trivial to check for database existence, and only build it if
> it's not there. (Maybe just have the last link on the homepage be
> "Rebuild Index".)

That all sounds good.

> Now we have an index so
>
> *** Search ***
> Search. Depending on speed, I might just keep it the way it is, showing
> ALL results instead of the Google way of 10 per page or whatever.
>
> *** Search by Category ***
> This will be nicer than it is currently-- instead of cryptic regexp text
> showing up in the search bar, it will just be the prefixed keyword, like
> "Kbandlimited" or "Ksignal". That's easy enough to grasp that I don't
> think we'll need some special syntax for category searches-- newbies can
> just depend on the home page links. Plus, if they want to search for
> several categories at once they can quickly figure out it's just a
> matter of prefixing a "K" in front of the category and are way less
> likely to generate a tcl error as they would be screwing around inside a
> regexp. (I could even make a mousebinding, like <ctrl-click> will add a
> category to the search bar without triggering a search, so they can use
> that to gang several together.)

What about "category:bandlimited"? The K seems arbitrary and hard to
remember.

> Also, if I understand the tclxapian interface correctly, I can just hand
> off a tcl string to Xapian so the search-plugin can get out of tcl
> "quoting-hell".
> (Thus, much less chance of generating errors because of malformed
> lists.)

That sounds very nice too. Sounds to me like this would be a large
improvement. Once you start committing some code, I'll try to find the
time to add Xapian to the Mac and Windows builds so people can start
using/testing early.

.hc
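
For illustration only, here is a minimal sketch of the term handling discussed in this thread, written in Python rather than Tcl and making no Xapian calls. It strips the "#X foo number number" layout prefix, emits each object name both prefixed and unprefixed, and rewrites a "category:bandlimited" style query word into the "K"-prefixed form. The "K" prefix comes from the thread. The "XOBJ" and "A" prefixes, the function names, and the sample patch are assumptions made up for the example, and escaped semicolons in patch files are not handled.

```python
import re

# Matches the "#X foo number number" layout prefix mentioned in the plan.
LAYOUT_RE = re.compile(r"#X\s+(\w+)\s+-?\d+\s+-?\d+\s*")

# "K" is the category prefix from the thread; "XOBJ" and "A" are
# hypothetical choices for this sketch, not anything the plugin defines.
OBJECT_PREFIX = "XOBJ"
FIELD_PREFIXES = {"category": "K", "author": "A"}


def index_terms(patch_text):
    """Return terms for one patch: plain words plus prefixed object names."""
    terms = []
    for record in patch_text.split(";"):
        record = record.strip()
        match = LAYOUT_RE.match(record)
        if not match:
            continue  # e.g. "#N canvas ..." records carry no searchable text
        kind = match.group(1)
        words = record[match.end():].lower().split()
        if kind == "connect" or not words:
            continue  # connections are only numbers, useless for search
        if kind == "obj":
            # Prefixed copy lets a user search for object instances only.
            terms.append(OBJECT_PREFIX + words[0])
        # Unprefixed copies count toward the general score.
        terms.extend(words)
    return terms


def translate_query(query):
    """Rewrite "field:value" words into prefixed terms; leave others alone."""
    out = []
    for word in query.split():
        field, sep, value = word.partition(":")
        prefix = FIELD_PREFIXES.get(field.lower()) if sep else None
        out.append(prefix + value.lower() if prefix and value else word)
    return out


if __name__ == "__main__":
    sample = (
        "#N canvas 0 0 450 300 10;\n"
        "#X obj 10 20 osc~ 440;\n"
        "#X text 10 50 bandlimited oscillator;\n"
        "#X connect 0 0 1 0;\n"
    )
    print(index_terms(sample))
    print(translate_query("category:bandlimited osc~"))
```

With a mapping like this, the user-facing syntax can stay "category:bandlimited" while the index still stores "Kbandlimited". Xapian's own QueryParser offers an add_prefix call for this kind of field-to-prefix mapping, which may make a hand-written translation unnecessary. Whether tclxapian exposes it would need checking.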
