----- Original Message -----
> From: Hans-Christoph Steiner <[email protected]>
> To: [email protected]
> Cc:
> Sent: Friday, January 25, 2013 9:39 PM
> Subject: Re: [PD] Using Xapian for the Pd Search Plugin
>
> On 01/24/2013 04:59 PM, Jonathan Wilkes wrote:
>> I've looked a bit at the Xapian API. Here's my preliminary route to changing
>> the search plugin to use Xapian.
>>
>> ***Build the Index***
>> * Read the file for each doc.
>>   regsub out all "#X foo number number" stuff since it won't help the search
>> * Optional: prefix all object names with a XAPIAN prefix so that the user
>>   can search for instances of objects if they want. Additionally include the
>>   object names unprefixed so they count toward a score when the user isn't
>>   searching just for objects
>
> This sounds quite interesting, how do you mean searching for instances of
> objects?

Well, we can put "clip~" in the search terms, but we can additionally add it to the db with a prefix (something like XOclip~) when it originated from the document as "#X obj 20 10 clip~". (Basically you normalize all the document search terms to lower case, so then upper case denotes certain fields.) I suppose we could also make use of the numbers in "#X obj 20 10", since terms with lower coordinates sit closer to the top left corner and are more prominent.

>> * Prefix all the pd META stuff so that users can search by category,
>>   author, etc., and also include it unprefixed so that again it counts
>>   toward a general score when not searching for a particular field
>> * Include the following as the document data: base directory, filename,
>>   pd META KEY/values pairs. I include the pd META stuff in the doc data
>>   since we want to display some of it (keywords, maybe other stuff in the
>>   future) in the search results.
>>
>> Then it's trivial to check for database existence, and only build it if
>> it's not there. (Maybe just have the last link on the homepage be
>> "Rebuild Index".)
>
> Sounds all good.
>
>> Now we have an index so
>>
>> *** Search ***
>> Search. Depending on speed, I might just keep it the way it is, showing
>> ALL results instead of the Google way of 10 per page or whatever.
>>
>> *** Search by Category ***
>> This will be nicer than it is currently-- instead of cryptic regexp text
>> showing up in the search bar, it will just be the prefixed keyword, like
>> "Kbandlimited" or "Ksignal". That's easy enough to grasp that I don't
>> think we'll need some special syntax for category searches-- newbies can
>> just depend on the home page links. Plus, if they want to search for
>> several categories at once they can quickly figure out it's just a matter
>> of prefixing a "K" in front of the category and are way less likely to
>> generate a tcl error as they would be screwing around inside a regexp.
>> (I could even make a mousebinding, like <ctrl-click> will add a category
>> to the search bar without triggering a search, so they can use that to
>> gang several together.)
>
> what about "category:bandlimited"? The K seems arbitrary and hard to
> remember.

Yeah, I'm just being lazy because the "K" prefix is how it's actually stored in the database, and the main user interface is clicking a link. It'd basically just be a regsub there, so it wouldn't be hard to use your syntax.

>> Also, if I understand the tclxapian interface correctly, I can just hand
>> off a tcl string to Xapian so the search-plugin can get out of tcl
>> "quoting-hell". (Thus, much less chance of generating errors because of
>> malformed lists.)
>
> That sounds very nice too. Sounds to me like this would be a large
> improvement. Once you start committing some code, I'll try to find the time
> to add xapian to the Mac and Windows builds so people can start using/testing
> early.

Well, this is all pre-testing stage. Hopefully there are no weird snags in all this. But the documentation seems pretty straightforward so far.

-Jonathan

> .hc
>
> _______________________________________________
> [email protected] mailing list
> UNSUBSCRIBE and account-management ->
> http://lists.puredata.info/listinfo/pd-list
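
The prefixing scheme discussed in this thread can be tried out without Xapian at all. What follows is only a rough sketch in Python (the plugin itself is Tcl), under several assumptions: the function names `index_terms` and `rewrite_query` are hypothetical, only `obj` and `text` records are indexed, and object arguments are dropped. The "XO" and "K" prefixes follow the "XOclip~" and "Kbandlimited" examples from the thread.

```python
import re

# Assumed prefixes, taken from the examples in the thread.
OBJECT_PREFIX = "XO"    # e.g. "XOclip~"
CATEGORY_PREFIX = "K"   # e.g. "Kbandlimited"

# Matches '#X <kind> <x> <y> <rest>', i.e. the '#X foo number number' part
# that the thread proposes stripping before indexing.
RECORD_RE = re.compile(r"^#X (\w+) -?\d+ -?\d+\s*(.*)$")

# Matches the user-facing 'category:foo' syntax suggested in the thread.
CATEGORY_RE = re.compile(r"\bcategory:(\S+)")


def index_terms(patch_text):
    """Return the list of index terms for the text of one Pd patch."""
    terms = []
    for line in patch_text.splitlines():
        match = RECORD_RE.match(line.strip().rstrip(";"))
        if not match:
            continue  # assumption: non-'#X' records are not indexed
        kind, body = match.groups()
        words = body.split()
        if kind == "obj" and words:
            name = words[0].lower()
            terms.append(OBJECT_PREFIX + name)  # lets users search for objects only
            terms.append(name)                  # still counts toward the general score
        elif kind == "text":
            # Plain terms are lower-cased so upper case can mark fields.
            terms.extend(word.lower() for word in words)
    return terms


def rewrite_query(query):
    """Turn 'category:foo' into the stored form 'Kfoo'."""
    return CATEGORY_RE.sub(
        lambda m: CATEGORY_PREFIX + m.group(1).lower(), query
    )
```

With this sketch, a query typed as `category:bandlimited oscillator` would be handed to the search backend as `Kbandlimited oscillator`. Xapian's QueryParser also has an `add_prefix` method that maps a field name onto a term prefix, so the regsub step might turn out to be unnecessary; whether and how tclxapian exposes it would need checking.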
