Re: [PD] search engine with xapian backend

2013-10-01 Thread Hans-Christoph Steiner

Wow, that's very nice!  Well done!

.hc

On Sep 30, 2013, at 3:59 PM, Jonathan Wilkes wrote:

> Here's a quick demo of some nice changes:
> https://puredata.info/Members/jancsika/search-plugin-with-xapian.webm/view
> 
> Sorry about the size of the file-- I can remove some of the old demo builds
> if it's a problem.
> 
> Updates:
> * all metadata fields are searchable using Xapian's field:value syntax.  So
> author:puckette and even outlet_0:pointer can be used by themselves or with
> free text to refine a search.
> * Want to return all patches that contain an instance of sigmund~?  Search
> for object:sigmund~.  This matches the exact text without stemming-- e.g.,
> object:clip~ will give different results than object:clip.
> * hand-crafted some descriptive text for all PDF manuals in Pd svn, including
> the Gem manual and others.
> * formatted escaped commas correctly
> * added a firefox-style find menu bound to 
> * reduced index-build time and database size (both cut roughly in half)
> * simplified doc search to exclude duplicates (for example, from having extra
> and extra/Gem in the path)
> * prettified the "info" icon
> * use html s for description in search results
> * parse Gem docs for description and keywords
> * allow canceling index building
> * use libdir libname/object prefix only for libdir results
> * put name of libdir in description of all readmes and license.txt files
> * reorganized and simplified the homepage topics
> * reorganized code and removed some global variables (still ugly, but not
> as ugly as it used to be)
> * saved document data to the database as FUDI messages. (Easy to parse
> if someone wants to make a [docsearch] object...)
> 
> Next I'm going to work on integrating it into Pd-l2ork, and maybe
> break out the combobox into toggle buttons.
> 
> Of course if anyone is an information retrieval specialist feel free
> to make suggestions.  I'm using a bunch of old docs that aren't
> updated with description info, which is why so many of them have
> the ugly description note.  Most of the new docs have pd meta info.
> Also, I'm mixing some Pd vanilla and l2ork paths which is why some
> docs show up twice.  (You can see the full path in the status bar at
> the bottom.)
> 
> Best,
> Jonathan
> 
> ___
> Pd-list@iem.at mailing list
> UNSUBSCRIBE and account-management -> 
> http://lists.puredata.info/listinfo/pd-list




[PD] search engine with xapian backend

2013-09-30 Thread Jonathan Wilkes

Here's a quick demo of some nice changes:
https://puredata.info/Members/jancsika/search-plugin-with-xapian.webm/view

Sorry about the size of the file-- I can remove some of the old demo builds
if it's a problem.

Updates:
* all metadata fields are searchable using Xapian's field:value syntax.  So
author:puckette and even outlet_0:pointer can be used by themselves or with
free text to refine a search.
* Want to return all patches that contain an instance of sigmund~?  Search
for object:sigmund~.  This matches the exact text without stemming-- e.g.,
object:clip~ will give different results than object:clip.
* hand-crafted some descriptive text for all PDF manuals in Pd svn, including
the Gem manual and others.
* formatted escaped commas correctly
* added a firefox-style find menu bound to 
* reduced index-build time and database size (both cut roughly in half)
* simplified doc search to exclude duplicates (for example, from having extra
and extra/Gem in the path)
* prettified the "info" icon
* use html s for description in search results
* parse Gem docs for description and keywords
* allow canceling index building
* use libdir libname/object prefix only for libdir results
* put name of libdir in description of all readmes and license.txt files
* reorganized and simplified the homepage topics
* reorganized code and removed some global variables (still ugly, but not
as ugly as it used to be)
* saved document data to the database as FUDI messages. (Easy to parse
if someone wants to make a [docsearch] object...)
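
For anyone curious what reading the FUDI-formatted document data back
might look like, here is a rough Python sketch.  It is only an
illustration, not the plugin's code: the function name parse_fudi, the
selectors "title" and "description", and the sample record are all made
up, and the actual layout stored in the database may differ.  It
assumes messages end with an unescaped semicolon and that commas and
semicolons inside text are escaped with a backslash.

```python
import re


def parse_fudi(data):
    """Split FUDI-style text into messages, each a list of atoms.

    Simplified on purpose: escaped spaces and escaped backslashes
    are not handled.
    """
    messages = []
    # Split on semicolons that are NOT preceded by a backslash.
    for raw in re.split(r"(?<!\\);", data):
        atoms = raw.split()
        if not atoms:
            continue  # skip the empty chunk after the last semicolon
        # Turn "\," and "\;" back into plain "," and ";".
        messages.append([re.sub(r"\\([;,])", r"\1", atom) for atom in atoms])
    return messages


# Hypothetical record; the real field names are not given in this post.
sample = "title sigmund~;\ndescription pitch tracker\\, sinusoid analysis;\n"

for message in parse_fudi(sample):
    selector, arguments = message[0], message[1:]
    print(selector, "->", " ".join(arguments))
```

The first atom of each message is treated as a selector and the rest as
its arguments, which is roughly how a [docsearch] object could route
the data with [route].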

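Going back to the field:value bullets at the top of the list, the
difference between a field filter and free text can be sketched in
plain Python.  This does not use Xapian and is not how the plugin is
implemented; parse_query, matches, search and the two sample documents
are hypothetical, and the free-text match here is a plain substring
test rather than a stemmed search.  It only shows why object:clip~ and
object:clip return different results when field values are compared
exactly.

```python
def parse_query(query):
    """Split a query into (field, value) filters and free-text terms."""
    filters = []
    free_text = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters.append((field, value))
        else:
            free_text.append(token)
    return filters, free_text


def matches(doc, filters, free_text):
    """Field values must match exactly; free text is a substring test."""
    for field, value in filters:
        if value not in doc.get(field, []):
            return False
    text = doc.get("text", "").lower()
    return all(term.lower() in text for term in free_text)


def search(docs, query):
    """Return the names of all documents that satisfy the query."""
    filters, free_text = parse_query(query)
    return [d["name"] for d in docs if matches(d, filters, free_text)]


# Made-up documents standing in for indexed help patches.
docs = [
    {"name": "a", "object": ["sigmund~", "clip~"],
     "author": ["puckette"], "text": "pitch tracking"},
    {"name": "b", "object": ["clip"],
     "author": ["puckette"], "text": "limit numbers to a range"},
]

print(search(docs, "object:clip~"))
print(search(docs, "object:clip"))
print(search(docs, "author:puckette pitch"))
```

Because the object field is compared as an exact string, clip~ and clip
are treated as two unrelated values.
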
Next I'm going to work on integrating it into Pd-l2ork, and maybe
break out the combobox into toggle buttons.

Of course if anyone is an information retrieval specialist feel free
to make suggestions.  I'm using a bunch of old docs that aren't
updated with description info, which is why so many of them have
the ugly description note.  Most of the new docs have pd meta info.
Also, I'm mixing some Pd vanilla and l2ork paths which is why some
docs show up twice.  (You can see the full path in the status bar at
the bottom.)

Best,
Jonathan
