[PD] search plugin update (was: Re: reverse kickstarter update)

2013-09-15 Thread Jonathan Wilkes

Hi list,
 Attached is a first pass at using the Xapian backend to
search Pure Data docs.

What the revision does:
* simplifies building a search index.  It builds once, on the first
search, and all subsequent searches happen very fast.  Previously
it searched the docs themselves every single time and depended
on the OS caching the data, resulting in sluggish performance
especially on Windows.
* natural language, probabilistic searches.  The search terms
in the index were automatically chosen by the engine with no
customization, and already the results are decent.
* almost no input errors.  Xapian has its own simple syntax, but
for most cases users can ignore it and type in natural language
searches (like Google).  The few errors the user can generate
produce meaningful feedback on the console.  Also, since I'm
passing the input as a string you don't have to worry about
malformed tcl lists or weird characters that previously caused
errors.
* everything, including pd files, pdfs and html, is indexed properly
and so will get included in the results in the proper place.
* gives the ability to add results from a remote database with a
couple lines of code.
* allows the removal of Match all terms and Match whole words
checkbuttons, simplifying the interface.
* performs stemming out of the box: when you search for
edit, the engine will also take into account editing, edits,
edited, etc.
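
To illustrate the point about malformed tcl lists, here is a minimal sketch in plain Tcl (no Xapian needed). The input text and variable names are made up for the example; it only shows why treating the search entry as a list can fail while treating it as a string cannot.

```tcl
# Hypothetical user input containing an unbalanced brace.
set input "osc~ \{frequency"

# Treating the input as a Tcl list fails on the unbalanced brace.
set list_failed [catch {llength $input} msg]

# Treating it as a plain string always works; here it is only trimmed
# before it would be handed to the search engine's query parser.
set query [string trim $input]

puts "list_failed=$list_failed"
puts "query=$query"
```

The string form can then be passed on as a single argument, so no list parsing of user input is involved.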

Installation for Linux (Debian):
1) Make sure you have the libxapian and tclxapian packages
installed.  Other distros probably have corresponding packages.
2) Put search-plugin.tcl in the /startup directory, or if you're
using Pd vanilla just make sure it's in a directory that's specified
in the Path dialog.
3) Run Pd and press Ctrl-H or choose Search from the Help
menu.

Further work that needs to be done:
* need to figure out where to create the database directory on
Linux, OSX, and Windows.  The directory needs to be readable and writable.
Is there an easy way to do this?
* need a Cancel button next to the progressbar when indexing,
so the user can cancel a long index.
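
One possible shape for a cancellable index build, sketched in plain Tcl under some assumptions: the namespace ::index_sketch, its variables, and the file names are hypothetical, and the actual indexing of a document is replaced by a placeholder. The idea is to process one file per pass of the event loop so a Cancel button could set a flag between passes.

```tcl
namespace eval ::index_sketch {
    variable i 0
    variable cancelled 0
    variable done 0
    variable indexed {}
}

# Index one file per event-loop pass so the GUI stays responsive and a
# (hypothetical) Cancel button can set ::index_sketch::cancelled.
proc ::index_sketch::build_index {filelist} {
    variable i
    variable cancelled
    variable done
    variable indexed
    if {$cancelled || $i >= [llength $filelist]} {
        set done 1
        return
    }
    # Placeholder for the real work of adding one document to the database.
    lappend indexed [lindex $filelist $i]
    incr i
    after 0 [list ::index_sketch::build_index $filelist]
}

# Usage: start indexing, then simulate a click on Cancel.
::index_sketch::build_index {a.pd b.pd c.pd d.pd}
set ::index_sketch::cancelled 1
vwait ::index_sketch::done
puts "indexed [llength $::index_sketch::indexed] of 4 files"
```

In a real plugin the cancel branch would probably also need to remove the partially built database, which this sketch leaves out.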

Further work that could be done:
* add pd meta tag/values to the index terms for each document.
This would make it possible to type keyword:foo or author:bar
to search based solely on that pd meta tag/value.
* add filenames to terms
* add object terms so the user can search pd patches for
a particular object instance, i.e., object:clip
* limit the document data in the database to pd meta tags/values
and other metadata.  Right now I'm storing the _entire_ doc text
in the database which obviously wastes space.
* xapian has all kinds of features, like suggesting related searches,
and realtime results.  The latter could be very handy for autocompletion
in object boxes, for example.
* could use the title of html files for better result descriptions
* could plug in to puredata.info to search for externals, plugins, etc.
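
As a rough sketch of the meta tag idea, the plain Tcl below turns META comments from a patch into prefixed terms that could later back searches like keyword:foo or author:bar. Everything here is an assumption for illustration: the proc name meta_terms, the shape of the META comment lines, the sample patch text, and the mapping of KEYWORDS to "K" and AUTHOR to "A" (single-letter prefixes of the kind Xapian conventionally uses).

```tcl
# Map from pd META tag names to term prefixes (assumed mapping).
set prefixes [dict create KEYWORDS K AUTHOR A]

# Return a list of prefixed, lowercased terms found in the text of a patch.
proc meta_terms {patchtext prefixes} {
    set terms {}
    foreach line [split $patchtext "\n"] {
        # Assumed META comment shape: "#X text <x> <y> <TAG> <values...>;"
        if {![regexp {^#X text -?\d+ -?\d+ ([A-Z_]+) (.*);$} $line -> tag values]} {
            continue
        }
        if {![dict exists $prefixes $tag]} {
            continue
        }
        set prefix [dict get $prefixes $tag]
        foreach word [regexp -all -inline {\S+} $values] {
            lappend terms $prefix[string tolower $word]
        }
    }
    return $terms
}

# Made-up patch text for the example.
set patch "#N canvas 0 0 450 300 10;\n#X text 12 45 KEYWORDS control filter;\n#X text 12 65 AUTHOR Jane Doe;"
puts [meta_terms $patch $prefixes]
```

The resulting terms would still have to be added to each document at indexing time, and the query parser told about the keyword and author prefixes.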

As always, feedback welcome.  And feel free to donate some rice
and beans if you can!
https://jwilkes.nfshost.com/donations.php

Best,
Jonathan
# browse docs or search all the documentation using a regexp
# check the Help menu for the Browser item to use it

# todo: use xapian syntax for meta keywords
#keyword:foo
# todo: when cancelling a db index build, we need to remove
# the database completely
# todo: remove both checkbuttons-- not needed
# todo: do newline regsub and document parsing on indexing
# todo: make libdir listing check for duplicates
# todo: hook into the dialog_bindings
# TODO remove the doc_ prefix on procs where it's not needed
# TODO enter and up/down/left/right arrow key bindings for nav

# redesign:
# [  search entry  ] Help
# [search] [filter]
#

package require Tk 8.5
package require pd_bindings
package require pd_menucommands
package require xapian 1.0.0

namespace eval ::dialog_helpbrowser2:: {

    variable doctypes *.{pd,pat,mxb,mxt,help,txt,htm,html,pdf}

    variable searchfont [list {DejaVu Sans}]
    variable searchtext {}
    variable search_history {}
    variable count {}
    # $i controls the build_index recursive loop
    variable i
    variable filelist {}
    variable progress {}
    variable navbar {}
    variable genres
    variable cancelled
    variable database {}
}

## help browser and support functions #
proc ::dialog_helpbrowser2::open_helpbrowser {mytoplevel} {
    if {[winfo exists $mytoplevel]} {
        wm deiconify $mytoplevel
        raise $mytoplevel
    } else {
        create_dialog $mytoplevel
    }
}

proc ::dialog_helpbrowser2::create_dialog {mytoplevel} {
variable searchfont
variable selected_file
variable genres [list [_ All documents] \
[_ Object Help Patches] \
[_ All About Pd] \
[_ Tutorials] \
[_ Manual] \
[_ Uncategorized] \
]
variable count
foreach genre $genres {
	lappend 

Re: [PD] search plugin update (was: Re: reverse kickstarter update)

2013-09-15 Thread Dan Wilcox
On Sep 15, 2013, at 3:23 PM, pd-list-requ...@iem.at wrote:

> * need to figure out where to create the database directory on
> Linux, OSX, and Windows.  The directory needs to be read/writable.
> Is there an easy way to do this?

For Linux and Windows, why not put it in the same location as the pd
settings file?

On OSX, I'd put it in ~/Library/Application Support/pd (or pd-extended).
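
A minimal Tcl sketch of that suggestion, with loud caveats: only the OSX path comes from the text above. The proc name database_dir, the Windows location under APPDATA, and the hidden .pd directory in the home directory on Linux are assumptions for illustration, not where Pd actually stores anything.

```tcl
# Pick a per-user, writable directory for the search database.
# Only the OSX path is taken from the suggestion; the rest is assumed.
proc database_dir {os platform home {appdata {}}} {
    if {$os eq "Darwin"} {
        return [file join $home Library "Application Support" pd]
    }
    if {$platform eq "windows" && $appdata ne ""} {
        # Assumption: use the per-user application data folder.
        return [file join $appdata pd]
    }
    # Assumption: a hidden directory in the user's home directory.
    return [file join $home .pd]
}

set home [expr {[info exists ::env(HOME)] ? $::env(HOME) : [pwd]}]
set appdata [expr {[info exists ::env(APPDATA)] ? $::env(APPDATA) : ""}]
set dir [database_dir $::tcl_platform(os) $::tcl_platform(platform) $home $appdata]
puts $dir
```

The plugin could then call file mkdir on the result the first time an index is built.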


Dan Wilcox
@danomatika
danomatika.com
robotcowboy.com




