Edward Duffy wrote:
> This isn't quite worked out, but I want to throw this out to the group
> and get some preliminary feedback.  Attached is a patch that allows us
> to index system-wide and user installed man pages, Tomboy notes, and
> some basic Liferea support.  The external services all use the
> out-of-process mechanism used by the text filter and embedded metadata
> extractor.  However, there are more operations, and therefore, more
> applications for each service.

it would be best to discuss this first before doing the patch (unless 
you are content to modify it quite a bit - which is fine!)

I like this in general but there are a few things:

I want it to work with third party packages so it needs to have easy 
installation and deinstallation


> 
> First, the directory structure:
> in tracker/src  there now resides an "external-services" directory.
> In this directory you will find one directory for each service.  The
> service directories are named after their configuration key in
> ~/.Tracker/tracker.cfg.  This makes it easy to add new services without
> recompiling trackerd (and hopefully encourage other developers to
> provide tracker support with their apps!).  For example, you'll find
> the directory tracker/src/external-services/IndexManPages and a
> IndexManPages key under the Services group in tracker.cfg.  Each
> service has five programs:

I would prefer just "services" to "external-services"

> 
> 1) check-deps
>  This program is called at the very beginning, if the user activates the
> service's key.  This program may check for any other required programs
> that are needed for this service to work.  For example, I check for
> xsltproc and w3m for the Liferea indexer.  If non-zero is returned,
> the indexer is disabled.

I'm not sure this is needed. I suppose there's no harm in having it, 
but it should be optional

> 
> 2) watch-list
>  This program returns a list of directories to be added to trackerd's
> watch list.  You must list each directory; it will not automatically
> recurse into subdirectories.  If you need all subdirs, I recommend find:
> # find $basedir -type d
> See IndexManPages/watch-list for an example.

not needed - I prefer to have a service file (like the dbus service 
files or .desktop files) which specifies the options needed here.

All we need is a directory like /usr/share/tracker/services to hold the 
service files. This makes it easy for separate packages to install and 
de-install stuff without any hassle.

At start up trackerd can simply read all these service files (and also 
watch for new ones!)

> 
> 3) service-type
> This program returns the service type of a file being watched by this 
> service.
> argv[1] == the full path to the file being watched
> argv[2] == the mime type of the file
> I provide the file path and mime type, if you need them, but I imagine
> this should be constant

not needed - any file in the watch directory above would be passed to 
the spawned service-handler (we can include globs in the service file to 
filter certain files to pass)

generally these watched folders will be hidden folders, so they won't 
conflict with the file indexer.

> 
> 4) filter-text
> This works very similarly to the text filters you find in the
> tracker/filters directory, except
> argv[1] == the full path
> argv[2] == the mime type of the file !!
> argv[3] == the path to the filtered text !!
> 
> 5) extract-metadata
> Again, behaves like tracker-extract. It takes a file and spits out
> Key=Value;\n pairs for each piece of metadata
> argv[1] == the full path
> argv[2] == the mime type of the file
> 

I was planning on migrating the existing metadata extractor format to 
an xml format (our current one is quite hacky!). We also need to handle 
multiple values for the same metadata type.

something like:

<extraction>
        <metadata name="Audio.Title">Moonlight Sonata</metadata>
        <metadata name="Audio.Artist">Beethoven</metadata>
</extraction>

Feel free to modify code to match above.
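
As a rough illustration of how a consumer could read that format while 
keeping multiple values for the same metadata type, here is a minimal 
Python sketch. It is not trackerd code: the function name 
parse_extraction and the second Audio.Artist value are made up for the 
example, and it assumes the extractor emits one well-formed <extraction> 
document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample in the proposed format. The second Audio.Artist entry is invented
# here purely to show a repeated metadata type.
SAMPLE = """\
<extraction>
        <metadata name="Audio.Title">Moonlight Sonata</metadata>
        <metadata name="Audio.Artist">Beethoven</metadata>
        <metadata name="Audio.Artist">Wilhelm Kempff</metadata>
</extraction>
"""


def parse_extraction(xml_text):
    """Return {metadata name: [values, ...]} for one <extraction> document."""
    values = defaultdict(list)
    root = ET.fromstring(xml_text)
    for node in root.findall("metadata"):
        name = node.get("name")
        if name is None:
            continue  # skip entries without a name attribute
        values[name].append((node.text or "").strip())
    return dict(values)


if __name__ == "__main__":
    for name, vals in parse_extraction(SAMPLE).items():
        print(name, vals)
```

Storing a list per name means a repeated type needs no special casing 
on the reading side.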

the filter program and metadata extractor program should be specified in 
the service file so there's no need to worry about mimes.

We need a function in tracker-utils that determines if a file is 
associated with a particular service by looking at its path and matching 
it against any path that's registered as a watch by a service. We need 
this for the emails so may as well reuse it for all services. (just 
needs to call g_str_has_prefix on it)
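
The lookup could be sketched roughly as below. This is Python rather 
than the C/GLib code tracker-utils would actually use, the name 
service_for_path and the example paths are hypothetical, and the check 
is slightly stricter than a bare g_str_has_prefix call: a plain prefix 
test would also match a sibling directory such as ".tomboy-old" against 
a watch on ".tomboy", so the sketch requires a path separator after the 
watched directory.

```python
import posixpath


def service_for_path(path, watches):
    """Return the service whose watched directory contains path, or None.

    watches maps a service name to a list of watched directory paths.
    """
    path = posixpath.normpath(path)
    for service, dirs in watches.items():
        for watch_dir in dirs:
            # Normalise, then make sure the prefix ends on a directory boundary.
            prefix = posixpath.normpath(watch_dir).rstrip("/") + "/"
            if path + "/" == prefix or path.startswith(prefix):
                return service
    return None


if __name__ == "__main__":
    # Hypothetical watch registrations, for illustration only.
    watches = {
        "Notes": ["/home/jamie/.tomboy"],
        "Emails": ["/home/jamie/.evolution/mail"],
    }
    print(service_for_path("/home/jamie/.tomboy/note1.note", watches))
    print(service_for_path("/home/jamie/.tomboy-old/note1.note", watches))
```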

> 
> So, like I said before, I'm including 3 implementations of this:
> 
> 1) IndexManPages
> The new service type is "Man Pages" and it adds a new "Man" metadata
> class.  The class can tag a man page's title, section, date it was
> written, source (app + version), and manual name (e.g., Debian Project
> for Debian-specific man pages).  It also provides a full text indexer.
> Only thing lacking here is the language the man page was written in.
> Currently, I reject any non-English directory.  It's easy to index
> them all, but it's just faster for me if trackerd just ignores those.

ok great. Maybe we can use the user's locale to work out which 
translations to index?
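
One possible way to do that, sketched in Python: derive candidate 
translation directory names from a locale string such as the value of 
LANG, from most to least specific. The function name locale_candidates 
is hypothetical, and the sketch assumes translated pages live in 
subdirectories named after the locale (for example a "de" directory 
under the man root), with untranslated pages at the top level.

```python
def locale_candidates(lang):
    """Return translation directory names to try, most specific first.

    An empty list means only the untranslated top-level pages apply.
    """
    if not lang or lang in ("C", "POSIX"):
        return []
    without_modifier = lang.split("@", 1)[0]           # drop "@euro" etc.
    without_codeset = without_modifier.split(".", 1)[0]  # drop ".UTF-8"
    language_only = without_codeset.split("_", 1)[0]     # drop "_DE"
    candidates = []
    for name in (lang, without_modifier, without_codeset, language_only):
        if name and name not in candidates:
            candidates.append(name)
    return candidates


if __name__ == "__main__":
    import os
    print(locale_candidates(os.environ.get("LANG", "")))
```

The watch list would then only need the top-level man directories plus 
whichever of these candidates actually exist on disk.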

> 
> 2) IndexTomboy
> This uses the Notes service type, and adds a Title field to the Note
> metadata class.  There's obviously more I could grab from the tomboy
> files, I just haven't gotten around to it yet.  Full text is
> supported.

see Mikkel's tomboy indexer which he sent on this list last month - it 
does all the fields I believe. Perhaps you could use some of his code?

> 
> 3) IndexLiferea
> This adds a service type called "Web Channels" and a metadata class
> "RSS".  This indexer sucks and I need some help on it. :(
> Currently, you only get one entry in the database for each feed.  So
> all the text in the feed is associated with the entire feed, instead
> of an individual item.  For example, if I was to search for "tracker"
> I'd expect a link to a specific post by Jamie, instead I get a link to
> Planet GNOME.  I'm not even sure what I need here, I'd like some way
> to associate a file with multiple database items.  Is this possible?
> 

not sure - will have to think. The xml above for the extractor could be 
modified to support multiple sub-entities with their own uri in one go.

<extraction>
        <Entity uri="/home/jamie/music/moonlight.ogg">
                <metadata name="Audio.Title">Moonlight Sonata</metadata>
                <metadata name="Audio.Artist">Beethoven</metadata>
        </Entity>

        <Entity uri="/home/jamie/music/moonlight.ogg">
                <metadata name="Audio.Title">Moonlight Sonata</metadata>
                <metadata name="Audio.Artist">Beethoven</metadata>
        </Entity>
</extraction>
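
A minimal Python sketch of reading such a multi-entity document into 
one record per uri follows. The function name parse_entities, the 
RSS.Title key and the sample uris are assumptions made for the example, 
not an agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical extractor output for a feed with two items.
SAMPLE = """\
<extraction>
        <Entity uri="/home/jamie/feeds/planet.xml#1">
                <metadata name="RSS.Title">First post</metadata>
        </Entity>
        <Entity uri="/home/jamie/feeds/planet.xml#2">
                <metadata name="RSS.Title">Second post</metadata>
        </Entity>
</extraction>
"""


def parse_entities(xml_text):
    """Return a list of (uri, {name: [values]}) pairs, one per <Entity>."""
    entities = []
    for entity in ET.fromstring(xml_text).findall("Entity"):
        metadata = {}
        for node in entity.findall("metadata"):
            value = (node.text or "").strip()
            metadata.setdefault(node.get("name"), []).append(value)
        entities.append((entity.get("uri"), metadata))
    return entities


if __name__ == "__main__":
    for uri, metadata in parse_entities(SAMPLE):
        print(uri, metadata)
```

Each pair can then be stored as its own database object, which is what 
lets a search hit point at a single item rather than the whole file.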


so in the DB, they should be separate objects: "RSS Feed" and "RSS Item"

You could also build the uri to include the rss file and an offset to 
the item that matches and the gui can then decode it and show a viewer 
for it
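
One way such a uri might be built and decoded, as a Python sketch. The 
"#<offset>" suffix is an assumption chosen for illustration (nothing 
here is decided), and both function names are hypothetical.

```python
def build_item_uri(feed_path, offset):
    """Combine the feed file path and an item offset into a single uri."""
    return f"{feed_path}#{offset}"


def split_item_uri(uri):
    """Return (feed_path, offset); offset is None for a plain file uri."""
    feed_path, sep, offset = uri.rpartition("#")
    if not sep or not offset.isdigit():
        return uri, None
    return feed_path, int(offset)


if __name__ == "__main__":
    uri = build_item_uri("/home/jamie/.liferea/feed.xml", 3)
    print(uri, split_item_uri(uri))
```

A real scheme would need to escape a literal "#" in file names; the 
sketch only splits on the last one and falls back to treating the uri 
as a plain path when the suffix is not a number.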


> I'm pretty happy with the man pages indexer, I may look into having
> Yelp use it some time in the future.  But I'm not calling dibs, so anyone
> else looking for a project to work on is more than welcome.
> 
> The tomboy indexer works as expected also.  I believe Tomboy is
> dbus-ified, so if anyone wants to update tracker-search-tool to
> search Notes also and fire up Tomboy when you click on a note,
> that'd be awesome.

the service file can contain this - either an exec name or a dbus 
interface/object name

Sample service file might look like:

[Service]
Type=Notes
WatchDirs=$HOME/.tomboy
WatchRecursive=false
WatchFilter=

[Metadata]
Exec=/usr/bin/tomboy-extractor

[TextFilter]
Exec=

[Display]
Exec=/usr/bin/tomboy
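
To show how little code reading such a file takes, here is a Python 
sketch using the standard configparser module (trackerd itself would 
presumably use GLib's key file parser instead). The function name 
load_service, the returned field names, and the use of ";" to separate 
several WatchDirs entries are assumptions for the example.

```python
import configparser
import os

SAMPLE = """\
[Service]
Type=Notes
WatchDirs=$HOME/.tomboy
WatchRecursive=false
WatchFilter=

[Metadata]
Exec=/usr/bin/tomboy-extractor

[TextFilter]
Exec=

[Display]
Exec=/usr/bin/tomboy
"""


def load_service(text):
    """Parse one service file into a plain dict; empty values become None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    service = parser["Service"]
    # Assumption: several directories would be separated by ";".
    raw_dirs = service.get("WatchDirs", "").split(";")
    return {
        "type": service["Type"],
        "watch_dirs": [os.path.expandvars(d) for d in raw_dirs if d],
        "recursive": service.getboolean("WatchRecursive", fallback=False),
        "filter": service.get("WatchFilter", "") or None,
        "metadata_exec": parser.get("Metadata", "Exec", fallback="") or None,
        "text_filter_exec": parser.get("TextFilter", "Exec", fallback="") or None,
        "display_exec": parser.get("Display", "Exec", fallback="") or None,
    }


if __name__ == "__main__":
    print(load_service(SAMPLE))
```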


Any comments?

-- 
Mr Jamie McCracken
http://jamiemcc.livejournal.com/

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
