Re: [Tracker] [PATCH] "Daemonize" metadata extractor

Jamie McCracken Tue, 04 Mar 2008 07:36:16 -0800

On Tue, 2008-03-04 at 15:19 +0100, Carlos Garnacho wrote:
> Hi!,
> 
> On Fri, 2008-02-29 at 09:10 -0500, Jamie McCracken wrote:
> > On Fri, 2008-02-29 at 11:34 +0100, Carlos Garnacho wrote:
> > > Hi!,
> > > 
> > > I've attached a patch in bug #519337 to keep the extractor alive between
> > > operations. This greatly improves performance, as it avoids having to
> > > spawn/initialize the extractor constantly for each new file. With the
> > > patch, the extractor shuts down by itself after 30 seconds of
> > > inactivity, any testing is appreciated.
> > > 
> > > Besides, I've been thinking a bit about this subject. Right now trackerd
> > > waits synchronously for the metadata extractor output (and the same
> > > happens for thumbnailing, even when such data isn't immediately
> > > necessary), so only 1 file is processed at the same time. 
> > > 
> > > Has there been any thinking/work on making that parallelizable? I'm sure
> > > there'd be performance improvements if there was a pool of extractors
> > > which asynchronously processed a queue of filenames.
> > > 
> > 
> > yeah, although it's tricky with threads (synchronisation and deadlock
> > issues)
> 
> I didn't plan to use threads here, I've developed a small test extractor
> [1] that spawns several extractors and manages them asynchronously
> through watches, it requires the patched tracker-extractor from bug
> #519337. You can run it with:
> 
> ./test-extract [num-extractors] [path-to-extract]
> 
> Being a test, it just gets metadata from mp3 files, but the
> tracker-extractor-pool.[ch] files can be easily adapted to tracker
> needs.


Bear in mind tracker is a differential indexer, so when indexing a new
file we need all the metadata before saving it. We must not index
partially and then complete later on, as that's inefficient with our
design and would prolong the sqlite transactions, which would prevent
searches from running.

It would make things a lot more complex unless I have misunderstood your
plans?

Because we want to index lots of docs within one sqlite transaction, it's
likely we won't use threads for indexing in any event (the threads
would block each other, as sqlite blocks reads and writes from others
while a transaction is open). We could get round this by having only one
thread do the saving to sqlite, but that adds more complexity and more
potential memory usage from queueing up the docs with their metadata.
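
To make the single-writer idea concrete, here is a rough sketch in Python
(not tracker code). It assumes extractor workers hand over each doc only
once its metadata is complete, and that one thread owns the sqlite
connection and commits in batches. The table layout, extract() and
index_files() are hypothetical names for illustration only; the bounded
queue is one way to cap the memory cost mentioned above.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # pushed once all workers are done, tells the writer to stop


def extract(path):
    """Hypothetical stand-in for a real metadata extractor."""
    return {"File:Name": path.rsplit("/", 1)[-1]}


def writer(db_path, docs, batch_size):
    """The only thread that touches sqlite; commits every batch_size docs."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (path TEXT, key TEXT, value TEXT)"
    )
    pending = 0
    while True:
        item = docs.get()
        if item is SENTINEL:
            break
        path, meta = item  # meta is complete: no partial saves
        conn.executemany(
            "INSERT INTO metadata VALUES (?, ?, ?)",
            [(path, key, value) for key, value in meta.items()],
        )
        pending += 1
        if pending >= batch_size:
            conn.commit()  # keep each transaction short so readers get a turn
            pending = 0
    conn.commit()
    conn.close()


def index_files(db_path, paths, num_workers=4, max_queued=64, batch_size=100):
    """Extract in several threads, save through a single writer thread."""
    todo = queue.Queue()
    for path in paths:
        todo.put(path)
    # Bounded queue: workers block when the writer falls behind,
    # which limits how many docs sit in memory at once.
    docs = queue.Queue(maxsize=max_queued)

    def worker():
        while True:
            try:
                path = todo.get_nowait()
            except queue.Empty:
                return
            docs.put((path, extract(path)))

    writer_thread = threading.Thread(
        target=writer, args=(db_path, docs, batch_size)
    )
    writer_thread.start()
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    docs.put(SENTINEL)
    writer_thread.join()
```

Calling index_files("index.db", list_of_paths) would fill the metadata
table. The extra moving parts (two queues, a sentinel, thread joins) are
the added complexity referred to above.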

> 
> <snip>
> > 
> > anyway to cut a long story short, daemonizing tracker-extract is not
> > the way to go but rather to embed common and reliable (e.g. not crash
> > prone) formats in a tracker-file-indexer daemon. It should use dbus of
> > course for flexibility. It could be threaded as it would be less
> > complex than trackerd is at the moment
> 
> What would be the criteria for marking an extractor as reliable? I'd be
> extra-careful there, extractors deal with unknown data. Also, threading
> brings other complexities, like the underlying libraries not being
> thread-safe, having extractors that resort to command line calls not
> thread aware at all, etc...

AFAIK GStreamer is thread-safe, and probably only music files would be
done in-process.

All other formats would have to be done out of process, although it would
be nice to do images in-process too (as music and image files are the most
likely to be present in large numbers).

jamie


_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list