On Tue, 2008-03-04 at 15:19 +0100, Carlos Garnacho wrote:
> Hi!,
>
> On Fri, 2008-02-29 at 09:10 -0500, Jamie McCracken wrote:
> > On Fri, 2008-02-29 at 11:34 +0100, Carlos Garnacho wrote:
> > > Hi!,
> > >
> > > I've attached a patch in bug #519337 to keep the extractor alive between
> > > operations. This greatly improves performance, as it avoids having to
> > > spawn/initialize the extractor constantly for each new file. With the
> > > patch, the extractor shuts down by itself after 30 seconds of
> > > inactivity. Any testing is appreciated.
> > >
> > > Besides, I've been thinking a bit about this subject. Right now trackerd
> > > waits synchronously for the metadata extractor output (and the same
> > > happens for thumbnailing, even when such data isn't immediately
> > > necessary), so only 1 file is processed at a time.
> > >
> > > Has there been any thinking/work on making that parallelizable? I'm sure
> > > there'd be performance improvements if there was a pool of extractors
> > > which asynchronously processed a queue of filenames.
> > >
> >
> > yeah, although it's tricky with threads (synchronisation and deadlock
> > issues)
>
> I didn't plan to use threads here. I've developed a small test extractor
> [1] that spawns several extractors and manages them asynchronously
> through watches. It requires the patched tracker-extractor from bug
> #519337. You can run it with:
>
> ./test-extract [num-extractors] [path-to-extract]
>
> Being a test, it just gets metadata from mp3 files, but the
> tracker-extractor-pool.[ch] files can be easily adapted to tracker
> needs.
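A minimal sketch of the threadless pool idea, assuming a hypothetical long-lived extractor that reads one path per line on stdin and answers with one line on stdout (the real tracker-extract protocol from bug #519337 is not shown here, and `extract_all`/`WORKER_SRC` are illustrative names). It is written in Python rather than C/GLib for brevity: the `selectors` module plays the role of the GLib IO watches, and each worker has at most one request in flight, so whichever extractor finishes first gets the next filename. POSIX only, since it selects on pipes.

```python
import selectors
import subprocess
import sys
from collections import deque

# Hypothetical stand-in for a long-lived extractor process: it reads one path
# per line on stdin and answers with one "path<TAB>metadata" line on stdout.
# Here the "metadata" is just the path length; a real extractor would parse
# the file. Paths containing tabs or newlines are not supported.
WORKER_SRC = r"""
import sys
for line in sys.stdin:
    path = line.rstrip("\n")
    sys.stdout.write(f"{path}\tlen={len(path)}\n")
    sys.stdout.flush()
"""


def extract_all(paths, num_extractors=2):
    """Process `paths` with a pool of extractor processes, without threads.

    Each worker has at most one request in flight. The parent waits on all
    workers' stdout pipes at once with a selector (comparable to GLib IO
    watches) and hands the next queued path to whichever worker answers.
    """
    queue = deque(paths)
    results = {}
    sel = selectors.DefaultSelector()
    workers = []

    for _ in range(num_extractors):
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1,
        )
        workers.append(proc)
        sel.register(proc.stdout, selectors.EVENT_READ, proc)

    def dispatch(proc):
        # Give an idle worker the next queued path; False if the queue is empty.
        if not queue:
            return False
        proc.stdin.write(queue.popleft() + "\n")
        proc.stdin.flush()
        return True

    in_flight = sum(dispatch(p) for p in workers)
    while in_flight:
        for key, _ in sel.select():
            proc = key.data
            line = proc.stdout.readline()
            if not line:
                raise RuntimeError("extractor exited unexpectedly")
            in_flight -= 1
            path, meta = line.rstrip("\n").split("\t", 1)
            results[path] = meta
            in_flight += dispatch(proc)

    # Closing stdin ends each worker's read loop, so it exits cleanly.
    for proc in workers:
        sel.unregister(proc.stdout)
        proc.stdin.close()
        proc.wait()
        proc.stdout.close()
    sel.close()
    return results
```

For example, `extract_all(["a.mp3", "music/b.mp3", "c.ogg"], num_extractors=2)` returns a dict keyed by path, since results arrive in completion order rather than submission order. Crash handling is deliberately minimal: a worker that dies mid-request raises an error instead of being respawned.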
Bear in mind tracker is a differential indexer, so when indexing a new
file we need all the metadata before saving it. We must not index
partially and then complete later on, as that's inefficient with our
design and would prolong the SQLite transactions, which would prevent
searches from running.

It would make things a lot more complex, unless I have misunderstood
your plans?

Because we want to index lots of docs within an SQLite transaction, it's
likely we won't use threads for indexing in any event (the threads would
block each other, as SQLite blocks reads and writes from others while in
a transaction). We could get round this by having only one thread do the
saving to SQLite, but that adds more complexity and potentially more
memory usage from queueing up the docs with their metadata.

>
> <snip>
> >
> > anyway, to cut a long story short, daemonizing tracker-extract is not
> > the way to go, but rather to embed common and reliable (e.g. not
> > crash-prone) formats in a tracker-file-indexer daemon. It should use
> > dbus of course for flexibility. It could be threaded, as it would be
> > less complex than trackerd is at the moment.
>
> What would be the criteria for marking an extractor as reliable? I'd be
> extra-careful there; extractors deal with unknown data. Also, threading
> brings other complexities, like the underlying libraries not being
> thread-safe, having extractors that resort to command-line calls that
> aren't thread-aware at all, etc...

AFAIK GStreamer is thread-safe, and only music files would probably be
done in-process. All other formats would have to be done out of process,
although it would be nice to do images in-process too (as music and
image files are the most likely to be present in large numbers).

jamie

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
