On Fri, 2008-02-29 at 11:34 +0100, Carlos Garnacho wrote:
> Hi!,
>
> I've attached a patch in bug #519337 to keep the extractor alive between
> operations. This greatly improves performance, as it avoids having to
> spawn/initialize the extractor constantly for each new file. With the
> patch, the extractor shuts down by itself after 30 seconds of
> inactivity, any testing is appreciated.
>
> Besides, I've been thinking a bit in this subject. Right now trackerd
> waits synchronously for the metadata extractor output (and the same
> happens for thumbnailing, even when such data isn't immediately
> necessary), so only 1 file is processed at the same time.
>
> Has there been any thinking/work on making that parallelizable? I'm sure
> there'd be performance improvements if there was a pool of extractors
> which asynchronously processed a queue of filenames.
>

Yeah, although it's tricky with threads (synchronisation and deadlock issues).

The plan for 0.7 is to split trackerd into:

1) An always-active main daemon that does the watching and processes search requests.

2) tracker-file-indexer - called by (1) via dbus to index files. Runs at nice +19 and ioniced. Exits when indexing is complete. Dbus-activated after a crash or when new stuff to index comes along.

3) tracker-email-indexer - called by (1) to index emails. Same as (2). File attachments would need to be handled by code similar to (1), which is a disadvantage though.

4) xesam extractors - some extractors can be built into (1) and (2) so as to become a daemonised extractor; others will be specified by xesam and called out of process by (1).

5) xesam crawlers - as (4), but for containerised objects like news feeds.

The above would be faster and much leaner on memory, as memory consumed by indexing would be released when indexing has finished. It should be more maintainable and less complex than a monolithic trackerd.

There would also need to be private shared libs for the above components to enhance code reuse.

The xesam stuff would easily allow 3rd party extractors and crawlers to be implemented.

Anyway, to cut a long story short, daemonizing tracker-extract is not the way to go; rather, we should embed common and reliable (e.g. not crash-prone) formats in a tracker-file-indexer daemon. It should use dbus of course for flexibility. It could be threaded, as it would be less complex than trackerd is at the moment.

Designing the above will be tricky but should go hand in hand with refactoring. If that's something you or others want to work on then we should discuss on IRC.

jamie
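
For anyone wanting to play with the queue-of-filenames idea from this thread, here is a rough sketch in Python (not Tracker's C, and not code from the patch in bug #519337). It assumes threads as a stand-in for out-of-process extractors, and `extract_metadata`, `worker` and `run_pool` are hypothetical names made up for illustration. Each worker pulls filenames off a shared queue and exits by itself once the queue has stayed empty for the idle timeout, similar in spirit to the 30-second shutdown mentioned above.

```python
import queue
import threading

IDLE_TIMEOUT = 30.0  # seconds; assumption mirroring the 30 s idle shutdown in the thread


def extract_metadata(path):
    """Hypothetical stand-in for a real extractor: returns a small metadata dict."""
    return {"path": path, "name_length": len(path)}


def worker(jobs, results, idle_timeout):
    """Process filenames until the queue stays empty for idle_timeout seconds."""
    while True:
        try:
            path = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            return  # idle for too long: the worker shuts itself down
        try:
            results.put(extract_metadata(path))
        finally:
            jobs.task_done()


def run_pool(paths, n_workers=4, idle_timeout=IDLE_TIMEOUT):
    """Feed paths to a pool of workers and collect whatever they extract."""
    jobs = queue.Queue()
    results = queue.Queue()
    for _ in range(n_workers):
        threading.Thread(
            target=worker, args=(jobs, results, idle_timeout), daemon=True
        ).start()
    for path in paths:
        jobs.put(path)
    jobs.join()  # block until every queued filename has been processed
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    for item in run_pool(["a.txt", "b.mp3", "c.pdf"], n_workers=2, idle_timeout=1.0):
        print(item)
```

Results come back in completion order, not submission order, so a caller that cares about ordering would need to key them by path. The split into separate dbus-activated indexer processes described above would replace the threads here with processes, which sidesteps most of the locking concerns at the cost of some IPC.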
