On 29/02/2008, Jamie McCracken <[EMAIL PROTECTED]> wrote:
>
>
> On Fri, 2008-02-29 at 11:34 +0100, Carlos Garnacho wrote:
> > Hi!,
> >
> > I've attached a patch in bug #519337 to keep the extractor alive between
> > operations. This greatly improves performance, as it avoids having to
> > spawn/initialize the extractor constantly for each new file. With the
> > patch, the extractor shuts down by itself after 30 seconds of
> > inactivity. Any testing is appreciated.
> >
> > Besides, I've been thinking a bit about this subject. Right now trackerd
> > waits synchronously for the metadata extractor output (and the same
> > happens for thumbnailing, even when such data isn't immediately
> > necessary), so only 1 file is processed at a time.
> >
> > Has there been any thinking/work on making that parallelizable? I'm sure
> > there'd be performance improvements if there was a pool of extractors
> > which asynchronously processed a queue of filenames.
> >
>
>
> yeah, although it's tricky with threads (synchronisation and deadlock
> issues)
>
> The plan for 0.7 is to split trackerd into:
>
> 1) Always-active main daemon that does watching and processes search
> requests
>
> 2) tracker-file-indexer - called by (1) via dbus to index files. Nice
> +19 and ioniced. Exits when indexing is complete. Dbus activated when
> crashed or when new stuff to index comes about
>
> 3) tracker-email-indexer - called by (1) to index emails. Same as (2).
> File attachments would need to be handled by similar code to (1), which
> is disadvantageous though
>
> 4) xesam extractors - some extractors can be built into (1) and (2) so
> as to become a daemonised extractor; others will be specified by xesam
> and called out of process by (1)
>
> 5) xesam crawlers - as (4) but for containerised objects like news feeds
>
>
> The above would be faster and much leaner on memory, as memory
> consumed by indexing would be released when indexing has finished. It
> should be more maintainable and less complex than a monolithic trackerd
>
> there would also need to be private shared libs for the above components
> to enhance code reuse
>
> the xesam stuff would easily allow 3rd party extractors and crawlers to
> be implemented
>
> anyway, to cut a long story short, daemonizing tracker-extract is not the
> way to go; rather, embed common and reliable (e.g. not crash-prone)
> formats in a tracker-file-indexer daemon. It should use dbus of course
> for flexibility. It could be threaded, as it would be less complex than
> trackerd is at the moment
>
> Designing the above will be tricky but should go hand in hand with
> refactoring. If that's something you or others want to work on then we
> should discuss on IRC
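
To make the idea quoted above more concrete (a pool of extractors working through a queue of filenames, each shutting down after a period of inactivity), here is a minimal sketch in Python. It is not Tracker code: extract_metadata() is a hypothetical stand-in for a real extractor, and run_pool(), worker() and IDLE_TIMEOUT are names made up for this illustration. Threads are used only to keep the example short; the plan quoted above leans towards separate, dbus-activated processes instead.

```python
import queue
import threading
from pathlib import Path

IDLE_TIMEOUT = 30.0  # seconds; mirrors the 30 s idle shutdown mentioned in the thread


def extract_metadata(path):
    """Hypothetical stand-in for a real extractor: returns basic file facts."""
    p = Path(path)
    return {"path": str(p), "suffix": p.suffix, "size": p.stat().st_size}


def worker(jobs, results, idle_timeout):
    """Process filenames until the queue stays empty for idle_timeout seconds."""
    while True:
        try:
            path = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            return  # idle for too long: the worker shuts itself down
        try:
            results.put((path, extract_metadata(path)))
        except OSError:
            results.put((path, None))  # record the failure, keep the worker alive
        finally:
            jobs.task_done()


def run_pool(paths, workers=4, idle_timeout=IDLE_TIMEOUT):
    """Index paths with a pool of workers; returns {path: metadata or None}."""
    jobs, results = queue.Queue(), queue.Queue()
    for _ in range(workers):
        threading.Thread(
            target=worker, args=(jobs, results, idle_timeout), daemon=True
        ).start()
    for path in paths:
        jobs.put(path)
    jobs.join()  # block until every queued file has been handled
    out = {}
    while not results.empty():
        path, meta = results.get()
        out[path] = meta
    return out
```

A caller would do something like run_pool(["a.mp3", "b.pdf"], workers=2). A file that cannot be read maps to None rather than taking the whole pool down, which is the same concern behind keeping crash-prone formats out of the indexer.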

Jamie, if you have more in-depth design ideas, it would be a good idea to
post them on the Xesam mailing list, specifically those about the shared
Xesam metadata extractors and crawlers. There has not been much concrete
discussion on these topics.

Cheers,
Mikkel
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
