Hey all,

I've been talking about this branch on #tracker, but now that most of the work there is done, it is worth raising on the ML. In that branch there are two extra objects in libtracker-miner:

* TrackerDecorator is a TrackerMiner that implements a passive indexing pattern. Instead of being expected to feed data directly to tracker-store, it listens for GraphUpdated signals, so when an item eligible for indexing is added or updated and is still missing a nie:dataSource specific to the decorator, it is queued for processing. On startup it also queries for all elements of the eligible rdf:types that are missing that nie:dataSource, so all elements are guaranteed to be indexed.

* TrackerDecoratorFS is a file-specific implementation of that object. It basically adds volume monitoring, so indexing within newly added volumes is resumed if it was previously interrupted, and elements are removed from the queue if the volume is removed.

In that branch, tracker-extract does use these features: it has been turned into a full-blown standalone miner using TrackerDecorator, while miner-fs has stopped calling it. On one hand, this greatly simplifies indexing in tracker-miner-fs, as the task is a lot less prone to failure now. On the other hand, this brings in the 2-pass indexing that was being requested: miner-fs promptly crawls and fetches GFile info, and tracker-extract follows soon after, filling in extra information.

Current caveats
===============

It is worth noting, though, that not much has been done yet in the branch about handling extraction failures:

* extractor modules blocking or taking too much time
* crashes in extractor modules

Possible solutions involve adding cancellability of extract tasks and/or having all extraction go into a subprocess that we can watch, so the dbus service itself doesn't go away and doesn't need to be restarted. The latter could also help with Phillip's idea to run extraction in containers.

But about these changes...

Future plans?
=============

I'm very seriously proposing to make libtracker-extract private altogether. The usefulness of having 3rd party extractors is dubious: neither allowing them to reimplement extraction for a well-known mimetype nor having them implement support for a mimetype we don't know well enough is positive. It potentially affects Tracker stability and user perception, and it sidesteps the point that if a mimetype has enough traction, it should be in the Tracker tree.

Its API is also a mishmash of utility functions that have little to do with the rest of Tracker, and it is not written in a very future-safe way. Moreover, googling for "tracker_extract_get_metadata" (the function that modules must implement), I just see 3 pages of references to Tracker code, backtraces, and logs, and very few references to external extractors. This API is 1/3 of the Tracker public API, yet it's been mostly unused externally for the 3 years it's been around.

So, I think Tracker should offer API that helps integrate with Tracker, and as such this API falls short. I propose to keep it in private land and encourage the use of TrackerDecorator, which is also nice in that multiple sources add up information, unlike extract modules, which are individually responsible for filling in every piece of information.

Actually, I'd like to think we can make 1.0 soon (we technically could ASAP, we've remained feature stable for quite some time now) and make longer stability promises than we do currently (having every GNOME module depending on Tracker bump .pc file versions every 6 months is a PITA). IMO the main milestone is getting the API to a point where we can think of forward compatibility, and doing this would help greatly.

Phew, long email,
  Carlos