Hi Philip,

On Fri, Jun 14, 2013 at 7:23 AM, Philip Van Hoof <phi...@codeminded.be> wrote:
> Hi team,
>
> During a Tracker/Nepomuk/SPARQL training I gave at one of my customers I
> noted the interest in extractors that can dive into archives and document
> types that have a tree of other documents (like MIME documents).
> Just today another message in this mailing list was mentioning it :)
> That or libtracker-extract should allow a stream or buffer based
> extraction, and/or a file descriptor based one (in which case we could pass
> the extractor modules, the ones now only used by tracker-extract, a by pipe
> created FD from the E-mail client, and write the Base64 decoded data to the
> pipe FD - or something). Unfortunately is tracker-extract right now
> entirely FILE based (not really FD based, nor stream based).
>

FD passing and buffered extraction are both good ideas. They are also
independent: we could implement either of them without the other.

> I think it would be a great first addition if the tracker-extract .rule
> file based environment would be adapted to have two levels of matching:
> first on container and then on MimeType. The first level would for all of
> its native extractors be "Just File", and for the libstreamanalyzer's be
> "MIMEDocument" and "Archive". The second level would be the same as now.
> Ideally this level system could also be used for multimedia files (videos
> have first a MIME type and then a codec type, for example).
>

Is this two-level matching really needed? In the end we recognize the
containers by their mime-types (e.g. application/x-tgz). With the current
.rules files, we can assign those "container mime-types" to the topanalyzer.

> Then would it start being possible for a extractor module like
> tracker-topanalyzer.cpp to get kicked into action for diving into archive
> files and MIME documents (and the native ones would still operate on native
> file types).
>
> Also should the tracker-topanalyzer.cpp be fixed. It has been a long time
> that it was last tested and I don't expect it to still work. And for it to
> work it would probably be needed that libstreamanalyzer gets adapted to
> follow Tracker's Nepomuk adaptations (right now libstreamanalyzer doesn't
> know about the nmm ontology, afaik).
>

I wonder if Jos is still working on it. We could bring that topanalyzer
extractor back to life, use it for compressed files and move on from there.

Best Regards,

Ivan

> _______________________________________________
> tracker-list mailing list
> tracker-list@gnome.org
> https://mail.gnome.org/mailman/listinfo/tracker-list
>
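
P.S. A rough sketch of the pipe idea discussed above, in Python rather than C to keep it short. Nothing here is Tracker or libtracker-extract API: `extract_from_fd` and `extract_base64_part` are made-up names, and the "extractor" only reports the size and first bytes of what it reads. It just shows that a Base64 MIME part could be decoded and handed over through a pipe FD without touching a file.

```python
import base64
import os
import threading


def extract_from_fd(fd: int) -> dict:
    """Hypothetical FD-based extractor: drains fd and reports basic facts."""
    chunks = []
    # os.fdopen takes ownership of fd and closes it when the block exits.
    with os.fdopen(fd, "rb") as stream:
        for chunk in iter(lambda: stream.read(4096), b""):
            chunks.append(chunk)
    data = b"".join(chunks)
    return {"size": len(data), "head": data[:8]}


def extract_base64_part(encoded: bytes) -> dict:
    """Decode a Base64 MIME part and feed it to the extractor via a pipe."""
    read_fd, write_fd = os.pipe()

    def writer() -> None:
        # Closing the write end is what signals EOF to the reader.
        with os.fdopen(write_fd, "wb") as sink:
            sink.write(base64.b64decode(encoded))

    # Write from a separate thread so a payload larger than the pipe
    # buffer cannot block before the reader starts draining it.
    thread = threading.Thread(target=writer)
    thread.start()
    try:
        return extract_from_fd(read_fd)
    finally:
        thread.join()
```

In this sketch the e-mail client side is `extract_base64_part` and the extractor side only ever sees a file descriptor, which is the separation the FD-passing idea relies on.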