Hi Philip,

On Fri, Jun 14, 2013 at 7:23 AM, Philip Van Hoof <phi...@codeminded.be> wrote:

> Hi team,
>
> During a Tracker/Nepomuk/SPARQL training I gave at one of my customers I
> noted the interest in extractors that can dive into archives and document
> types that have a tree of other documents (like MIME documents).
>

Just today another message on this mailing list mentioned it :)


> That or libtracker-extract should allow a stream or buffer based
> extraction, and/or a file descriptor based one (in which case we could pass
> the extractor modules, the ones now only used by tracker-extract, an FD
> created by a pipe from the E-mail client, and write the Base64 decoded data
> to the pipe FD - or something). Unfortunately, tracker-extract is right now
> entirely FILE based (not really FD based, nor stream based).
>

FD passing and buffered extraction are both good ideas. They are also
independent: we could implement either one without the other.
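
To make the FD idea concrete, here is a rough Python sketch of what I
understand by it. This is not Tracker code: extract_from_fd() and
feed_attachment() are made-up names, and the "extractor" only counts bytes
and keeps the first few of them. The point is just that the extractor side
sees nothing but a file descriptor, while the mail client side decodes the
Base64 data and writes it into a pipe.

```python
import base64
import os
import threading


def extract_from_fd(fd, chunk_size=4096):
    """Hypothetical FD-based extractor: reads the descriptor until EOF.

    It never sees a file name, only the descriptor it was handed.
    """
    total = 0
    head = b""
    with os.fdopen(fd, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            if len(head) < 8:
                # Keep the leading bytes, e.g. for magic-number sniffing.
                head += chunk[: 8 - len(head)]
            total += len(chunk)
    return {"size": total, "magic": head}


def feed_attachment(base64_payload):
    """Hypothetical mail-client side: decode Base64 and write it to a pipe."""
    read_fd, write_fd = os.pipe()

    def writer():
        # Closing the write end signals EOF to the extractor.
        with os.fdopen(write_fd, "wb") as sink:
            sink.write(base64.b64decode(base64_payload))

    # Write from a separate thread so that payloads larger than the pipe
    # buffer do not block the writer while nobody is reading.
    thread = threading.Thread(target=writer)
    thread.start()
    result = extract_from_fd(read_fd)
    thread.join()
    return result
```

A buffer-based variant would look the same on the extractor side, except
that it would receive the decoded bytes directly instead of a descriptor,
which is why the two ideas can land separately.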


>
> I think it would be a great first addition if the tracker-extract .rule
> file based environment would be adapted to have two levels of matching:
> first on container and then on MimeType. The first level would for all of
> its native extractors be "Just File", and for the libstreamanalyzer's be
> "MIMEDocument" and "Archive". The second level would be the same as now.
> Ideally this level system could also be used for multimedia files (videos
> have first a MIME type and then a codec type, for example).
>

Is this two-level matching really needed? In the end we recognize the
containers by their mime-types (e.g. application/x-tgz). With the current
.rules files, we can assign those "container mime-types" to the topanalyzer.
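
As a rough illustration of the single-level approach, here is a Python
sketch of a flat mime-type to module lookup. The module names and the rule
table are illustrative assumptions, not the contents of the real .rules
files; only application/x-tgz comes from the discussion above.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (extractor module, mime-type patterns).
# Container mime-types simply get their own entry, like any other type.
RULES = [
    ("libextract-topanalyzer.so",
     ["application/x-tgz", "application/zip", "message/rfc822"]),
    ("libextract-png.so", ["image/png"]),
    ("libextract-gstreamer.so", ["audio/*", "video/*"]),
]


def module_for(mime_type, rules=RULES):
    """Return the first module whose patterns match, or None."""
    for module, patterns in rules:
        if any(fnmatchcase(mime_type, pattern) for pattern in patterns):
            return module
    return None
```

With a table like this, module_for("application/x-tgz") picks the
topanalyzer module and native types keep going to their native extractors,
without a separate "container" level.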


>
> It would then become possible for an extractor module like
> tracker-topanalyzer.cpp to get kicked into action for diving into archive
> files and MIME documents (and the native ones would still operate on native
> file types).
>
> Also, tracker-topanalyzer.cpp should be fixed. It has been a long time
> since it was last tested and I don't expect it to still work. And for it to
> work, libstreamanalyzer would probably need to be adapted to
> follow Tracker's Nepomuk adaptations (right now libstreamanalyzer doesn't
> know about the nmm ontology, afaik).
>

I wonder if Jos is still working on it. We could bring that topanalyzer
extractor back to life, use it for compressed files and move on from there.

Best Regards,

Ivan

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list
