Re: [Tracker] Disk usage optimization

Philip Van Hoof Fri, 14 Jun 2013 14:25:22 -0700

On 14/06/2013 20:13, Ivan Frade wrote:

Hi Philip,
My ideas are more in the line of "fine tune Tracker for your specific use case". Don't think they apply to master.

Hmm. Given that Tracker is fit and designed for embedded use cases, I don't think the project should disallow compile-time and/or configure-time configurability of behaviour.

For example, the --disable-journal and the --disable-fts options are also behaviour changes. We could equally easily add a --disable-collator-column and a --disable-plaintext-extraction so that system integrators can easily build a Tracker package that is optimized for storage instead of optimized for performance.

For the longer-term future I'd even go as far as to easily allow replacing the entire ontology. Although I think for that we should rather bring libtracker-sparql and tracker-store together as libsparql-store, have a semantic-nepomuk-desktop package that installs the ontology, and let libtracker-miner and tracker-miner-fs be packages that depend on semantic-nepomuk-desktop and libsparql-store.

And then on tracker-miner-fs have a --disable-plaintext-extraction and on libsparql-store have a --disable-journal, --disable-fts and --disable-collator-column.

This would effectively mean splitting the project. But I've always felt that in the long term this should happen. It would also allow tracker-miner-fs to focus more on the mining and indexing of files, and libsparql-store on being an embedded and/or highly efficient and reliable SPARQL endpoint and SPARQL INSERT store.

I also think that libtracker-extract should probably move towards a truly publicly usable libmetadata-extract which exposes buffer and stream based metadata extraction for not just tracker-extract but for any program that needs this. Although it would use this libmetadata-extract just like how it uses libtracker-extract now, the tracker-extract binary should be an implementation detail of tracker-miner-fs after that. The problem I see with the current architecture of tracker-extract as the service to do metadata extraction is that it can only work well for file based metadata extraction, while the world of metadata is massively, insanely massively larger than just files on your filesystem, if you just open your eyes to see it.

On Fri, Jun 14, 2013 at 6:39 AM, Philip Van Hoof <phi...@codeminded.be> wrote:


    On 13/06/2013 1:22, Ivan Frade wrote:

    Hi Ivan,

    For some properties, we store both the value and its collation to
    sort correctly in different locales. If you don't need that
    sorting, you could remove this duplication.

    Correct. I almost forgot about that one. This will, however, mean
    that it's not possible to sort correctly on that field anymore?
    Ideally if we remove the collation column we can still sort
    correctly but then only slower. Afaik that should be possible
    and/or is already the case, no?

Without the collation the order can be wrong in some locales. It is not about speed, IIRC.

Can it be made to be correct without the collation column? Surely the collation column got created out of the same data the current property's column stores? Meaning that collation data can be made in alloca() buffers on the fly (which is of course going to be a lot slower).
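
To make the trade-off concrete, here is a minimal sketch in Python rather than Tracker's C, and not how tracker-store actually implements its collation column. It assumes the stored collation is a key derived purely from the property value (here via the standard locale.strxfrm); the function and variable names are made up for illustration. Under that assumption, sorting on keys built at query time gives the same order as sorting on stored keys, at the cost of recomputing the keys for every query.

```python
import locale


def collation_key(value):
    # Hypothetical helper: derive the locale-aware sort key from the
    # stored value alone, using the current LC_COLLATE setting.
    return locale.strxfrm(value)


def sort_with_stored_keys(rows):
    # rows is a list of (value, stored_key) pairs, mimicking a table that
    # keeps a precomputed collation column next to the value column.
    return [value for value, _key in sorted(rows, key=lambda row: row[1])]


def sort_on_the_fly(values):
    # No collation column: build the key from the value at query time.
    # Same ordering, but the key work is repeated on every query.
    return sorted(values, key=collation_key)


titles = ["zebra", "Apple", "mango", "apple"]  # made-up sample data
stored_rows = [(title, collation_key(title)) for title in titles]

# Both approaches order the values identically, because the stored key
# was derived from the very same value.
assert sort_with_stored_keys(stored_rows) == sort_on_the_fly(titles)
```

One caveat this sketch does not cover: a stored key is only valid for the locale it was built in, so it would have to be regenerated whenever the locale changes, while keys built on the fly always follow the current locale.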


Kind regards.

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list
