On 13/06/2013 1:22, Ivan Frade wrote:

Hi Ivan,

Some other ideas, if your use cases are limited:

You could disable the indices you don't need. They use some space in SQLite.

True. Minimizing the usage of tracker:indexed and especially tracker:domainIndex is a good idea to reduce storage. However, this will have a serious impact on the performance of queries using those fields, so I wouldn't recommend it for everybody.
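
To get a feel for the storage cost of an index, here is a minimal sketch using plain SQLite from Python. It does not use Tracker's schema: the `resource` table, the `title` column and the index name are assumptions made up for illustration, and the exact page counts will vary with the SQLite version and page size.

```python
import sqlite3


def db_pages(with_index: bool, rows: int = 5000) -> int:
    """Return the number of pages used by a small in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO resource (title) VALUES (?)",
        ((f"title-{i:06d}",) for i in range(rows)),
    )
    if with_index:
        # Hypothetical index, standing in for what an indexed property costs.
        conn.execute("CREATE INDEX resource_title_idx ON resource (title)")
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


plain = db_pages(with_index=False)
indexed = db_pages(with_index=True)
print(f"without index: {plain} pages, with index: {indexed} pages")
```

The index stores a second copy of every indexed value, which is where the extra pages come from.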

For some properties, we store both the value and its collation, to sort correctly in different locales. If you don't need that sorting, you could remove this duplication.

Correct. I almost forgot about that one. This will, however, mean that it's not possible to sort correctly on that field anymore? Ideally, if we remove the collation column, we can still sort correctly, only slower. AFAIK that should be possible and/or is already the case, no?
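
As a rough illustration of sorting without a stored collation column, SQLite lets you register a collation function that is called at query time. The sketch below uses Python's `sqlite3.Connection.create_collation`; the `demo_casefold` name and the `resource` table are assumptions, and a simple case-folding comparison stands in for a real locale-aware collator. It is not how Tracker does it.

```python
import sqlite3


def casefold_collation(a: str, b: str) -> int:
    """Compare two strings case-insensitively; return -1, 0 or 1."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
# Hypothetical collation name; the comparison runs on every ORDER BY.
conn.create_collation("demo_casefold", casefold_collation)
conn.execute("CREATE TABLE resource (title TEXT)")
conn.executemany(
    "INSERT INTO resource VALUES (?)",
    [("cherry",), ("Banana",), ("apple",)],
)

binary_order = [r[0] for r in conn.execute(
    "SELECT title FROM resource ORDER BY title")]
collated_order = [r[0] for r in conn.execute(
    "SELECT title FROM resource ORDER BY title COLLATE demo_casefold")]
print(binary_order)
print(collated_order)
```

No extra column is stored, but the comparison function is invoked for each pair SQLite compares while sorting, which is the "only slower" trade-off.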

You could also prune the extractors to get *only* the information you need... especially text properties.

Right. I wonder if it's worthwhile to try to make this possible for upstream by having it configurable per extractor. For example, in the .rule file of an extractor module we could specify which properties to extract (if they are available), and then have some infrastructure to avoid huge amounts of if-then-else in the extractor modules' code.
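
One way such infrastructure could avoid if-then-else chains is a table of per-property getters, of which only the configured ones are run. This is purely a sketch: the `GETTERS` table, the `extract` helper and the idea of passing a list of wanted properties are all hypothetical and do not correspond to any existing Tracker API or .rule file key.

```python
from typing import Callable, Dict, Iterable

# Hypothetical property getters for one extractor module.
GETTERS: Dict[str, Callable[[str], str]] = {
    "nie:title": lambda text: text.splitlines()[0] if text else "",
    "nie:plainTextContent": lambda text: text,
}


def extract(text: str, wanted: Iterable[str]) -> Dict[str, str]:
    """Run only the getters whose property is listed in `wanted`."""
    return {prop: GETTERS[prop](text) for prop in wanted if prop in GETTERS}


# Only nie:title is configured, so the text content is never collected.
result = extract("Report\nlong body ...", ["nie:title"])
print(result)
```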

Thanks for the tips, especially the one about the collation column, which I had forgotten about myself.

Kind regards,

Philip





On Wed, Jun 12, 2013 at 2:50 AM, Martyn Russell <mar...@lanedo.com> wrote:

    On 12/06/13 09:00, Philip Van Hoof wrote:

        Hi guys,


    Hello Philip,


        For one of my customers I'm getting the question of how to reduce
        the disk usage.


    Do you have a requirement here?
    How much are you looking to reduce it by?
    What is it now?
    What are your limits, etc?


        I wrote the journalling and periodic backup of meta.db myself so I
        of course know how to disable these, what the consequences are and
        how to ensure that all still works and all that ;)

        My question to the team is to think with me on how we can further
        reduce disk space usage for products where this is a consideration
        (for example embedded appliances where additional storage is an
        expensive component if it has to be large).

        Next to disabling journaling and using synchronous mode in SQLite
        after putting meta.db in .local and adapting the Backup/Restore to
        operate on the main meta.db instead of the journal or periodic
        backup, I was thinking about disabling fts, but also disabling
        extracting and mining of nie:plainTextContent.


    Absolutely, this should make quite some difference to the DB size.

      ./configure --disable-tracker-fts

    I would start here.
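
    For the SQLite side of the journaling and synchronous settings raised in the quoted paragraph, here is a small sketch using standard SQLite pragmas from Python. It only covers SQLite's own rollback journal, not Tracker's journal or backup; the `open_without_journal` helper and the `demo.db` file name are assumptions for illustration. Note that with the journal off, a crash in the middle of a write can corrupt the database.

```python
import os
import sqlite3
import tempfile


def open_without_journal(path: str):
    """Open a database with SQLite's rollback journal turned off."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode = OFF").fetchone()[0]
    # FULL makes SQLite sync to disk at the critical moments.
    conn.execute("PRAGMA synchronous = FULL")
    level = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, mode, level


with tempfile.TemporaryDirectory() as tmp:
    conn, mode, level = open_without_journal(os.path.join(tmp, "demo.db"))
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    conn.close()

print(mode, level)
```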


        But also a perhaps crazy idea would be to implement a virtual table
        for SQLite that can compress certain literals' columns. Kind of the
        opposite of an indexed property: it'll be very slow, but as it is
        rarely queried on, it's fine that it is slow. Just that the
        property's value must still be stored for the times when it is
        needed.


    Do you have a real use case in mind here?


        For example for properties like nie:plainTextContent, but then per
        resource the cell would be stored compressed or not (and all SQLite
        access to it would decompress it, for example collation would).

        The problem is that many users want nie:plainTextContent to be
        there, but they don't want it to consume so much disk space (and it
        can be slow to access it).
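
To make the compression idea concrete without writing a virtual table, the sketch below uses a simpler, swapped-in technique: the application compresses the text with zlib before storing it as a BLOB, and a user-defined SQL function decompresses it on access. The `content` table and the `demo_decompress` function name are assumptions, and this says nothing about how a real virtual-table implementation in Tracker would look.

```python
import sqlite3
import zlib


def compress_text(text: str) -> bytes:
    """Compress text to bytes suitable for a BLOB column."""
    return zlib.compress(text.encode("utf-8"), 9)


def decompress_text(blob: bytes) -> str:
    """Inverse of compress_text; called by SQLite for each row read."""
    return zlib.decompress(blob).decode("utf-8")


conn = sqlite3.connect(":memory:")
# Hypothetical SQL function name; it makes decompression usable in queries.
conn.create_function("demo_decompress", 1, decompress_text)
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, body BLOB)")

text = "plain text content " * 200
conn.execute("INSERT INTO content (body) VALUES (?)", (compress_text(text),))

stored_size, restored = conn.execute(
    "SELECT length(body), demo_decompress(body) FROM content"
).fetchone()
print(f"original: {len(text.encode('utf-8'))} bytes, stored: {stored_size} bytes")
```

Every query that touches the value pays for decompression, which matches the "very slow but rarely queried" trade-off described in the thread.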

        Another idea could be filesystem specific: pointing in SQLite,
        somehow, to the inode of the FS straight to the contents of the file
        whenever the file is a plain text one. This might be even more
        crazy. I don't know.


    You may sacrifice speed here; we would also need to consider how
    to cater for cases where the file is not tracker:available of course.


        Putting all of meta.db on a compressed filesystem is also an idea.


    We need more information about what your limits are first, I
    would say.

    Regards,
    Martyn

    Founder and CEO of Lanedo GmbH.

    _______________________________________________
    tracker-list mailing list
    tracker-list@gnome.org
    https://mail.gnome.org/mailman/listinfo/tracker-list



