On 13/06/2013 1:22, Ivan Frade wrote:
Hi Ivan,
Some other ideas, if your use cases are limited:
You could disable the indexes you don't need. They use some space in
SQLite.
True. Minimizing the use of tracker:indexed and especially
tracker:domainIndex is a good idea to reduce storage, although this will
have a serious impact on the performance of queries using those fields.
So I wouldn't recommend this for everybody.
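
To get a feel for the storage side of this trade-off, here is a minimal
sketch using plain SQLite from Python. It does not touch Tracker's real
schema: the table name, column name and row contents are made up for
illustration, and the numbers it prints only show the general effect of
adding an index on a text column.

```python
import sqlite3
from typing import Tuple


def db_size_bytes(conn: sqlite3.Connection) -> int:
    """Return the database size as page_count * page_size."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def measure_index_overhead(rows: int = 20000) -> Tuple[int, int]:
    """Build a toy table and return (size_without_index, size_with_index)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; Tracker's actual tables look different.
    conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO resource (title) VALUES (?)",
        ((f"document title number {i}",) for i in range(rows)),
    )
    conn.commit()
    before = db_size_bytes(conn)

    # The index stores a second, sorted copy of the column values.
    conn.execute("CREATE INDEX resource_title_idx ON resource (title)")
    conn.commit()
    after = db_size_bytes(conn)
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = measure_index_overhead()
    print(f"without index: {before} bytes, with index: {after} bytes")
```

Running the same kind of measurement against a copy of a real meta.db
would give a better idea of how much each index actually costs.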
For some properties, we store both the value and its collation to sort
correctly in different locales. If you don't need that sorting, you
could remove this duplication.
Correct. I almost forgot about that one. This will, however, mean that
it's not possible to sort correctly on that field anymore? Ideally, if
we remove the collation column, we can still sort correctly, only
slower. Afaik that should be possible and/or is already the case, no?
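
As a sketch of the "still correct, only slower" variant: SQLite can sort
with a collation that is applied at query time instead of reading a
stored, precomputed column. The example below uses Python's sqlite3
module with a toy case-insensitive comparison as a stand-in for a real
locale collator. The collation name and table are assumptions for
illustration; whether Tracker already falls back like this is exactly
the open question above.

```python
import sqlite3
from typing import Iterable, List


def casefold_collation(a: str, b: str) -> int:
    """Toy collation: case-insensitive compare (stand-in for a locale collator)."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def sorted_titles(titles: Iterable[str]) -> List[str]:
    """Sort titles inside SQLite without any stored collation column."""
    conn = sqlite3.connect(":memory:")
    # The comparison function runs during the sort, which costs CPU time
    # on every query but needs no extra storage.
    conn.create_collation("toy_collation", casefold_collation)
    conn.execute("CREATE TABLE resource (title TEXT)")
    conn.executemany("INSERT INTO resource VALUES (?)", ((t,) for t in titles))
    rows = conn.execute(
        "SELECT title FROM resource ORDER BY title COLLATE toy_collation"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(sorted_titles(["banana", "apple", "Cherry"]))
```

With the default binary ordering "Cherry" would sort before "apple";
with the query-time collation it sorts last.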
You could also prune the extractors to get *only* the information you
need... especially text properties.
Right. I wonder if it's worthwhile to try to make this possible for
upstream by making it configurable per extractor. For example, in the
.rule file of an extractor module we could specify which properties to
extract (if they are available), and then have some infrastructure to
avoid huge amounts of if-then-else in the extractor modules' code.
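
One way such infrastructure could look, purely as a sketch: a table that
maps each property to a small getter, plus a set of enabled properties
that would come from configuration. Everything here is hypothetical (the
GETTERS table, the extract() helper, the dict standing in for a parsed
file); it is not how the extractor modules or .rule files work today.

```python
from typing import Callable, Dict, Iterable

# Hypothetical table: property name -> function that pulls the value
# out of an already parsed document (here simply a dict).
Getter = Callable[[dict], object]

GETTERS: Dict[str, Getter] = {
    "nie:title": lambda doc: doc.get("title"),
    "nie:mimeType": lambda doc: doc.get("mime"),
    "nie:plainTextContent": lambda doc: doc.get("text"),
}


def extract(doc: dict, enabled: Iterable[str]) -> Dict[str, object]:
    """Return only the enabled properties that are available in doc."""
    wanted = set(enabled)
    result: Dict[str, object] = {}
    for prop, getter in GETTERS.items():
        if prop not in wanted:
            continue  # disabled property: its getter never runs
        value = getter(doc)
        if value is not None:  # "if they are available"
            result[prop] = value
    return result


if __name__ == "__main__":
    doc = {"title": "Report", "mime": "text/plain", "text": "long body ..."}
    # With nie:plainTextContent left out of the enabled set, the large
    # text value is never extracted or stored.
    print(extract(doc, ["nie:title", "nie:mimeType"]))
```

The point of the table is that enabling or disabling a property becomes
a data change rather than another if-then-else branch in each module.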
Thanks for the tips, especially the one about the collation column,
which I had forgotten about myself.
Kind regards,
Philip
On Wed, Jun 12, 2013 at 2:50 AM, Martyn Russell <mar...@lanedo.com> wrote:
On 12/06/13 09:00, Philip Van Hoof wrote:
Hi guys,
Hello Philip,
For one of my customers I'm getting the question of how to reduce the
disk usage.
Do you have a requirement here?
How much are you looking to reduce it by?
What is it now?
What are your limits, etc?
I wrote the journalling and periodic backup of meta.db myself so I of
course know how to disable these, what the consequences are and how to
ensure that all still works and all that ;)
My question to the team is to think with me on how we can further reduce
disk space usage for products where this is a consideration (for example
embedded appliances where additional storage is an expensive component
if it has to be large).
Next to disabling journaling and using synchronous mode in SQLite after
putting meta.db in .local and adapting the Backup/Restore to operate on
the main meta.db instead of the journal or periodic backup, I was
thinking about disabling fts, but also disabling extracting and mining
of nie:plainTextContent.
Absolutely, this should make quite some difference to the DB size.
./configure --disable-tracker-fts
I would start here.
But also a perhaps crazy idea would be to implement a virtual table for
SQLite that can compress certain literals' columns. Kind of the opposite
of an indexed property: it'll be very slow, but as it is rarely queried
it's fine that it is slow. It's just that the property's value must
still be stored for the times when it is needed.
Do you have a real use case in mind here?
For example, for properties like nie:plainTextContent; the cell would
then be stored compressed or not on a per-resource basis (and all SQLite
access to it, for example collation, would decompress it).
The problem is that many users want nie:plainTextContent to be there,
but they don't want it to consume so much disk space (and it can be slow
to access it).
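
The compressed-literal idea can be tried out without writing a virtual
table. The sketch below is a simpler stand-in, not the virtual table
proposed above: the application compresses the text with zlib before
inserting it, and an application-defined SQL function decompresses it
on read. The table, column and function names are assumptions made for
the example.

```python
import sqlite3
import zlib


def open_db() -> sqlite3.Connection:
    """Open an in-memory DB with a hypothetical compressed text column."""
    conn = sqlite3.connect(":memory:")
    # Application-defined SQL function: decompression happens on every
    # read, which is the "slow but rarely queried" trade-off.
    conn.create_function(
        "inflate", 1, lambda blob: zlib.decompress(blob).decode("utf-8")
    )
    conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, text_z BLOB)")
    return conn


def store_text(conn: sqlite3.Connection, text: str) -> int:
    """Compress text and store it; return the new row id."""
    blob = zlib.compress(text.encode("utf-8"), 9)
    cur = conn.execute("INSERT INTO content (text_z) VALUES (?)", (blob,))
    return cur.lastrowid


def load_text(conn: sqlite3.Connection, row_id: int) -> str:
    """Read the text back, decompressing inside the SQL query."""
    row = conn.execute(
        "SELECT inflate(text_z) FROM content WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_db()
    text = "plain text content of some document " * 200
    row_id = store_text(conn, text)
    stored = conn.execute(
        "SELECT length(text_z) FROM content WHERE id = ?", (row_id,)
    ).fetchone()[0]
    print(f"original: {len(text.encode('utf-8'))} bytes, stored: {stored} bytes")
    assert load_text(conn, row_id) == text
```

Anything that needs the value inside SQLite (sorting, comparisons,
full-text matching) has to go through the decompression function, so
this only makes sense for columns that are rarely queried.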
Another idea could be filesystem-specific: having SQLite point,
somehow, via the inode of the FS straight to the contents of the file
whenever the file is a plain-text one. This might be even more crazy. I
don't know.
You may sacrifice speed here; we would also need to consider how to
cater for cases where the file is not tracker:available, of course.
Putting all of meta.db on a compressed filesystem is also an idea.
We need more information about what your limits are first, I would say.
--
Regards,
Martyn
Founder and CEO of Lanedo GmbH.
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list