I've noticed when indexing *large* amounts of data that a lot of disk 
thrashing is taking place which is greatly slowing down performance of 
both tracker and the system in general.

Also the nice +10 is not throttling enough (I don't have ionice in my 
kernel so I don't know how good a job that does), so I will probably add 
some sleep intervals to smooth things out and keep CPU usage low 
(with a --turbo command line option to disable this for those who want 
faster indexing).
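
To make the throttling idea concrete, here is a rough sketch in Python 
(not tracker's actual code). The function names, the 0.05 second pause 
and the injectable sleep argument are all assumptions for illustration; 
only the --turbo flag name comes from the plan above.

```python
import argparse
import time


def parse_args(argv=None):
    # --turbo is the flag proposed above; everything else is hypothetical.
    parser = argparse.ArgumentParser(description="throttled indexing sketch")
    parser.add_argument("--turbo", action="store_true",
                        help="disable sleeping between files")
    return parser.parse_args(argv)


def index_files(paths, index_one, turbo=False, pause=0.05, sleep=time.sleep):
    """Index each path, sleeping `pause` seconds after each one unless turbo."""
    count = 0
    for path in paths:
        index_one(path)
        count += 1
        if not turbo:
            sleep(pause)  # give the disk and CPU a breather
    return count
```

The pause length would need tuning against real workloads; a fixed 
value is just the simplest thing that could work.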

The cause of the slow down is heavy fragmentation of the file based hash 
table.

Having indexed 30GB of stuff, the optimization routine shrank the full 
text index from nearly 300MB to 20MB which means a massive 280MB of 
fragmentation had occurred - this is obscene!

I note other indexers do not update the hash table directly but cache 
the data in memory and then bulk upload it to reduce fragmentation and 
lessen the performance hit. The disadvantage of this is that newly 
indexed content won't appear in searches until the cache is uploaded to 
the hash table. (We could upload every 10-15 mins or something; 
infrequent words should be updated more quickly though.)

As we are memory conservative, I am planning to do something similar 
but using sqlite (instead of precious memory) to cache new files and 
then bulk upload. We could easily cache the data for many thousands of 
files before uploading them.
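
A minimal sketch of the cache-then-bulk-upload idea, using Python's 
sqlite3 module purely for illustration. The table name "pending", its 
columns and the function names are made up, and a plain dict stands in 
for the file based hash table; the real schema would differ.

```python
import sqlite3


def open_cache(path=":memory:"):
    # A real cache would live in its own db file; ":memory:" keeps the demo simple.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS pending ("
               "word TEXT NOT NULL, file_id INTEGER NOT NULL, "
               "score INTEGER NOT NULL)")
    db.execute("CREATE INDEX IF NOT EXISTS pending_word ON pending (word)")
    return db


def cache_file(db, file_id, word_scores):
    """Store one file's word hits in the cache instead of the main index."""
    with db:  # one transaction per file
        db.executemany(
            "INSERT INTO pending (word, file_id, score) VALUES (?, ?, ?)",
            [(word, file_id, score) for word, score in word_scores.items()])


def bulk_upload(db, index):
    """Move every cached hit into `index` (a dict standing in for the hash table)."""
    rows = db.execute("SELECT word, file_id, score FROM pending "
                      "ORDER BY word, file_id").fetchall()
    for word, file_id, score in rows:
        index.setdefault(word, []).append((file_id, score))
    with db:
        db.execute("DELETE FROM pending")
    return len(rows)
```

Sorting by word means each word's entry in the main index is touched 
once per upload rather than once per file, which is where the reduced 
fragmentation should come from.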

We can actually do better than others here because firstly we are not 
using any more RAM so can therefore have much bigger caches and secondly 
unlike other indexers which upload all at once (which often causes a cpu 
spike) we can do it incrementally in sqlite.
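
The incremental part could look roughly like this, again as a Python 
sketch with an assumed "pending" table and a dict standing in for the 
hash table. The max_words limit and the function names are hypothetical; 
the point is only that each call moves a small slice of the cache so 
the work can be spread out between sleeps.

```python
import sqlite3


def make_cache():
    # Hypothetical cache schema, in memory only for the purposes of the demo.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pending ("
               "word TEXT NOT NULL, file_id INTEGER NOT NULL, "
               "score INTEGER NOT NULL)")
    return db


def flush_some(db, index, max_words=100):
    """Move up to max_words cached words into `index`; return how many moved."""
    words = [row[0] for row in db.execute(
        "SELECT DISTINCT word FROM pending ORDER BY word LIMIT ?",
        (max_words,))]
    with db:  # move this slice in a single transaction
        for word in words:
            hits = db.execute(
                "SELECT file_id, score FROM pending WHERE word = ? "
                "ORDER BY file_id", (word,)).fetchall()
            index.setdefault(word, []).extend(hits)
            db.execute("DELETE FROM pending WHERE word = ?", (word,))
    return len(words)
```

Calling flush_some repeatedly until it returns 0 empties the cache; a 
caller could sleep between calls to avoid the CPU spike.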

And no, sqlite will not fragment as it's btree based and not a hash 
table (btrees are much faster to update than hashes) and we will use a 
separate db file which can be deleted when finished.

Will be experimenting on this tonight. There will be a few race 
conditions to handle with this but it's nothing too complex.

I am determined to get tracker running as smooth as a baby's bottom!


-- 
Mr Jamie McCracken
http://jamiemcc.livejournal.com/
