On Wed, 2008-07-02 at 16:41 +0100, Martyn Russell wrote:
> Hi all,
>
> So as part of finishing up the indexer-split branch to try and get
> things into a state to begin merging, I have been looking at the config
> options we have and checking we implement them and if we don't finding
> out if we need them.
>
> Here are the current config options and some questions/comments about them.
>
>
> WORKING OPTIONS:
> ================
>
> • Verbosity
> • Initial Sleep
> • Enable Indexing
> • Min Word Length
> • Max Word Length
> • Language
> • Enable Stemmer
> • Max Bucket Count
> • Min Bucket Count
> • Enable Xesam
>
>
> NOT WORKING OPTIONS:
> ====================
>
> • Low Memory Mode
>
> This option currently has no effect in the indexer-split branch.
> In TRUNK, it is used to:
>
> 1. Set the cache size to 1/2 of what it is normally when loading DBs
> 2. Set the array in update_word_table() to be 1/2 size.
> 3. It affects these variables (which are usually 1/2 in low mem mode):
>
> a) tracker->memory_limit = 16000 *1024;
>
> b) tracker->max_process_queue_size = 5000;
> c) tracker->max_extract_queue_size = 5000;
>
> d) tracker->word_detail_limit = 2000000;
> e) tracker->word_detail_min = 0;
> f) tracker->word_count_limit = 500000;
> g) tracker->word_count_min = 0;
>
> For #1, I think this makes sense to reimplement

agreed

> For #2, I think this is pointless if the array grows

I guess

> For #3a, The memory limit is used to know when to flush the word cache.
> This needs reimplementing in the indexer.

yes

> For #3b, The process queue size is used to know how big the files queue
> can get before it should be processed in the database. This is done now
> by the indexer and I am not sure it is pertinent any longer.

could get away without this

> For #3c, This is the same as #3b.

this ain't used

> For #3d, This is unused in TRUNK.
> For #3e, This is unused in TRUNK
> For #3f, This is unused in TRUNK.
> For #3g, This is unused in TRUNK

we need to limit the no. of hits per word if we are to use stack
allocated arrays - however I think this is done elsewhere using a
#define in the code, so those vars are likely no longer needed

>
> • NFS Locking
>
> Do we need this? What is it for - as far as I can see, it is just some
> simple locking mechanism using a file on the disk. What needs this? Can
> we remove it?

no - we need to make sure on NFS that only one indexer can be launched
at any one time per user (note different session bus so can't use dbus
locking)

>
> • Watch Directory Roots
> • Crawl Directory
> • No Watch Directory
> • No Index File Types
>
> These closely map to the .module files. I would like to rename them to
> map exactly so they are obviously an override or addition to the
> non-user space config of each module. What are your thoughts here?

that's fine

>
> I would like to rename "WatchDirectoryRoots". Everyone, even GIO uses
> "monitor", instead of "watch" and you can supply a list so it isn't just
> one. Also, should we have ANOTHER option like we do in the module files
> right now to be able to set "MonitorRecursiveDirectories" and
> "MonitorDirectories"? We assume they are always recursive right now.

that's fine so long as we provide an upgrade path for all changed

>
> I would like to rename "CrawlDirectory". This needs integrating with the
> .module files.
>
> I would like to rename "NoWatchDirectory". This is currently working.
>
> • Enable Watching
>
> I would like to rename this to "EnableMonitors"
>
> • Throttle
>
> This needs reimplementing in the indexer. Right now, we don't really
> need it - at least my machine copes fine without it, but I think it
> might be a good idea to add that back.
>

yes pls - laptops can get very hot (and with noisy fans too) so some
scaling is needed

> • Enable File Content Indexing
> • Enable Thumbnails
>
> These need implementing. Plus it would be nicer to call
> "EnableThumbnails", "EnableThumbnailIndexing", more consistent. I am
> assuming these will both be implemented in the indexer.

yes - the former disables text indexing of files but allows metadata
indexing only

>
> • Fast Merges
>
> Carlos is currently working on a solution which means we won't need this
> option or to write to separate files temporarily before writing to the
> main index. How do you feel about removing this option?

dunno - ext/3 is so shite with fsync; being able to avoid fsyncs would
be nice, but it cannot be done without hogging disk when doing large
writes

>
> • Battery Index
> • Battery Index Initial
> • Low Disk Space Limit
> • Index Mounted Directories
> • Index Removable Media
>
> These need some final testing and fixing up.
>
> • Index Email Client
>
> This has been removed since the .module files mean we don't need this now.
>
> • Max Text To Index
>
> This is not used in trunk, can we remove it?

must be used - we should limit text to 1 MB by default, otherwise
gigantic indexes could result with large files

>
> • Max Words To Index
>
> We should probably use this, it isn't used right now.

as above

>
> • Optimization Sweep Count
>
> This is not used in trunk, can we remove it?

for now yes

>
> • Divisions
>
> This was used in TRUNK to call dpoptimize(). Is this really necessary as
> an option? We don't use it in the indexer-split branch yet.
>

no stick with defaults

> • Bucket Ratio
>
> We need to readd this to the indexer-split branch. Unless you think it
> is unimportant?

stick with defaults

>
> • Padding
>
> This isn't used in TRUNK, can we remove it?

stick with defaults

>
> • Thread Stack Size
>
> This is not used now because we don't create threads.
>
>
> CONCLUSION:
> ===========
>
> The idea is to get these options working or removed and once that's done
> we can hopefully merge to TRUNK pending a big review from Jamie of course.
>
> One other option we have considered, is adding a config version number,
> so we know if we ever have to upgrade config files the migration path
> needed. What are your thoughts on this?
>

might be needed

jamie
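
To make the config version number idea a bit more concrete, here is a rough sketch of how renamed keys could be mapped when an older config file is loaded. It is only an illustration, not existing Tracker code: the `KeyRename` table, the `config_migrate_key()` helper and the choice of version 2 as the point where the renames take effect are all assumptions. The two renames in the table are the ones proposed in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rename table: each entry records the config version in
 * which a key was renamed. Version 2 is an assumed number. */
typedef struct {
    int         since_version;  /* first config version using new_key */
    const char *old_key;
    const char *new_key;
} KeyRename;

static const KeyRename renames[] = {
    { 2, "EnableWatching",   "EnableMonitors" },
    { 2, "EnableThumbnails", "EnableThumbnailIndexing" },
};

/* Return the name that `key` should have after upgrading a config file
 * written at `file_version`. Keys with no applicable rename are
 * returned unchanged. */
const char *
config_migrate_key (const char *key, int file_version)
{
    size_t i;

    for (i = 0; i < sizeof (renames) / sizeof (renames[0]); i++) {
        if (file_version < renames[i].since_version &&
            strcmp (key, renames[i].old_key) == 0) {
            return renames[i].new_key;
        }
    }

    return key;
}
```

With a table like this, a file written at version 1 would have "EnableWatching" read as "EnableMonitors", while a file already at version 2 is left alone. Further renames would just be extra rows with a higher version.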
