On Wed, 2008-07-02 at 16:41 +0100, Martyn Russell wrote:
> Hi all,
> 
> So as part of finishing up the indexer-split branch to try and get
> things into a state to begin merging, I have been looking at the config
> options we have and checking we implement them and if we don't finding
> out if we need them.
> 
> Here are the current config options and some questions/comments about them.
> 
> 
> WORKING OPTIONS:
> ================
> 
> • Verbosity
> • Initial Sleep
> • Enable Indexing
> • Min Word Length
> • Max Word Length
> • Language
> • Enable Stemmer
> • Max Bucket Count
> • Min Bucket Count
> • Enable Xesam
> 
> 
> NOT WORKING OPTIONS:
> ====================
> 
> • Low Memory Mode
> 
> This option currently has no effect in the indexer-split branch.
> In TRUNK, it is used to:
> 
>   1. Set the cache size to 1/2 of what it is normally when loading DBs
>   2. Set the array in update_word_table() to be 1/2 size.
>   3. It affects these variables (which are usually 1/2 in low mem mode):
> 
>      a)           tracker->memory_limit = 16000 *1024;
> 
>      b)           tracker->max_process_queue_size = 5000;
>      c)           tracker->max_extract_queue_size = 5000;
> 
>      d)           tracker->word_detail_limit = 2000000;
>      e)           tracker->word_detail_min = 0;
>      f)           tracker->word_count_limit = 500000;
>      g)           tracker->word_count_min = 0;
> 
> For #1, I think this makes sense to reimplement
agreed

> For #2, I think this is pointless if the array grows
I guess


> For #3a, The memory limit is used to know when to flush the word cache.
> This needs reimplementing in the indexer.
yes


> For #3b, The process queue size is used to know how big the files queue
> can get before it should be processed in the database. This is done now
> by the indexer and I am not sure it is pertinent any longer.

We could get away without this.

> For #3c, This is the same as #3b.
This isn't used.


> For #3d, This is unused in TRUNK.
> For #3e, This is unused in TRUNK.
> For #3f, This is unused in TRUNK.
> For #3g, This is unused in TRUNK.

We need to limit the number of hits per word if we are to use
stack-allocated arrays. However, I think this is done elsewhere using a
#define in the code, so those vars are likely no longer needed.
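
To make the halving behaviour from #1 and #3a concrete, here is a minimal
sketch. The struct and function names are hypothetical, not Tracker's own,
and it assumes the 16000 * 1024 figure quoted above is the normal-mode
value that gets halved in low memory mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two limits worth keeping. */
typedef struct {
    size_t memory_limit;   /* word cache size (bytes) that triggers a flush */
    int    db_cache_size;  /* cache size used when loading DBs */
} IndexerLimits;

/* Return the limits for the given mode; low memory mode halves both.
 * Treating 16000 * 1024 as the normal-mode value is an assumption. */
static IndexerLimits indexer_limits_for_mode(bool low_memory,
                                             int normal_db_cache_size)
{
    IndexerLimits limits;

    limits.memory_limit = (size_t) 16000 * 1024;
    limits.db_cache_size = normal_db_cache_size;

    if (low_memory) {
        limits.memory_limit /= 2;
        limits.db_cache_size /= 2;
    }

    return limits;
}

/* The indexer would flush the word cache once it reaches the limit. */
static bool word_cache_should_flush(const IndexerLimits *limits,
                                    size_t cache_bytes)
{
    return cache_bytes >= limits->memory_limit;
}
```

The indexer could call word_cache_should_flush() after each batch of words
is added, so the memory limit is enforced in one place.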

> 
> • NFS Locking
> 
> Do we need this? What is it for - as far as I can see, it is just some
> simple locking mechanism using a file on the disk. What needs this? Can
> we remove it?

No, we need it. On NFS we have to make sure that only one indexer can be
launched at any one time per user (note that the sessions are on different
session buses, so we can't use D-Bus locking).
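
For illustration, a file-based lock that tolerates NFS could look roughly
like the sketch below. It follows the link() technique described in the
Linux open(2) man page rather than whatever Tracker does today. The
function names are hypothetical, and stale locks left behind by a crashed
indexer are not handled.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to take the lock at lock_path (hypothetical helper).
 * Returns true if this process now holds the lock. */
static bool lock_file_acquire(const char *lock_path)
{
    char host[256];
    char unique[4096];
    struct stat st;
    bool locked;
    int fd;
    int n;

    if (gethostname(host, sizeof host) != 0)
        return false;
    host[sizeof host - 1] = '\0';

    /* Unique file on the same filesystem, named after host and PID. */
    n = snprintf(unique, sizeof unique, "%s.%s.%ld",
                 lock_path, host, (long) getpid());
    if (n < 0 || (size_t) n >= sizeof unique)
        return false;

    fd = open(unique, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return false;
    close(fd);

    /* The return value of link() can be unreliable over NFS, so the
     * link count of the unique file decides whether we got the lock. */
    (void) link(unique, lock_path);
    locked = (stat(unique, &st) == 0 && st.st_nlink == 2);

    unlink(unique);
    return locked;
}

/* Release the lock by removing the lock file. */
static void lock_file_release(const char *lock_path)
{
    unlink(lock_path);
}
```

The lock file would have to live somewhere in the user's NFS-mounted home
directory so that every machine the user logs in on sees the same file.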

> 
> • Watch Directory Roots
> • Crawl Directory
> • No Watch Directory
> • No Index File Types
> 
> These closely map to the .module files. I would like to rename them to
> map exactly so they are obviously an override or addition to the
> non-user space config of each module. What are your thoughts here?

That's fine.

> 
> I would like to rename "WatchDirectoryRoots". Everyone, even GIO, uses
> "monitor" instead of "watch", and you can supply a list so it isn't just
> one. Also, should we have ANOTHER option like we do in the module files
> right now to be able to set "MonitorRecursiveDirectories" and
> "MonitorDirectories"? We assume they are always recursive right now.

That's fine, so long as we provide an upgrade path for all changed options.

> 
> I would like to rename "CrawlDirectory". This needs integrating with the
> .module files.
> 
> I would like to rename "NoWatchDirectory". This is currently working.
> 
> • Enable Watching
> 
> I would like to rename this to "EnableMonitors"
> 
> • Throttle
> 
> This needs reimplementing in the indexer. Right now, we don't really
> need it - at least my machine copes fine without it, but I think it
> might be a good idea to add that back.
> 

Yes please. Laptops can get very hot (and with noisy fans too), so some
scaling is needed.
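
One simple way to scale the indexer back is to pause between files, with
the pause growing with the throttle setting. This is only a sketch: the
range and step size below are made-up values, not Tracker's, and the
function names are hypothetical.

```c
#include <time.h>

#define THROTTLE_MAX       20    /* hypothetical upper bound of the option */
#define THROTTLE_STEP_USEC 5000  /* hypothetical pause per throttle step */

/* Map the throttle setting to a pause in microseconds, clamping
 * out-of-range values instead of rejecting them. */
static unsigned int throttle_delay_usec(int throttle)
{
    if (throttle < 0)
        throttle = 0;
    if (throttle > THROTTLE_MAX)
        throttle = THROTTLE_MAX;

    return (unsigned int) throttle * THROTTLE_STEP_USEC;
}

/* Sleep for the computed pause; meant to be called between files. */
static void throttle_pause(int throttle)
{
    unsigned int delay = throttle_delay_usec(throttle);
    struct timespec ts;

    if (delay == 0)
        return;

    /* The maximum delay is 100000 usec, so tv_sec is always 0. */
    ts.tv_sec = 0;
    ts.tv_nsec = (long) delay * 1000;
    nanosleep(&ts, NULL);
}
```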

> • Enable File Content Indexing
> • Enable Thumbnails
> 
> These need implementing. Plus it would be nicer to rename
> "EnableThumbnails" to "EnableThumbnailIndexing", which is more consistent.
> I am assuming these will both be implemented in the indexer.

Yes. The former disables text indexing of files, so that only metadata
is indexed.

> 
> • Fast Merges
> 
> Carlos is currently working on a solution which means we won't need this
> option or to write to separate files temporarily before writing to the
> main index. How do you feel about removing this option?

Dunno, ext3 is so shite with fsync.

Being able to avoid fsyncs would be nice, but it cannot be done without
hogging the disk when doing large writes.


> 
> • Battery Index
> • Battery Index Initial
> • Low Disk Space Limit
> • Index Mounted Directories
> • Index Removable Media
> 
> These need some final testing and fixing up.
> 
> • Index Email Client
> 
> This has been removed since the .module files mean we don't need this now.
> 
> • Max Text To Index
> 
> This is not used in trunk, can we remove it?

It must be used. We should limit text to 1 MB by default, otherwise large
files could result in gigantic indexes.
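
A minimal sketch of such a cap is below. The names are hypothetical and it
assumes the extracted text is UTF-8, so the cut is moved back to a
character boundary rather than splitting a multi-byte character.

```c
#include <stddef.h>

/* 1 MB default, as suggested above; the macro name is hypothetical. */
#define DEFAULT_MAX_TEXT_TO_INDEX (1024 * 1024)

/* Return how many bytes of text should be indexed, at most max_bytes.
 * If the cut would land inside a UTF-8 sequence, back up to the start
 * of that character. */
static size_t text_length_to_index(const char *text, size_t text_length,
                                   size_t max_bytes)
{
    size_t len;

    if (text_length <= max_bytes)
        return text_length;

    len = max_bytes;

    /* Bytes of the form 10xxxxxx are UTF-8 continuation bytes. */
    while (len > 0 && ((unsigned char) text[len] & 0xC0) == 0x80)
        len--;

    return len;
}
```

The same idea applies to "Max Words To Index": stop adding words once the
per-file count reaches the configured limit.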

> 
> • Max Words To Index
> 
> We should probably use this, it isn't used right now.

As above.

> 
> • Optimization Sweep Count
> 
> This is not used in trunk, can we remove it?

For now, yes.

> 
> • Divisions
> 
> This was used in TRUNK to call dpoptimize(). Is this really necessary as
> an option? We don't use it in the indexer-split branch yet.
> 

No, stick with defaults.


> • Bucket Ratio
> 
> We need to re-add this to the indexer-split branch. Unless you think it
> is unimportant?

Stick with defaults.


> 
> • Padding
> 
> This isn't used in TRUNK, can we remove it?

Stick with defaults.

> 
> • Thread Stack Size
> 
> This is not used now because we don't create threads.
> 
> 
> CONCLUSION:
> ===========
> 
> The idea is to get these options working or removed and once that's done
> we can hopefully merge to TRUNK pending a big review from Jamie of course.
> 
> One other option we have considered, is adding a config version number,
> so we know if we ever have to upgrade config files the migration path
> needed. What are your thoughts on this?
> 
It might be needed.
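
A version number would let the config loader apply key renames only to old
files. The sketch below is hypothetical: the version values and function
name are invented, and "EnableWatching" is an assumed spelling of the
existing key. The two new names are the ones proposed earlier in the
thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: files written before versioning count as version 0. */
#define CONFIG_VERSION_CURRENT 1

typedef struct {
    const char *old_key;
    const char *new_key;
} KeyRename;

/* Renames proposed in this thread; the old spellings are assumptions. */
static const KeyRename renames_v0_to_v1[] = {
    { "EnableWatching",   "EnableMonitors" },
    { "EnableThumbnails", "EnableThumbnailIndexing" },
};

/* Translate a key read from a config file of the given version into its
 * current name. Unknown keys and current files pass through unchanged. */
static const char *config_key_migrate(int file_version, const char *key)
{
    size_t i;
    size_t n = sizeof renames_v0_to_v1 / sizeof renames_v0_to_v1[0];

    if (file_version >= CONFIG_VERSION_CURRENT)
        return key;

    for (i = 0; i < n; i++) {
        if (strcmp(key, renames_v0_to_v1[i].old_key) == 0)
            return renames_v0_to_v1[i].new_key;
    }

    return key;
}
```

After loading, the file would be rewritten with the current version number
so the migration runs only once.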

jamie

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
