Some time around 08/07/2004 18:21:10, I think I heard Peter Kerekes say:
!SNIP!
I haven't much clue what the lines in advance.ini do and therefore do not
want to experiment with it.
Can anyone suggest a change in any settings to improve filtration?
Advance.ini file:
working thread priority=2
onexit thread priority=3
selective download spam threshold=10
export selective download=1
simple digits spam marks=1
no spaces spam marks=1
limit size to hash=19
limit size to hash header=96
temporary dictionary=c:\\temp
use expiration=0
age to expirate=100
learn from zero=1
max size of log file=131072
recalculating strategy=3
regarding threshold=1.5
use autotrain=1
use degeneration=1
number of exclamations=5
!SNIP!
Hello:
I've been looking for help tuning my BayesIt installation for a while, but I
can't seem to find much, even though I have searched the archives of this list and the
web. I used to use PopFile and became pretty proficient at tuning it, even editing
the corpus by hand, but even though I find BayesIt much more competent (and accurate),
I don't seem to understand many of its advanced features.
I've read from many people that the new Advanced.ini file contains comments from the
developer explaining the various options, but mine (v5.5) does not. The only help
available from the BayesIt site is outdated and refers to an updated version on the
RitLabs page, but it's in Russian, which I cannot read. So, with the help of the fish
(the Babelfish, that is), an online Russian-English dictionary, and a bit of deductive
reasoning, I was able to translate it as best as I could. It helped me a bit, so I
thought it might help others too.
Still, some explanations are a bit too technical, and they could use some
finessing, so if anybody can help further, I (and others, I'm sure) will appreciate it
immensely. Technical or not, it's still more understandable to us non-Russian-speaking
people.
;working thread priority (2)
;Determines the priority of the base retraining process.
;Retraining is carried out by the filter in the background mode
;and it is usually imperceptible to the user. By default, the
;value of this parameter (2) corresponds to the system parameter
;THREAD_PRIORITY_LOWEST.
;onexit thread priority (3)
;If, during the retraining process, the user clicked on the exit
;button in The Bat!, the retraining process will acquire the
;indicated priority. Usually, it is higher than normal. This is
;necessary so that the current retraining operation reaches, as
;soon as possible, a point where it is safe to interrupt the
;process without risk of losing important data. By default, the
;value of this parameter (3) corresponds to the system parameter
;THREAD_PRIORITY_NORMAL.
;export selective download (1)
;When defined, the filter will export the collection of trigger
;lines for the selective download filter. If the parameter is set
;to 1, then the filter will create the file selective.txt in
;the working folder, which will contain the constantly updated
;list of regular expressions encountered in the headers of
;spam-messages. If this parameter is set to 0, then no lists of
;lines will be exported.
;selective download spam threshold (10)
;Determines how often a token must appear in the
;headers of spam messages in order for it to be included in the
;file selective.txt (see the previous parameter). It is
;recommended that this number be chosen so that the size of the
;file selective.txt does not exceed 40-50 KB. With larger lists
;of trigger lines, The Bat! becomes unstable. Words are selected
;into the file selective.txt based on the following criteria: the
;word must appear in the headers of the message, must never have
;been encountered in the headers of non-spam messages, and must
;have been encountered n times in the headers of spam messages,
;where n is the value of this parameter.
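
The selection criteria above can be sketched in a few lines of Python. This is only an illustration of the rule as I understand it from the translation, not BayesIt's actual code: the function name, the whitespace tokenization, counting a token once per message, and treating n as a minimum are all assumptions.

```python
from collections import Counter


def select_trigger_tokens(spam_headers, ham_headers, threshold=10):
    """Pick header tokens that qualify for a selective-download list.

    Hypothetical helper: a token qualifies if it appears in at least
    `threshold` spam headers and in no non-spam (ham) header at all.
    """
    # Count each token once per spam message (assumption).
    spam_counts = Counter()
    for header in spam_headers:
        spam_counts.update(set(header.lower().split()))

    # Any token ever seen in a ham header is disqualified.
    ham_tokens = set()
    for header in ham_headers:
        ham_tokens.update(header.lower().split())

    return sorted(
        token
        for token, count in spam_counts.items()
        if count >= threshold and token not in ham_tokens
    )
```

With three spam headers "cheap pills now", one spam header "cheap meeting", one non-spam header "meeting agenda now", and a threshold of 3, the sketch keeps "cheap" and "pills": "now" is dropped because it also occurs in non-spam mail, and "meeting" is too rare. Raising the threshold shrinks the list, which is how the file size would be kept down.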
;simple digits spam marks (1)
;Allows HTML comments in messages of the form <!--2345-->
;(i.e. consisting only of digits) to be treated as special
;generalized technical tokens. Since such comments are encountered
;mainly in spam messages, this special token can
;substantially help during the analysis of some messages.
;no spaces spam marks (1)
;Is analogous to the previous parameter; however, it treats as
;special tokens not only numerical comments, but any comment
;which does not contain whitespace.
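
To make the two "spam marks" options more concrete, here is a rough Python sketch of how such comments could be turned into generalized tokens. It is a guess at the idea, not the filter's real implementation: the token names DIGITS_COMMENT and NOSPACE_COMMENT, the function name, and the regular expression are all made up for illustration.

```python
import re

# Matches an HTML comment and captures its body.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def comment_marks(html, simple_digits=True, no_spaces=True):
    """Return one generalized token per suspicious HTML comment.

    Hypothetical helper: `simple_digits` and `no_spaces` mirror the two
    ini options; the token names are invented for this sketch.
    """
    marks = []
    for body in COMMENT_RE.findall(html):
        if simple_digits and re.fullmatch(r"[0-9]+", body):
            # Comment made up only of digits, e.g. <!--2345-->
            marks.append("DIGITS_COMMENT")
        elif no_spaces and body and not any(ch.isspace() for ch in body):
            # Any other non-empty comment without whitespace
            marks.append("NOSPACE_COMMENT")
    return marks
```

The point of a generalized token is that every random filler comment, whatever its exact content, maps to the same token, so the filter can learn from it even though the comment text itself never repeats.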
;limit size to hash (19)
;Allows you to assign a maximum length to the words which will be
;stored in the base unchanged. If any word exceeds the assigned
;length (for example, a PGP signature), then it will be
;automatically encoded into a hash and saved in the base in that
;form instead of its original form.
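
The length limit can be pictured with a small Python sketch. The documentation does not say which hash the filter uses, so SHA-256 truncated to 16 hex digits, the "#" prefix, and the function name are purely assumptions chosen for the example.

```python
import hashlib


def stored_form(token, limit=19):
    """Return the form in which a token would be kept in the base.

    Hypothetical helper: tokens up to `limit` characters are kept as-is,
    longer ones are replaced by a short fixed-size hash.
    """
    if len(token) <= limit:
        return token
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    # "#" marks a hashed token; 16 hex digits keep the entry short.
    return "#" + digest[:16]
```

An ordinary word is stored unchanged, while a long blob such as a line of a PGP signature always takes up the same small amount of space, and the same blob still maps to the same entry every time.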
;limit size to hash header (96)
;Assigns a similar length to the tokens from