Re: [Dspam-user] training time?

2010-04-16 Thread Stevan Bajić
On Fri, 16 Apr 2010 15:41:04 +0800
Michael Alger  wrote:

[...]

> Okay, now I get why you see a big difference between the modes.
> Since I live in a perfect fantasy world where all classification
> errors are corrected, I wasn't seeing the significance. :)
> 
The perfect fantasy world is where we all (the ones responsible for DSPAM, aka 
the admins) live. But lazy users are unfortunately the reality.


> I was browsing through the README looking to see if dspam had any
> nice hooks for helping to build my own corpus,
>
# dspam_admin change preference ds...@mm.quex.org "makeCorpus" "on"

Or default for everyone:
# dspam_admin change preference default "makeCorpus" "on"

Or if you don't use the preference extension then set it in dspam.conf or in 
individual user prefs files.
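
For example (mirroring the Preference syntax from Terry's dspam.conf quoted 
further down this thread; the bare key=value line for a per-user prefs file is 
my assumption about that file's format). In dspam.conf:

Preference "makeCorpus=on"

and in a user prefs file simply:

makeCorpus=on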


> and came across this:
> 
>   tum: Train-until-Mature.  This training mode is a hybrid
>between the other two training modes and provides a great
>balance between volatility and static metadata.
> 
> So apparently I'm not the only one that sees TUM as something of a
> combination between TEFT and TOE. However, the explanation of TUM
> in the README doesn't mention TL as affecting whether it learns or
> not.
>
From the learning-method viewpoint it is not a hybrid. TEFT and TUM learn 
without you telling them to learn; TOE only learns when you tell it to learn. 
That is the reason why I said that TUM should not be compared to TOE. But from 
the way TUM works as a whole, it is indeed a hybrid.


> Is this out of date?
> 
No. It is still valid.


> The explanation (abridged from the README version):
> 
>   TuM will train on a per-token basis only tokens which have had
>   fewer than 50 "hits" on them, unless an error is being retrained
>   in which case all tokens are trained.
> 
>   NOTE: You should corpus train before using tum.
> 
> suggests to me that it actually learns a little differently than
> TEFT (and without regard to TL), in that tokens that already have 50
> hits on them will be ignored.
> 
The documentation is not 100% clear in this regard. Only default tokens with 
((spam_hits + innocent_hits) < 50) are automatically trained by TUM (BNR 
tokens don't fall into that category; they are another token type, not default 
tokens). But for classification, TUM uses every token.

The code that does the magic in regard to training is this:
---
if (ds_term->type == 'D' &&                           /* default token        */
    ( CTX->training_mode != DST_TUM  ||               /* any mode but TUM     */
      CTX->source == DSS_ERROR       ||               /* retraining an error  */
      CTX->source == DSS_INOCULATION ||               /* inoculation          */
      ds_term->s.spam_hits + ds_term->s.innocent_hits < 50 || /* immature     */
      ds_term->key == diction->whitelist_token ||     /* whitelist token      */
      CTX->confidence < 0.70))                        /* low confidence       */
{
    ds_term->s.status |= TST_DIRTY;                   /* schedule write-back  */
}
---

Translated that means:
if ([current token type] is [default token])
and
 (
   ([training mode] is not [TUM])
  or
   ([current message source] is [ERROR])
  or
   ([current message source] is [INOCULATION])
  or
   ([current token spam hits] + [current token innocent hits] is less than 
[50])
  or
   ([current token key] is [WHITELIST])
  or
   ([current message confidence] is less than [0.70 (aka 70%)])
 )
  then
mark [current token] as [DIRTY]
end if

Marking a token as dirty instructs DSPAM to save the updated token data back 
to the storage backend in use.

Let's take an example:
* training mode is TUM
* the message source is not ERROR
* the message source is not INOCULATION
* for simplicity let us assume all default tokens of the message have 20 
innocent hits and 20 spam hits
* for simplicity let us assume the message has no whitelist token
* the whole message has a confidence of 0.80

Then the above condition would result in (for each individual token):
(true) and (false or false or false or true or false or false) -> (true) and 
(true) => true

So each of the tokens would be marked dirty (aka learn the token) because we 
get a TRUE for ((spam_hits + innocent_hits) < 50).

Now using the same values but this time each token has 40 spam hits and 40 
innocent hits AND the whole message has a confidence of 0.65.

Then the above condition would result in (for each individual token):
(true) and (false or false or false or false or false or true) -> (true) and 
(true) => true

As you can see, the individual tokens would still be trained by TUM because 
the whole message has a confidence of less than 0.70. The training is performed 
even though each individual token has a (spam_hits + innocent_hits) above 50 
(in our example 80).

To sum it up: TUM would train a message (respectively parts of a message) if 
one of the following conditions applies:
* the source is ERROR (aka --source=error)
* the source is INOCULATION (aka --source=inoculation)
* individual tokens have (spam_hits + innocent_hits) < 50
* the individual token is a whitelist token
* the whole message has a confidence below 0.70
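
For readers who want to poke at the logic themselves, here is a minimal 
standalone C sketch of the same condition (the function name, parameter names 
and the layout are mine for illustration; only the logic is taken from the 
snippet above), evaluated for the two examples discussed:
---
#include <stdio.h>

/* Re-statement of the TUM dirty-token condition from the snippet above.
   'is_tum' inverts the (training_mode != DST_TUM) test. */
static int should_mark_dirty(int is_default_token, int is_tum,
                             int is_error, int is_inoculation,
                             unsigned long spam_hits,
                             unsigned long innocent_hits,
                             int is_whitelist_token, double confidence)
{
    return is_default_token &&
           (!is_tum || is_error || is_inoculation ||
            spam_hits + innocent_hits < 50 ||
            is_whitelist_token || confidence < 0.70);
}

int main(void)
{
    /* Example 1: TUM, 20+20 hits, confidence 0.80 -> 1 (immature token)  */
    printf("%d\n", should_mark_dirty(1, 1, 0, 0, 20, 20, 0, 0.80));

    /* Example 2: TUM, 40+40 hits, confidence 0.65 -> 1 (low confidence)  */
    printf("%d\n", should_mark_dirty(1, 1, 0, 0, 40, 40, 0, 0.65));

    /* Mature token, high confidence -> 0 (not trained by TUM)            */
    printf("%d\n", should_mark_dirty(1, 1, 0, 0, 40, 40, 0, 0.80));
    return 0;
}
---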

Re: [Dspam-user] training time?

2010-04-16 Thread Michael Alger
On Thu, Apr 15, 2010 at 12:27:47PM +0200, Stevan Bajić wrote:
> On Thu, 15 Apr 2010 17:35:43 +0800
> Michael Alger  wrote:
> 
> [...]
> 
> Learning really only happens if you tell DSPAM that a message
> needs to be reclassified or a message needs to be corpusfed. Or
> when using TEFT (regardless of TL) or TUM (only while TL > 0).
> 
> But in order to be able to use Bayes, DSPAM also needs to know
> how many messages it has seen in total. So it is logical that
> it needs to keep track of that by updating the table dspam_stats
> and incrementing "spam_classified" and/or "innocent_classified".

Thanks. That makes sense. Also thanks for the other explanations of
the statistical theory behind it all, which makes things a lot
clearer for me as well.

> > I think saying "TOE is totally different from {NOTRAIN, TEFT,
> > TUM}" is a little strong. It seems to me that TEFT and TOE are
> > quite different, while TUM is a combination of the two: TEFT
> > until it has enough data, and then TOE. Or have I misunderstood?
> 
> Yes, you have misunderstood. TUM and TEFT could possibly learn
> something wrong, while TOE only learns something when you tell it
> to learn. TUM and TEFT learn by themselves. They FIRST learn and
> then depend on you to FIX errors. TOE does not do that. TOE only
> learns when you want it to learn.

Okay, now I get why you see a big difference between the modes.
Since I live in a perfect fantasy world where all classification
errors are corrected, I wasn't seeing the significance. :)

I was browsing through the README looking to see if dspam had any
nice hooks for helping to build my own corpus, and came across this:

  tum: Train-until-Mature.  This training mode is a hybrid
   between the other two training modes and provides a great
   balance between volatility and static metadata.

So apparently I'm not the only one that sees TUM as something of a
combination between TEFT and TOE. However, the explanation of TUM
in the README doesn't mention TL as affecting whether it learns or
not. Is this out of date?

The explanation (abridged from the README version):

  TuM will train on a per-token basis only tokens which have had
  fewer than 50 "hits" on them, unless an error is being retrained
  in which case all tokens are trained.

  NOTE: You should corpus train before using tum.

suggests to me that it actually learns a little differently than
TEFT (and without regard to TL), in that tokens that already have 50
hits on them will be ignored.

Thanks again for all your explanations.



Re: [Dspam-user] training time?

2010-04-15 Thread Stevan Bajić
On Thu, 15 Apr 2010 17:47:41 +0200
Stevan Bajić  wrote:

> On Thu, 15 Apr 2010 17:35:43 +0800
> Michael Alger  wrote:
> 
> [...]
> > However, I don't understand why simply classifying a message using
> > TOE decrements the Training Left counter. My understanding is that
> > token statistics are only updated when retraining a misclassified
> > message; classifying a message shouldn't cause any changes here, and
> > thus logically shouldn't be construed as "training" the system.
> > 
> > Is this done purely so the statistical sedation is deactivated in
> > TOE mode after 2,500 messages have been processed, or are there
> > other reasons?
> > 
> You have the classic problem of understanding statistical thinking. There is
> an example that you will find in a lot of psychological literature that
> demonstrates the problem most humans have with statistical thinking. The
> problem is known in the socio-psychological literature as the "taxi/cab
> problem".
> 
> [...]
> 
> Most humans however tend to ignore the initial distribution (also called the
> a priori, origin or initial probability). Psychologists speak in this
> connection of "base rate neglect".
> 
Here is a more detailed description from Wikipedia about "base rate neglect":
http://en.wikipedia.org/wiki/Base_rate_fallacy




Re: [Dspam-user] training time?

2010-04-15 Thread Stevan Bajić
On Thu, 15 Apr 2010 17:35:43 +0800
Michael Alger  wrote:

[...]
> However, I don't understand why simply classifying a message using
> TOE decrements the Training Left counter. My understanding is that
> token statistics are only updated when retraining a misclassified
> message; classifying a message shouldn't cause any changes here, and
> thus logically shouldn't be construed as "training" the system.
> 
> Is this done purely so the statistical sedation is deactivated in
> TOE mode after 2,500 messages have been processed, or are there
> other reasons?
> 
You have the classic problem of understanding statistical thinking. There is 
an example that you will find in a lot of psychological literature that 
demonstrates the problem most humans have with statistical thinking. The 
problem is known in the socio-psychological literature as the "taxi/cab 
problem". Let me quickly show you the example:

Two taxi companies are active in a city. The taxis of company A are green, 
those of company B blue. Company A operates 15% of the taxis, company B the 
remaining 85%. One night there is an accident with a hit and run. The fleeing 
car was a taxi. A witness states that it was a green taxi.

The court orders an examination of the witness's ability to differentiate 
between green and blue taxis under night viewing conditions. The test result: 
in 80% of the cases the witness was able to identify the correct color, and in 
the remaining 20% of the cases he was wrong.

How high is the probability that the fleeing taxi the witness saw that night 
was a (green) taxi from company A?


Most people would spontaneously answer 80%. In fact a study has shown that a 
majority of the persons asked (among them physicians, judges and students of 
elite universities) answer the question with 80%.

But the correct answer is not 80% :)

Allow me to explain:
The whole city has 1'000 taxis. 150 (green) belong to company A and 850 (blue) 
belong to company B. One of those 1'000 taxis is responsible for the accident. 
The witness says he saw a green taxi and we know that he is correct in 80% of 
the cases. That also means that he calls a blue taxi green in 20% of the 
cases. From the 850 blue taxis he will thus call 170 green (false positives). 
And from the 150 green taxis he will correctly identify 120 as green (true 
positives). In order to calculate the probability that he actually saw a green 
taxi when he identifies a taxi (under night viewing conditions) as green, you 
need to divide all correct answers of "green" (TP) by all answers of "green" 
(FP + TP). Therefore the probability is: 120 / (170 + 120) = 0.41

The probability that a green taxi caused the accident, given that the witness 
believes he saw a green taxi, is therefore less than 50%. This probability 
depends crucially on the distribution of the green and blue taxis in the city. 
If there were equal numbers of green and blue taxis in the city, then the 
correct answer would indeed be 80%.
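
If you want to verify the arithmetic yourself, here is a quick standalone C 
sketch (just the numbers from above, nothing DSPAM-specific; the variable 
names are mine):
---
#include <stdio.h>

int main(void)
{
    const double taxis       = 1000.0; /* total taxis in the city         */
    const double green_share = 0.15;   /* company A (green)               */
    const double accuracy    = 0.80;   /* witness color accuracy at night */

    double green = taxis * green_share;         /* 150 green taxis */
    double blue  = taxis * (1.0 - green_share); /* 850 blue taxis  */

    double tp = green * accuracy;          /* 120 green correctly "green" */
    double fp = blue  * (1.0 - accuracy);  /* 170 blue wrongly "green"    */

    /* P(really green | witness says green) = TP / (TP + FP) */
    printf("P = %.2f\n", tp / (tp + fp));  /* prints: P = 0.41 */
    return 0;
}
---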

Most humans however tend to ignore the initial distribution (also called the 
a priori, origin or initial probability). Psychologists speak in this 
connection of "base rate neglect".

And now back to your original statement:

However, I don't understand why simply classifying a message using
TOE decrements the Training Left counter. My understanding is that
token statistics are only updated when retraining a misclassified
message; classifying a message shouldn't cause any changes here, and
thus logically shouldn't be construed as "training" the system.


Without DSPAM keeping track of the TP/TN the whole calculation from above 
would not be possible. DSPAM would not know that there are 1'000 taxis. It 
would only know about 30 green taxis and 170 blue taxis. You might now ask 
yourself why 30 green and why 170 blue? Easy (assuming green = bad/Spam and 
blue = good/Ham):
* 1'000 taxis (processed messages) -> TP + TN
* 170 taxis identified as green (Spam) but they were blue (Ham) -> FP
* 30 taxis identified as blue (Ham) but they were green (Spam) -> FN

Without knowing TP and TN the whole Bayes theorem calculation would not be 
possible, so DSPAM must keep track of them. It is indeed not a learning thing, 
but for the computation of the probability it is crucial to know those values.

And since the statistical sedation implemented in DSPAM waters down the result 
in order to minimize FPs, the Training Left (TL) value was introduced in DSPAM 
as a way to limit that watering-down phase. The more positive/negative 
classifications DSPAM has done, the more mature the tokens are considered to 
be. So after 2'500 TP/TN the statistical sedation gets automatically disabled.

I hope you understand now better why we need to update the statistics.

Re: [Dspam-user] training time?

2010-04-15 Thread Stevan Bajić
On Thu, 15 Apr 2010 17:35:43 +0800
Michael Alger  wrote:

[...]

> Thank you for this explanation and after a quick test I see that the
> TL counter does decrement (and TN increments) when I process mail
> using TOE. If I set it to NOTRAIN, then none of the statistics are
> updated when the message is processed.
> 
Right.


> However, I don't understand why simply classifying a message using
> TOE decrements the Training Left counter. My understanding is that
> token statistics are only updated when retraining a misclassified
> message; classifying a message shouldn't cause any changes here, and
> thus logically shouldn't be construed as "training" the system.
> 
You are right and wrong. When classifying a message:

1) if using TEFT (regardless of TL) or TUM (only while TL > 0) then the table 
dspam_token_data gets updated and/or new entries are added.

2) if using TOE or NOTRAIN then the table dspam_token_data does NOT get any 
new entries.

3) if using TOE then existing entries in the table dspam_token_data (aka 
tokens) will get their "last_hit" updated, but neither "spam_hits" nor 
"innocent_hits" will be updated.

4) if using TOE or TEFT or TUM then the table dspam_stats will be updated. But 
only the fields "spam_classified" and/or "innocent_classified".
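
Purely as an illustration of those four points, here is a C-style sketch (my 
own restatement, not DSPAM source; only the table and field names come from 
the list above):
---
/* Bookkeeping when a message is merely CLASSIFIED (no retraining). */
enum train_mode { NOTRAIN, TEFT, TUM, TOE };

void on_classify(enum train_mode mode, int training_left)
{
    if (mode == TEFT || (mode == TUM && training_left > 0)) {
        /* 1) dspam_token_data: hits updated and/or new tokens added */
    }
    /* 2) TOE and NOTRAIN add no new entries to dspam_token_data */
    if (mode == TOE) {
        /* 3) existing tokens: "last_hit" updated, but neither
           "spam_hits" nor "innocent_hits" */
    }
    if (mode != NOTRAIN) {
        /* 4) dspam_stats: "spam_classified" and/or
           "innocent_classified" incremented */
    }
}
---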


Learning is another issue. When learning happens, the stats get updated 
(fields: "spam_learned", "innocent_learned", "spam_misclassified", 
"innocent_misclassified", "spam_corpusfed", "innocent_corpusfed").


Learning really only happens if you tell DSPAM that a message needs to be 
reclassified or a message needs to be corpusfed. Or when using TEFT 
(regardless of TL) or TUM (only while TL > 0).

But in order to be able to use Bayes, DSPAM also needs to know how many 
messages it has seen in total. So it is logical that it needs to keep track of 
that by updating the table dspam_stats and incrementing "spam_classified" 
and/or "innocent_classified".


> Is this done purely so the statistical sedation is deactivated in
> TOE mode after 2,500 messages have been processed, or are there
> other reasons?
> 
Yes. It's only for the statistical sedation.


> Does TUM base its decision to learn purely on the value of the TL
> counter (i.e. stops learning once that reaches 0), or is the TL just
> a hint and TUM actually uses some heuristic based on the number of
> tokens available to it and their scores?
> 
No. TUM is 100% like TEFT until it reaches TL = 0. So TUM and TEFT FORCE A 
LEARNING on each message they see. TOE really only learns if you tell it to 
learn (no implicit learning, only explicit learning).

To sum it up:
* TEFT (regardless of TL) or TUM (only while TL > 0) LEARN EVERY message they 
see.

* TOE only learns if you TELL IT TO LEARN.

* TEFT (regardless of TL) or TUM (only while TL > 0) could even LEARN WRONG 
and depend on you to fix their errors. If you use TEFT or TUM (until TL = 0) 
and you DON'T correct errors then the quality of your tokens can decrease (it 
can increase as well, but only if no classified message was a FP or a FN).



> Is TL used by anything other than the statistical sedation feature?
> 
No.


> I think saying "TOE is totally different from {NOTRAIN, TEFT, TUM}"
> is a little strong. It seems to me that TEFT and TOE are quite
> different, while TUM is a combination of the two: TEFT until it has
> enough data, and then TOE. Or have I misunderstood?
> 
Yes, you have misunderstood. TUM and TEFT could possibly learn something 
wrong, while TOE only learns something when you tell it to learn. TUM and TEFT 
learn by themselves. They FIRST learn and then depend on you to FIX errors. 
TOE does not do that. TOE only learns when you want it to learn.

Allow me to illustrate something.

Assume you have 1000 tokens in DSPAM. And assume you have a corpus A with 100 
messages and corpus B with 100 messages.

Test case 1)
Now assume you use TEFT/TUM and you check all those mails from corpus A. And 
assume you get 100% accuracy.

Test case 2)
Now assume you use TOE and you check all those mails from corpus A. And assume 
you get as well 100% accuracy.


So far, so good. Now assume you only CLASSIFY corpus B with test case 1 and 
with test case 2. And assume we don't care about the result we get by just 
classifying the mails from corpus B.

Now go back and repeat the classification the same way as done above with 
corpus A.

With test case 2 you will get again 100%. For sure!

With test case 1 you have a high chance of NOT getting 100% again. The reason 
is that TEFT and TUM would have changed "spam_learned" and "innocent_learned" 
while they only CLASSIFIED corpus B. They have learned even though you told 
them only to classify corpus B.

Do you understand what I mean?



-- 
Kind Regards from Switzerland,

Stevan Bajić


Re: [Dspam-user] training time?

2010-04-15 Thread Michael Alger
On Mon, Apr 12, 2010 at 09:18:43PM +0200, Stevan Bajić wrote:
> On Sat, 10 Apr 2010 17:59:25 +0800
> Michael Alger  wrote:
> > On Fri, Apr 09, 2010 at 11:23:16PM -0700, Terry Barnum wrote:
> > >>> I've been running DSPAM for approximately 2 weeks and looking
> > >>> at the output of dspam_stats, I'm curious how long training
> > >>> normally takes.
> > >>>
> > >>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
> > >>>
> > >>> TrainingMode toe
> > >>> Preference "trainingMode=TOE"
> > 
> > Your default settings are TOE mode. Are you overriding this for any
> > of the users in their preferences? If not, this would explain why
> > it's only learning from errors: because you told it to.
> > 
> > Try switching this to TUM or TEFT.
> > 
> I think most users here don't understand what training is in the
> context of Anti-Spam. So I am going to try to explain quickly what
> all those different training modes are.

Thank you for this explanation and after a quick test I see that the
TL counter does decrement (and TN increments) when I process mail
using TOE. If I set it to NOTRAIN, then none of the statistics are
updated when the message is processed.

However, I don't understand why simply classifying a message using
TOE decrements the Training Left counter. My understanding is that
token statistics are only updated when retraining a misclassified
message; classifying a message shouldn't cause any changes here, and
thus logically shouldn't be construed as "training" the system.

Is this done purely so the statistical sedation is deactivated in
TOE mode after 2,500 messages have been processed, or are there
other reasons?

> TUM is exactly like TEFT. He takes the test, and after the test he
> too buys a book (+/- 100 pages) about the tested topic and
> reads/learns it. But as soon as he has successfully passed 2'500
> tests he changes his strategy and stops buying books after he has
> passed a test. He only buys and reads/learns a book if he has
> failed a test.

Does TUM base its decision to learn purely on the value of the TL
counter (i.e. stops learning once that reaches 0), or is the TL just
a hint and TUM actually uses some heuristic based on the number of
tokens available to it and their scores?

Is TL used by anything other than the statistical sedation feature?

> TOE is totally different from the above 3. He takes a test and if
> he fails to pass the test he goes on and buys a book (+/- 100
> pages) about the tested topic and reads/learns it. He does that
> forever. Every test he takes he does the same. If he passes the
> test he does not buy the book and he does not read those +/- 100
> pages. He has passed the test and he knows that he has passed. So
> no need for him to invest time in reading 100 pages for nothing.
> He is already knowledgeable in the topic he tested (remember: he
> passed the test).

I think saying "TOE is totally different from {NOTRAIN, TEFT, TUM}"
is a little strong. It seems to me that TEFT and TOE are quite
different, while TUM is a combination of the two: TEFT until it has
enough data, and then TOE. Or have I misunderstood?



Re: [Dspam-user] training time?

2010-04-12 Thread Stevan Bajić
On Sat, 10 Apr 2010 17:59:25 +0800
Michael Alger  wrote:

> On Fri, Apr 09, 2010 at 11:23:16PM -0700, Terry Barnum wrote:
> >>> I've been running DSPAM for approximately 2 weeks and looking
> >>> at the output of dspam_stats, I'm curious how long training
> >>> normally takes.
> >>>
> >>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
> >>>
> >>> TrainingMode toe
> >>> Preference "trainingMode=TOE"
> 
> Your default settings are TOE mode. Are you overriding this for any
> of the users in their preferences? If not, this would explain why
> it's only learning from errors: because you told it to.
> 
> Try switching this to TUM or TEFT.
> 
I think most users here don't understand what training is in the context of 
anti-spam. So I am going to try to explain quickly what all those different 
training modes are. I will try to avoid the technical/mathematical/statistical 
mumbo-jumbo and use something else. Sorry if I make too many grammatical 
errors. I have a hard day of work behind me and I am just going to type here 
without taking much care about proper English.

The example I will use here is way oversimplified but good enough to explain 
the topic.

Okay. DSPAM has the following training modes:
* NOTRAIN
  => Do not do training

* TEFT
  => Train Everything (some say: Train Every F***ing Time)

* TUM
  => Train Until Mature

* TOE
  => Train On Error

* UNLEARN
  => Unlearn the (previous) training


Now my example:
Let us assume we have a young human who wants to become a specialist in a 
specific knowledge area/domain. At the beginning that young human does not 
know anything about the specific area.

Let us assume that that specific area has a lot of material that can be 
learned. That learning material is immense. Infinite. You never stop learning. 
But let us assume that in general a human is considered to be a specialist in 
that area/domain after he/she has passed 2'500 tests.

Now let us assume that each piece of this training material is a book with 
+/- 100 pages. And let us assume that you can take a test for each topic.

Now let us assume we have 4 young boys trying to become specialists. They are 
called (I know, I know. Stupid names, but anyway):
* NOTRAIN
* TEFT
* TUM
* TOE

NOTRAIN never trains. He just relies on what he has learned in the past, takes 
any test without learning before the test, and does not learn after the test 
either. He just takes the test and, regardless of the result, continues to the 
next test.

TEFT on the other hand takes the test like NOTRAIN, but each time after he has 
taken the test he buys a book (+/- 100 pages) about the tested topic and 
reads/learns it. And he continues this for each and every test. He does not 
stop after he has successfully passed 2'500 topic tests. He takes test 2'500 
and 2'501 and 2'502 and so on. He never ever stops learning (FORCED LEARNING).

TUM is exactly like TEFT. He takes the test, and after the test he too buys a 
book (+/- 100 pages) about the tested topic and reads/learns it. But as soon 
as he has successfully passed 2'500 tests he changes his strategy and stops 
buying books after he has passed a test. He only buys and reads/learns a book 
if he has failed a test.

TOE is totally different from the above 3. He takes a test and if he fails to 
pass the test he goes on and buys a book (+/- 100 pages) about the tested 
topic and reads/learns it. He does that forever. Every test he takes he does 
the same. If he passes the test he does not buy the book and he does not read 
those +/- 100 pages. He has passed the test and he knows that he has passed. 
So no need for him to invest time in reading 100 pages for nothing. He is 
already knowledgeable in the topic he tested (remember: he passed the test).


So now allow me to glue DSPAM together with the above example. In the DSPAM 
world those 2'500 tests would be TL (Training Left). And in the DSPAM world 
each of the trainees from above (except NOTRAIN and obviously UNLEARN) would 
take extra care while they have not yet passed at least 2'500 tests. The extra 
care is that in DSPAM you have the option called "statisticalSedation". This 
is a parameter that allows DSPAM to water down the catch rate (the catching of 
Spam). This parameter exists for those out there who are absolutely paranoid 
about FPs (false positives). I could now go on and explain the 
mathematical/statistical reason behind that parameter, but I'll save myself 
some time by not explaining it. For now just accept that the parameter is 
there and that it allows you to tune how aggressively DSPAM will try to catch 
Spam while it has not yet processed at least 2'500 innocent messages.
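
To put a face on that parameter: it is set via a Preference line in dspam.conf 
(or per user), exactly as in Terry's configuration quoted later in this thread:

Preference "statisticalSedation=5"    # { 0 - 10 } -> default:0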

Okay. I think that now most of you should +/- understand what those training 
modes are and how they work in DSPAM. And each of those modes has a reason to 
be there. A lot of you might now think that some of those modes are useless and 
others are more useful. Right. All of them h

Re: [Dspam-user] training time?

2010-04-11 Thread Stevan Bajić
On Sat, 10 Apr 2010 11:33:15 -0700
Terry Barnum  wrote:

> 
[...]
> 
> That's what I'm wondering too. Could the train.dspam script somehow trigger a 
> reset of those fields?
> 
As with everything in life: Everything is possible.
But quickly looking over the script I don't see anything that would explain a 
reset.


> It's very possible I have a stupid mis-configuration problem and I very much 
> appreciate the help. This is my first postfix/dovecot install and I'm 
> learning something every day.
> 
That is possible too. Could you directly send me your main.cf and your 
master.cf? And your dovecot.conf? What are you using to manage users? Any 
specific tool? What tool?


> 
[...]
> 
> $ cat dspam_filter_access
> /./   FILTER dspam:dspam
> 
Okay. I see.


> 
[...]
> 
> Yes. Is this not a good approach?
> 
I wouldn't call it a bad approach. I however had issues in the past when using 
FILTER, especially when piping to DSPAM (or any other application) mails that 
have non-Latin characters. Then FILTER in conjunction with pipe breaks very 
often.


> Also, I'm not sure if this helps the diagnosis, but here's dspam_admin list 
> preference default output that shows the change you suggested to force 
> signatureLocation into the header.
> 
> $ sudo dspam_admin list preference default
> signatureLocation=headers
> 
> Thanks,
> -Terry
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] training time?

2010-04-10 Thread Terry Barnum

On Apr 10, 2010, at 3:27 AM, Stevan Bajić wrote:

> On Fri, 9 Apr 2010 23:23:16 -0700
> Terry Barnum  wrote:
> 
>> 
>> On Apr 9, 2010, at 7:21 PM, Stevan Bajić wrote:
>> 
>>> On Fri, 9 Apr 2010 19:00:54 -0700
>>> Terry Barnum  wrote:
>>> 
>>>> I've been running DSPAM for approximately 2 weeks and looking at the
>>>> output of dspam_stats, I'm curious how long training normally takes.
>>>>
>>>> A script is run nightly to check .Junk mailboxes for false negatives and
>>>> .NotJunk mailboxes for false positives and retrains on error. (Richard
>>>> Valk's http://switch.richard5.net/serverinstall/train.dspam)
>>>>
>>>> Here's sample output from dspam_stats -H
>>>>
>>>> [...]
>>>>
>>> This all looks to me like you are not using DSPAM at all. Seems to me that 
>>> only the script from http://switch.richard5.net/serverinstall/train.dspam 
>>> is feeding DSPAM with data in your setup.
>> 
>> Thank you for your help Stevan. My understanding of how this is supposed to 
>> eventually work is DSPAM analyzes and adds a header to email as Innocent or 
>> Spam and the MUA, which is configured to trust the Spam header, moves mail 
>> into the Junk mailbox if DSPAM classified it as Spam. The MUA has its own 
>> Junk filtering and moves mail it considers spam into the Junk mailbox too. 
>> So the nightly script may run across mail in the Junk mailbox that it 
>> mis-classified as Innocent but is actually spam and is retrained as a false 
>> negative. Conversely, if DSPAM incorrectly classifies mail as spam, the user 
>> moves that email from the Junk mailbox into the NotJunk mailbox so the 
>> nightly script can retrain as a false positive.
>> 
> So what it does is basically what the Dovecot anti-spam plugin does. The 
> plugin however does it in real time while the script you have there does it 
> on a scheduled basis.
> 
> 
>> DSPAM appears to be correctly adding headers but so far I've seen only 
>> Whitelisted and Innocent.
>> 
> But how is it possible that you have almost everywhere 0 for TN/TP? If DSPAM 
> were working properly then TP/TN would increase every time you get a mail.

That's what I'm wondering too. Could the train.dspam script somehow trigger a 
reset of those fields?

It's very possible I have a stupid mis-configuration problem and I very much 
appreciate the help. This is my first postfix/dovecot install and I'm learning 
something every day.



Re: [Dspam-user] training time?

2010-04-10 Thread Stevan Bajić
On Sat, 10 Apr 2010 17:59:25 +0800
Michael Alger  wrote:

> On Fri, Apr 09, 2010 at 11:23:16PM -0700, Terry Barnum wrote:
> >>> I've been running DSPAM for approximately 2 weeks and looking
> >>> at the output of dspam_stats, I'm curious how long training
> >>> normally takes.
> >>>
> >>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
> >>>
> >>> TrainingMode toe
> >>> Preference "trainingMode=TOE"
> 
> Your default settings are TOE mode. Are you overriding this for any
> of the users in their preferences? If not, this would explain why
> it's only learning from errors: because you told it to.
> 
> Try switching this to TUM or TEFT.
> 
I would advise AGAINST going to TEFT. The problem he is describing doesn't 
have much to do with the training mode. Even in TOE the TP/TN counters should 
increase each time he gets a new mail. So something is fishy in his setup. 
Those TP/TN numbers should increase with each inbound mail regardless of the 
training mode.

-- 
Kind Regards from Switzerland,

Stevan Bajić



Re: [Dspam-user] training time?

2010-04-10 Thread Stevan Bajić
On Fri, 9 Apr 2010 23:23:16 -0700
Terry Barnum  wrote:

> 
> On Apr 9, 2010, at 7:21 PM, Stevan Bajić wrote:
> 
> > On Fri, 9 Apr 2010 19:00:54 -0700
> > Terry Barnum  wrote:
> > 
> >> I've been running DSPAM for approximately 2 weeks and looking at the 
> >> output of dspam_stats, I'm curious how long training normally takes.
> >> 
> >> A script is run nightly to check .Junk mailboxes for false negatives and 
> >> .NotJunk mailboxes for false positives and retrains on error. (Richard 
> >> Valk's http://switch.richard5.net/serverinstall/train.dspam)
> >> 
> >> Here's sample output from dspam_stats -H
> >> 
> >> [...]
> >> 
> > This all looks to me like you are not using DSPAM at all. Seems to me that 
> > only the script from http://switch.richard5.net/serverinstall/train.dspam 
> > is feeding DSPAM with data in your setup.
> 
> Thank you for your help Stevan. My understanding of how this is supposed to 
> eventually work is DSPAM analyzes and adds a header to email as Innocent or 
> Spam and the MUA, which is configured to trust the Spam header, moves mail 
> into the Junk mailbox if DSPAM classified it as Spam. The MUA has its own 
> Junk filtering and moves mail it considers spam into the Junk mailbox too. So 
> the nightly script may run across mail in the Junk mailbox that it 
> mis-classified as Innocent but is actually spam and is retrained as a false 
> negative. Conversely, if DSPAM incorrectly classifies mail as spam, the user 
> moves that email from the Junk mailbox into the NotJunk mailbox so the 
> nightly script can retrain as a false positive.
> 
So what it does is basically what the Dovecot anti-spam plugin does. The plugin 
however does it in real time while the script you have there does it on a 
scheduled basis.


> DSPAM appears to be correctly adding headers but so far I've seen only 
> Whitelisted and Innocent.
> 
But how is it possible that you have almost everywhere 0 for TN/TP? If DSPAM 
were working properly then TP/TN would increase every time you get a mail.


> >> Is so much "Training Left" normal? Do I have something misconfigured? Will 
> >> DSPAM start tagging email as SPAM only after 2500 successfully classified 
> >> emails?
> >> 
> > No. DSPAM is fully functional from day one. The tagging can be turned 
> > on/off inside dspam.conf or with the preference extension. However... 
> > turning on/off the tagging has nothing to do with the training left number.

Re: [Dspam-user] training time?

2010-04-10 Thread Michael Alger
On Fri, Apr 09, 2010 at 11:23:16PM -0700, Terry Barnum wrote:
>>> I've been running DSPAM for approximately 2 weeks and looking
>>> at the output of dspam_stats, I'm curious how long training
>>> normally takes.
>>>
>>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
>>>
>>> TrainingMode toe
>>> Preference "trainingMode=TOE"

Your default settings are TOE mode. Are you overriding this for any
of the users in their preferences? If not, this would explain why
it's only learning from errors: because you told it to.

Try switching this to TUM or TEFT.



Re: [Dspam-user] training time?

2010-04-09 Thread Terry Barnum

On Apr 9, 2010, at 7:21 PM, Stevan Bajić wrote:

> On Fri, 9 Apr 2010 19:00:54 -0700
> Terry Barnum  wrote:
> 
>> I've been running DSPAM for approximately 2 weeks and looking at the output 
>> of dspam_stats, I'm curious how long training normally takes.
>> 
>> A script is run nightly to check .Junk mailboxes for false negatives and 
>> .NotJunk mailboxes for false positives and retrains on error. (Richard 
>> Valk's http://switch.richard5.net/serverinstall/train.dspam)
>> 
>> Here's sample output from dspam_stats -H
>> 
>> [...]
>> 
> This all looks to me like you are not using DSPAM at all. Seems to me that 
> only the script from http://switch.richard5.net/serverinstall/train.dspam is 
> feeding DSPAM with data in your setup.

Thank you for your help Stevan. My understanding of how this is supposed to 
eventually work is DSPAM analyzes and adds a header to email as Innocent or 
Spam and the MUA, which is configured to trust the Spam header, moves mail into 
the Junk mailbox if DSPAM classified it as Spam. The MUA has its own Junk 
filtering and moves mail it considers spam into the Junk mailbox too. So the 
nightly script may run across mail in the Junk mailbox that it mis-classified 
as Innocent but is actually spam and is retrained as a false negative. 
Conversely, if DSPAM incorrectly classifies mail as spam, the user moves that 
email from the Junk mailbox into the NotJunk mailbox so the nightly script can 
retrain as a false positive.

DSPAM appears to be correctly adding headers but so far I've seen only 
Whitelisted and Innocent.


>> Is so much "Training Left" normal? Do I have something misconfigured? Will 
>> DSPAM start tagging email as SPAM only after 2500 successfully classified 
>> emails?
>> 
> No. DSPAM is fully functional from day one. The tagging can be turned on/off 
> inside dspam.conf or with the preference extension. However... turning on/off 
> the tagging has nothing to do with the training left number.
> 
> 
>> $ dspam --version
>> 
>> DSPAM Anti-Spam Suite 3.9.0 (agent/library)
>> 
>> Copyright (c) 2002-2009 DSPAM Project
>> http://dspam.sourceforge.net.
>> 
>> DSPAM may be copied only under the terms of the GNU General Public License,
>> a copy of which can be found with the DSPAM distribution kit.
>> 
>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
>> 
>> Home /usr/local/var/dspam
>> StorageDriver /usr/local/lib/dspam/libmysql_drv.dyli

Re: [Dspam-user] training time?

2010-04-09 Thread Stevan Bajić
On Fri, 9 Apr 2010 19:00:54 -0700
Terry Barnum  wrote:

> I've been running DSPAM for approximately 2 weeks and looking at the output 
> of dspam_stats, I'm curious how long training normally takes.
> 
> A script is run nightly to check .Junk mailboxes for false negatives and 
> .NotJunk mailboxes for false positives and retrains on error. (Richard Valk's 
> http://switch.richard5.net/serverinstall/train.dspam)
> 
> Here's sample output from dspam_stats -H
> 
> x...@dop.com:
>   TP True Positives: 0
>   TN True Negatives:19
>   FP False Positives:0
>   FN False Negatives:  348
>   SC Spam Corpusfed: 0
>   NC Nonspam Corpusfed:  0
>   TL Training Left:   2481
>   SHR Spam Hit Rate  0.00%
>   HSR Ham Strike Rate:   0.00%
>   PPV Positive predictive value:   100.00%
>   OCA Overall Accuracy:  5.18%
> 
> y...@dop.com:
>   TP True Positives: 0
>   TN True Negatives: 0
>   FP False Positives:0
>   FN False Negatives: 3035
>   SC Spam Corpusfed: 0
>   NC Nonspam Corpusfed:  0
>   TL Training Left:   2500
>   SHR Spam Hit Rate  0.00%
>   HSR Ham Strike Rate: 100.00%
>   PPV Positive predictive value:   100.00%
>   OCA Overall Accuracy:  0.00%
> 
> z...@dop.com:
>   TP True Positives: 0
>   TN True Negatives: 0
>   FP False Positives:0
>   FN False Negatives:  358
>   SC Spam Corpusfed: 0
>   NC Nonspam Corpusfed:  0
>   TL Training Left:   2500
>   SHR Spam Hit Rate  0.00%
>   HSR Ham Strike Rate: 100.00%
>   PPV Positive predictive value:   100.00%
>   OCA Overall Accuracy:  0.00%
> 
> te...@dop.com:
>   TP True Positives: 0
>   TN True Negatives: 3
>   FP False Positives:0
>   FN False Negatives: 5108
>   SC Spam Corpusfed: 0
>   NC Nonspam Corpusfed:  0
>   TL Training Left:   2497
>   SHR Spam Hit Rate  0.00%
>   HSR Ham Strike Rate:   0.00%
>   PPV Positive predictive value:   100.00%
>   OCA Overall Accuracy:  0.09%
> 
This all looks to me like you are not using DSPAM at all. Seems to me that 
only the script from http://switch.richard5.net/serverinstall/train.dspam is 
feeding DSPAM with data in your setup.


> Is so much "Training Left" normal? Do I have something misconfigured? Will 
> DSPAM start tagging email as SPAM only after 2500 successfully classified 
> emails?
> 
No. DSPAM is fully functional from day one. The tagging can be turned on/off 
inside dspam.conf or with the preference extension. However... turning on/off 
the tagging has nothing to do with the training left number.


> $ dspam --version
> 
> DSPAM Anti-Spam Suite 3.9.0 (agent/library)
> 
> Copyright (c) 2002-2009 DSPAM Project
> http://dspam.sourceforge.net.
> 
> DSPAM may be copied only under the terms of the GNU General Public License,
> a copy of which can be found with the DSPAM distribution kit.
> 
> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
> 
> Home /usr/local/var/dspam
> StorageDriver /usr/local/lib/dspam/libmysql_drv.dylib
> TrustedDeliveryAgent "/usr/bin/procmail"
> DeliveryHost  127.0.0.1
> DeliveryPort  10026
> DeliveryIdent localhost
> DeliveryProto SMTP
> OnFail error
> Trust root
> Trust dspam
> Trust apache
> Trust mail
> Trust mailnull 
> Trust smmsp
> Trust daemon
> Trust _dspam
> Trust _postfix
> Trust _www
> TrainingMode toe
> TestConditionalTraining on
> Feature whitelist
> Algorithm graham burton
> Tokenizer osb
> PValue bcr
> WebStats on
> Preference "trainingMode=TOE" # { TOE | TUM | TEFT | NOTRAIN } -> 
> default:teft
> Preference "spamAction=tag"   # { quarantine | tag | deliver } -> 
> default:quarantine
> Preference "spamSubject=[SPAM]"   # { string } -> default:[SPAM]
> Preference "statisticalSedation=5"# { 0 - 10 } -> default:0
> Preference "enableBNR=on" # { on | off } -> default:off
> Preference "enableWhitelist=on"   # { on | o