[EMAIL PROTECTED] wrote:
    Eric> My current training process is everything above 0.2 is considered
    Eric> good and delivered.  Everything below 0.8 is considered bad and
    Eric> dumpstered.
Kind of a large overlap there.  Do you mean everything *below* 0.2 is good
and everything *above* 0.8 is spam?

whoops. above as in greater than. that is what I get for writing while distracted. the real code logic is:

on first pass, messages scoring < 0.2 are delivered. messages scoring >= 0.8 are dumpstered. the rest are presented to the user.
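a minimal sketch of that first pass, just to pin down the boundaries. the function and constant names here are made up for illustration and are not from my real code; only the 0.2 and 0.8 cutoffs are.

```python
# Cutoffs as described above; the names are illustrative assumptions.
HAM_CUTOFF = 0.2
SPAM_CUTOFF = 0.8

def first_pass(score):
    """Decide what to do with a message from its spam score (0.0 to 1.0)."""
    if score < HAM_CUTOFF:
        return "deliver"      # confidently good
    if score >= SPAM_CUTOFF:
        return "dumpster"     # confidently spam
    return "ask_user"         # everything in between goes to the user
```

note that a score of exactly 0.2 goes to the user, and a score of exactly 0.8 is dumpstered.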

after the user determines which message is good or bad, train as follows.

if the user marks a message as good and it scored >= 0.2, train as good.

if the user marks a message as bad and it scored < 0.8, train as bad.
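the same rule as a small sketch. the function name and return values are hypothetical, chosen only to show which combinations trigger training; a message the filter already scored confidently on the correct side is left alone.

```python
def training_action(user_says_good, score):
    """Return which way to train, or None if no training is needed.

    Illustrative only: the name and return values are assumptions.
    """
    if user_says_good and score >= 0.2:
        return "train_good"   # filter was not confident it was good
    if not user_says_good and score < 0.8:
        return "train_bad"    # filter was not confident it was spam
    return None               # filter already agreed with the user
```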

for example:

if score >= 0.2:
  log( "train it green %s"% (score,),1)
  # retrain as green because the user said so.
  lock_file = file(configuration_data["sb_lock"], "w+")
  locker = simple_locker.locker()
  locker.lock(lock_file, simple_locker.LOCK_EX)

  bayesian_storage = storage.open_storage( configuration_data['sb_features'], "dbm" )
  filter_x = hammie.Hammie(bayesian_storage)
  filter_x.train(tpblue_message.message,False)
  # unlock and close access to lock file
  locker.unlock(lock_file)
  lock_file.close()

  log("TAG  learned %s" % result,1)
  # update yellow limits
  # yellow_limit.do_injection(configuration_data['user_ID'],score)

  # save reputation
  hanky = common_services.reputation_DBM()
  hanky.add_good_reputation(tpblue_message.meta['xforward'], "10")
  hanky = None

  # change state to 'good'
  tpblue_message.alter_state('delivered')


I think I see an unrelated bug. I release the lock before flushing the spambayes data file to disk.
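the fix is presumably to flush while the lock is still held. a sketch of the ordering I have in mind, using stand-in classes so it runs on its own: FakeLock and FakeFilter are hypothetical placeholders for my locker and the trained filter, and the store() method is an assumption standing in for whatever call actually writes the database to disk.

```python
from contextlib import contextmanager

events = []  # records the order of operations, for illustration only

class FakeLock:
    """Hypothetical stand-in for the real file locker."""
    def acquire(self):
        events.append("lock")
    def release(self):
        events.append("unlock")

class FakeFilter:
    """Hypothetical stand-in for the trained filter object."""
    def train(self, message, is_spam):
        events.append("train")
    def store(self):
        # Assumed to flush the training data to disk.
        events.append("store")

@contextmanager
def held(lock):
    """Hold the lock for the duration of the block, releasing on any exit."""
    lock.acquire()
    try:
        yield
    finally:
        lock.release()

def train_and_flush(filter_x, lock, message, is_spam):
    with held(lock):
        filter_x.train(message, is_spam)
        filter_x.store()  # flush before the lock is released

train_and_flush(FakeFilter(), FakeLock(), "example message", False)
```

with this ordering another process cannot open the data file between the training update and the write to disk.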

_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html