Bugs item #797064, was opened at 2003-08-29 13:08
Message generated for change (Settings changed) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=797064&group_id=61702
Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
>Assigned to: Tony Meyer (anadelonbrin)
Summary: Problems moving messages between pst and Exchange

Initial Comment:
If a message's clues are viewed while it is on the Exchange server and compared with the clues for the same message after it has been moved to a pst file, they are not the same. It appears (I haven't examined this closely yet; I can do so on request) that on Exchange the HTML part of the message is used, and in the pst it isn't.

Probably related to this is the problem that moving a message back and forth between Exchange and a pst file (showing clues each time) results in an ever-increasing number of tokens.

It doesn't appear to be the PR_SEARCH_KEY changing:

>>> key1 = "PR_SEARCH_KEY : '\n\x02\xde\xfd7\xf6 \xa7A\x93\xfd\xf3\xb1\xfeA\x16\xf9'"
>>> key2 = "PR_SEARCH_KEY : '\n\x02\xde\xfd7\xf6 \xa7A\x93\xfd\xf3\xb1\xfeA\x16\xf9'"
>>> key1 == key2
True

Next thing to try? :)

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-09-06 15:09

Message:
Logged In: YES
user_id=552329

Are we going to be able to get identical token streams? Attached are two 'show clues' messages for the same message, one taken in a pst and one on Exchange: 26 clues for one and 28 for the other. This is a plain-text message. The extra two clues arise because Exchange HTML-ises the plain-text message, so the words in the subject also appear in the body.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-08-31 17:34

Message:
Logged In: YES
user_id=14198

The underlying bug seems to be https://sourceforge.net/tracker/index.php?func=detail&aid=798029&group_id=61702&atid=498103. However, as it looks like we will be almost "hand-crafting" the HTML of the message, I will leave this open, as we may still end up with bugs if the HTML we generate isn't identical (token-wise) to the MS one.
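The explanation for the 26 vs. 28 clues (Exchange HTML-ises the plain-text message, so subject words also end up in the body) can be illustrated with a small sketch. It assumes, hypothetically, that the generated HTML carries the subject in a <title> element, and it uses a toy tokenizer rather than the SpamBayes one; `simple_tokens`, `strip_tags`, `pst_body_tokens` and `exchange_body_tokens` are all hypothetical names, so the exact tokens will differ from real 'show clues' output.

```python
import re
from html import unescape


def simple_tokens(text):
    """Hypothetical stand-in tokenizer: lowercase words of 3+ letters.
    This is NOT the SpamBayes tokenizer; it only illustrates the effect."""
    return set(re.findall(r"[a-z']{3,}", text.lower()))


def strip_tags(html_text):
    """Crude tag stripper, good enough for this illustration."""
    return unescape(re.sub(r"<[^>]+>", " ", html_text))


def pst_body_tokens(body):
    # Assumed pst behaviour: only the plain-text body is tokenized as body text.
    return simple_tokens(body)


def exchange_body_tokens(subject, body):
    # Assumed Exchange behaviour: the plain-text message is wrapped in HTML
    # whose <title> carries the subject, so subject words reach the body text.
    html_body = (
        f"<html><head><title>{subject}</title></head>"
        f"<body><p>{body}</p></body></html>"
    )
    return simple_tokens(strip_tags(html_body))


subject = "Re: comparing 2 images"
body = "Which of these two pictures looks sharper?"

# Tokens present in the (assumed) Exchange rendering but not in the pst one.
extra = exchange_body_tokens(subject, body) - pst_body_tokens(body)
print(sorted(extra))
```

Diffing the two token sets, rather than only comparing clue counts, shows which words the HTML conversion introduces; any hand-crafted HTML would need to produce the same set for the pst and Exchange clues to match.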
----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-08-30 21:32

Message:
Logged In: YES
user_id=552329

The dump_props output is attached. If I just move the messages about, doing 'show clues', then no training takes place. I think my original comment was wrong: trying now, I get the same number of tokens no matter how many times I move the message (although the Exchange count and the pst count differ from each other). Anyway, the log (at verbose=1) doesn't show anything apart from the "already trained as ham" message. If I train a message, I don't get much more.

pst first:

"""
Training on message 'Re: comparing 2 images' - trained as spam
Saving bayes database with 4637 spam and 410 good messages
-> C:\Documents and Settings\tameyer.MASSEY\Application Data\SpamBayes\default_bayes_database.db
-> C:\Documents and Settings\tameyer.MASSEY\Application Data\SpamBayes\default_message_database.db
Saved databases in 896.138ms
"""

and moving it back to Exchange:

"""
Training on message 'Re: comparing 2 images' - trained as good
Saving bayes database with 4636 spam and 411 good messages
-> C:\Documents and Settings\tameyer.MASSEY\Application Data\SpamBayes\default_bayes_database.db
-> C:\Documents and Settings\tameyer.MASSEY\Application Data\SpamBayes\default_message_database.db
Saved databases in 850.026ms
"""

Does this help?

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
http://mail.python.org/mailman/listinfo/spambayes-bugs
