[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5675

Henrik Krohns changed:

           What    |Removed |Added
           CC      |        |apa...@hege.li
           Resolution|---   |FIXED
           Status  |NEW     |RESOLVED

--- Comment #24 from Henrik Krohns ---
Closing old stale bug. All matters from this seem resolved.

--
You are receiving this mail because: You are the assignee for the bug.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

Darxus dar...@chaosreigns.com changed:

           What    |Removed |Added
           CC      |        |mkettler...@verizon.net

--- Comment #23 from Darxus dar...@chaosreigns.com ---
*** Bug 5745 has been marked as a duplicate of this bug. ***
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-12-07 06:18 ---

Created bugs:

5742 (documentation clarity on ok_locales)
5743 (feature request for adding not_ok_locales)
5744 (textcat documentation clarity)
5745 (feature request for adding not_ok_languages to textcat)

I *think* that's all the valid bits of this bug. The rest are either config errors or invalid. (For example, the request in comment 7 for a reference to textcat in ok_locales isn't valid: ok_locales doesn't have anything to do with textcat, so adding the reference would only further the confusion. If anything, we should add a note indicating that the two are not related.)
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-12-07 11:03 ---

The above applies to ok_languages too. Sorry I did not comment on the correct bug report, but as the representative of the simple user in the street, I cannot get too detailed. :-)
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-12-07 21:45 ---

> But they're definitively *NOT* white_ and black_.

OK. Perhaps add them anyway, in case users are looking for them or their functionality. They could exist in parallel with the present commands. The black one could even be an alias for the present one...

Anyway, users perhaps think in terms of simple operators, black and white, and that's what they look for if they don't have time to hunker down with the man page. Of course languages and locales would be a blur to those users too.

Anyway, thanks from the glue sniffer crowd for making this all a little simpler.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-12-07 18:04 ---

Re-mentioning all these with the bug prefix so they get hyperlinked:

bug 5742: (documentation clarity on ok_locales)
bug 5743: (feature request for adding not_ok_locales)
bug 5744: (textcat documentation clarity)
bug 5745: (feature request for adding not_ok_languages to textcat)

Hope that works.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-12-07 18:01 ---

But they're definitively *NOT* white_ and black_.

There's no white about it. No configuration of either feature results in negative points being applied, or anything else that would inhibit the message from being tagged as spam by other rules.

If anything, it's black_ and notblack_.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-12-07 10:59 ---

Perhaps start afresh with white_ and black_, or good_ and bad_, as ok and not_ok look nonsymmetrical. Of course this would be a simple whitelist/blacklist model.

You could leave ok_locales intact for compatibility, independent of the new, additional, simple white/black lists. (But white/black perhaps imply 100 points to some users.)

Anyway, the user should be able to find the simple whitelist and blacklist directives he is looking for. Leave the fancy 'email SPF style syntax' stuff, or whatever is proposed, for the third, expert ok_locales directive.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-22 04:41 ---

You mean like the note in the README file? From the "Customising SpamAssassin" section:

  - /etc/mail/spamassassin/*.pre:

    Plugin control files, installed from the distribution. These are used to control what plugins are loaded. Modifications here will be loaded before any configuration loaded from the above directories.

    You want to modify these files if you want to load additional plugins, or inhibit loading a plugin that is enabled by default. If the files exist in /etc/mail/spamassassin, they will not be overwritten during future installs.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-19 16:08 ---

OK, I have now learned from the mailing list about .pre files.

Still, on Mail::SpamAssassin::Conf, where it says "See Mail::SpamAssassin::Plugin for more details on writing plugins", please add an additional reference for more details on USING plugins.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-17 03:07 ---

(In reply to comment #13)
> However, I'm also going to suggest this bug be closed WONTFIX. Currently it is too cluttered with too many different issues to be usable for development purposes, and half aren't really bugs. I'd rather see this moved to a users list email discussion, and then separate bugs created for each of the problems that aren't configuration errors.
>
> Any devs have a preference?

+1 agreed.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-15 23:05 ---

Yes, please document it better, as one would think it means: Japanese? No. Chinese? No. Russian? No. Polish? No. Ukrainian? No. That's five, giving up.

> Don't put loadplugin statements into your .cf files

.cf files? I put it in user_prefs! -- my best guess as to how to use this jazz.

SpamAssassin version 3.2.1 running on Perl version 5.8.8 ... Debian sid.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-16 21:22 ---

(In reply to comment #12)
> Yes please document it better as one would think it means Japanese? No. Chinese? No. Russian? No. Polish? No. Ukrainian? No. That's five, giving up.

Ok, I suggest rewriting it to:

  textcat_max_languages N (default: 5)
    The maximum number of languages any one message can simultaneously match before its language classification is considered unknown.

> > Don't put loadplugin statements into your .cf files
> .cf files? I put it in user_prefs! -- my best guess as to how to use this jazz.

*DEFINITELY* not in your user_prefs. The Mail::SpamAssassin::Conf document is quite clear that loadplugin is an administrator setting. No administrator settings should be in your user_prefs, as they will be ignored by spamd for security reasons, although a normal call to spamassassin will run them.

However, I'm also going to suggest this bug be closed WONTFIX. Currently it is too cluttered with too many different issues to be usable for development purposes, and half aren't really bugs. I'd rather see this moved to a users list email discussion, and then separate bugs created for each of the problems that aren't configuration errors.

Any devs have a preference?
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-14 05:05 ---

> With my long whitelist approach, it is not clear what
>
>   textcat_max_languages N (default: 5)
>     The maximum number of languages before the classification is considered unknown.
>
> means.

It means exactly what the documentation says. Read that sentence carefully, and pay attention to the word "classification" that appears in it. This setting only applies to how SA classifies messages; it has nothing to do with config options.

A message can potentially contain more than one language, and thus match multiple languages during classification. This threshold tells SA how many languages can appear in one message before textcat should just decide that it is confused and classify the language of the message as unknown.

IMHO, the documentation of that feature is sufficient. However, I can see how someone glancing through the docs could get confused, so if one of the devs wants to expand it, go for it.
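
To make the distinction above concrete, here is a minimal configuration sketch. It only restates the setting with the default quoted from the documentation; the short ok_languages list is a hypothetical example, not something taken from the reporter's setup.

```
# Sketch only. 5 is the default quoted from the documentation above.
# The setting limits how many languages ONE MESSAGE may match during
# classification; it says nothing about the length of ok_languages.
textcat_max_languages 5

# Hypothetical list for illustration: listing more than five languages
# here is unaffected by textcat_max_languages.
ok_languages en de fr es it nl pt
```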
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-14 05:10 ---

> I got as far as
>
>   loadplugin Mail::SpamAssassin::Plugin::TextCat
>   ok_languages af am ar...
>   add_header all languages _LANGUAGES_
>
> but the fun ended with
>
>   score UNWANTED_LANGUAGE_BODY 11

Don't put loadplugin statements into your .cf files; find the statement in the appropriate .pre file and uncomment it there. This one should be in v310.pre.

The problem with putting a loadplugin in your local.cf is that the file gets read after the stock rule files, so those files detect the plugin as not loaded and the textcat rules get skipped.

If you have usage questions, please direct them to the users list first, then put them into the bug if they happen to be actual bugs.
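
The advice above might look like this in practice. This is a sketch, assuming the /etc/mail/spamassassin location mentioned in the README excerpt elsewhere in this thread; the one-language ok_languages value is a placeholder, because the reporter's own list is elided in the report.

```
# /etc/mail/spamassassin/v310.pre
# Uncomment the existing line here so the plugin is loaded before the
# stock rule files are read.
loadplugin Mail::SpamAssassin::Plugin::TextCat

# /etc/mail/spamassassin/local.cf
# The remaining settings from the report, without the loadplugin line.
# "en" is a placeholder, not the reporter's actual list.
ok_languages en
add_header all languages _LANGUAGES_
score UNWANTED_LANGUAGE_BODY 11
```

With the plugin loaded from the .pre file, the stock textcat rules should no longer be skipped, which is what led to the "non-existent rule" message described in the report.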
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-13 18:51 ---

Mention on the man page whether one can use two lines, like whitelist_from:

  ok_languages af am ar be bg ca cs da de el en es fa fi fr he hi hr hu hy id it
  ok_languages ja ka ko mr ms ne nl no pl pt qu ro sk sq sr sv sw ta th tl tr uk vi zh

or if they all must be on one line.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-13 18:59 ---

Mail::SpamAssassin::Conf, at ok_locales, should say "SEE ALSO Mail::SpamAssassin::Plugin::TextCat".

Mail::SpamAssassin::Plugin::TextCat should say "SEE ALSO ok_locales in Mail::SpamAssassin::Conf".

Otherwise people will think "I thought I saw this earlier", never realizing that there are now two similarly named ok_ things.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-13 19:22 ---

By the way, I just realized that your two whitelists, ok_locales and ok_languages, with no corresponding blacklists offered, will create a big problem for the user who uses my lengthy whitelist examples above in order to blacklist one or two items. If one day, e.g., English is split into British and American, the user won't be alerted that he has now inadvertently blacklisted English, unless you grandfather the pre-split one, etc.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-13 19:49 ---

> DESCRIPTION
>   This plugin will try to guess the language used in the message text.

Say "body", not "text". That would be clearer.

P.S., today I actually tried to use this plugin, but got

  score set for non-existent rule UNWANTED_LANGUAGE_BODY

I got as far as

  loadplugin Mail::SpamAssassin::Plugin::TextCat
  ok_languages af am ar...
  add_header all languages _LANGUAGES_

but the fun ended with

  score UNWANTED_LANGUAGE_BODY 11

so please add some examples. Also add my add_header example above, lest the user spend an extra 15 minutes trying to figure it out.
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-13 19:54 ---

From _LANGUAGES_ one learns that it detects in such detail as ru.windows-1251, but except for the mention of the two zh locales, nothing is documented about anything more than the two-letter abbreviations. So mention that one can match in greater detail...
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-13 20:05 ---

(Sorry that a mail is sent for each of my discoveries. But if I save them up into one big posting, the ice cream truck will come by and I will never end up posting these.)

With my long whitelist approach, it is not clear what

  textcat_max_languages N (default: 5)
    The maximum number of languages before the classification is considered unknown.

means. Does it mean whitelists longer than 5 are meaningless by default? Document it please, don't just answer me.

  textcat_optimal_ngrams N (default: 0)

Do say what "ngrams" means here, even though it is in Wikipedia.

  textcat_acceptable_score N (default: 1.05)
    Include any language that scores at least textcat_acceptable_score in the returned list of languages

More mystery. Maybe one is supposed to use this instead of UNWANTED_LANGUAGE_BODY or something... all unclear.

Add examples: "Jimmy hates everything Russian, so he does the following..."
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-10 06:31 ---

Realistically, that's how TextCat works. If a language isn't ok, then it gets points applied via the UNWANTED_LANGUAGE_BODY rule. So it is not a whitelist, but rather a list of exceptions to a blacklist.

I guess syntactically it would be easier if you could configure it using something like:

  ok_languages all except ru

Rather than:

  ok_languages af am ar be bg ca cs da de el en es fa fi fr he hi hr hu hy id it ja ka ko mr ms ne nl no pl pt qu ro sk sq sr sv sw ta th tl tr uk vi zh

Which is functionally the same, assuming you haven't changed the inactive languages list.

That said, I think most folks would be better off setting ok_languages to a list of languages that are really acceptable to them (i.e., they're capable of reading them). This has some possibility of false positives, but if the FP is in a language you can't read, does it really matter?
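
Written out as a configuration fragment, the workaround described above could look like the following. This is a sketch, assuming the TextCat plugin is already enabled in a .pre file and the inactive languages list is unchanged. The language list is copied from the comment above; the score line is optional and its value is purely illustrative.

```
# local.cf -- sketch: penalize Russian only.
# Every language from the list quoted above is accepted; ru is simply
# left out, so only Russian triggers UNWANTED_LANGUAGE_BODY.
ok_languages af am ar be bg ca cs da de el en es fa fi fr he hi hr hu hy id it ja ka ko mr ms ne nl no pl pt qu ro sk sq sr sv sw ta th tl tr uk vi zh

# Optional and illustrative only: adjust how many points the rule adds.
score UNWANTED_LANGUAGE_BODY 2.0
```

The list is kept on a single line here because the thread leaves open whether ok_languages may be split across several lines.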
[Bug 5675] TextCat sidesteps 'what if I DON'T like language X?'
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5675

--- Additional Comments From [EMAIL PROTECTED] 2007-10-10 07:14 ---

OK, do document the

  ok_languages af am ar be bg ...

way to blacklist one language... ah, so that's how one (painfully) does it!

> if the FP is in a language you can't read, does it really matter?

9 times out of 10 a pal's mail will sail in in an unexpected language, due to some odd character falling into his message or who knows what, so one wishes not to risk it.

Also, new customers/contacts would not think to turn off their native-language signatures, or the default encoding settings of their Mail User Agent for an otherwise ASCII message... "So odd, I never have any new customers / comments from abroad..." Of course: you blocked all their mail!

So don't shoot first and ask questions later.