Zaharioudakis Nikos <[EMAIL PROTECTED]> writes:
> Hello everybody, I was wondering whether there is an already established
> project for Greek word definitions. Do I have to do anything special
> about the Greek encoding? If not, I am really interested in starting
> one! Any other Greeks around on the list?
I'm not aware of one. I checked my mail archive going back over a year
and didn't find any SpamAssassin messages that mention "Greek" in the
context of translating definitions.
Translators are welcome. Ask on -dev if you're interested. Basically, we'd
need you to submit an ASF Contributor License Agreement and send us a
translations file against 3.0 via Bugzilla.
We could also use someone from Greece in our group of people who submit
mass-check results on a nightly basis. This is from a message I posted
a while ago:
------- start of cut text --------------
Okay, the SpamAssassin development team could use some help from savvy
SpamAssassin users who send and receive email in non-English languages.
We have some non-English mail in our corpuses, but more would be better.
In particular, I'm looking for non-Latin alphabet languages and character
sets that are (unfortunately) frequently seen in spam. (No cracks about
the prevalence of English language spam, please. :-)
The help I'm seeking is for people to develop both spam and ham (non-spam)
corpuses of their email using "real life" email. By "real life", I mean a
representative sample of email (not just mailing lists) consisting of
roughly 25% to 75% ham (including a significant portion of *ham* in
languages other than English). If your primary email address ends in
something other than .com, .org, .net, .edu, .uk, .au, .nz, .za, etc. that
would be even better.
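
As a rough illustration of the 25% to 75% ham guideline above, here is a
minimal Python sketch. The function names are made up for this example and
are not part of SpamAssassin; the counts are whatever you get from counting
the messages in your own ham and spam folders.

```python
def ham_fraction(ham_count, spam_count):
    """Return the share of ham in the combined corpus (0.0 to 1.0)."""
    total = ham_count + spam_count
    if total == 0:
        raise ValueError("corpus is empty")
    return ham_count / total


def within_suggested_mix(ham_count, spam_count, low=0.25, high=0.75):
    """True if ham makes up roughly 25% to 75% of the corpus."""
    return low <= ham_fraction(ham_count, spam_count) <= high


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(within_suggested_mix(ham_count=1200, spam_count=2000))
```

This only checks the overall mix; it says nothing about how much of the ham
is in a language other than English, which still has to be judged by hand.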
If you follow the SpamAssassin-talk mailing list, are capable of
maintaining spam and ham corpuses, have some familiarity with Perl, rsync,
and CVS, then this may be a way in which you can contribute. You need to
be familiar with those tools because you won't be submitting your email
itself (so your privacy is maintained), only the output of our "mass-check"
program, which contains condensed SpamAssassin results, on a (hopefully)
nightly basis. If this sounds intimidating, then this is probably not the
best way for you to contribute.
In particular, I'm looking for mass-check contributions to cover the
following areas, but if you're interested and willing to follow the
guidelines correctly, I don't think I'll be too picky about the languages
that are being contributed.
- Russian/Cyrillic windows-1251 and koi8-r
- languages that use Windows-1252
- Chinese big5 and gb2312
- Korean euc-kr/ks_c_5601-1987
- languages with non-Latin iso-8859 alphabets (Cyrillic, Arabic, Greek,
  Hebrew, Thai)
- Japanese iso-2022-jp
- Latin iso-8859 alphabets (not as critical)
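
To see which of the character sets listed above your own corpus actually
covers, something along these lines may help. This is only a sketch using
Python's standard email package; the helper names are invented for this
example, and it looks only at the charset declared in each text part's
Content-Type header, which spam does not always set honestly.

```python
from collections import Counter
from email import message_from_bytes


def charsets_in_message(raw):
    """Return the set of declared charsets (lowercased) in one raw message."""
    msg = message_from_bytes(raw)
    found = set()
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset()  # None if not declared
            if charset:
                found.add(charset)
    return found


def tally_charsets(raw_messages):
    """Count how many messages declare each charset."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(charsets_in_message(raw))
    return counts


if __name__ == "__main__":
    sample = (b"From: someone@example.com\r\n"
              b'Content-Type: text/plain; charset="KOI8-R"\r\n'
              b"\r\n"
              b"body\r\n")
    print(tally_charsets([sample]))
```

Messages with no declared charset are simply not counted, so a low total
may just mean the headers are missing rather than that the mail is in
English.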
If you're interested, here's what you'll need to do:
1. Develop spam and ham corpuses containing spam less than 6 months old
   and ham less than 12 months old.
   - your corpus must follow the basic content policy described in
     masses/CORPUS_POLICY
   - use the process described in masses/CORPUS_SUBMIT for making sure
     your ham and spam are clean
2. Check out the SpamAssassin CVS tree using the
   CURRENT_CORPORA_SUBMIT_VERSION tag.
3. Use the procedure in masses/CORPUS_SUBMIT_NIGHTLY to submit your
   results (just turn off Bayes auto-learning and network checks).
4. Everyone will need to have a login/password from Craig Hughes to rsync
   their results, but let's get some results going before we worry about
   the rsync stuff.
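
The age limits in step 1 (spam under 6 months, ham under 12 months) can be
approximated with a small script like the one below. It is a sketch, not
part of the SpamAssassin tools: the names are hypothetical, "6 months" is
rounded to 183 days, and it trusts the Date header, which is often
unreliable in spam.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime

# Assumed limits: 6 months approximated as 183 days, 12 months as 365.
MAX_AGE = {"spam": timedelta(days=183), "ham": timedelta(days=365)}


def message_date(raw):
    """Return the Date header as an aware datetime, or None if unusable."""
    header = message_from_bytes(raw)["Date"]
    if header is None:
        return None
    try:
        parsed = parsedate_to_datetime(str(header))
    except (TypeError, ValueError):
        return None  # missing or malformed date
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_recent_enough(raw, kind, now=None):
    """True if the message is younger than the limit for 'spam' or 'ham'."""
    now = now or datetime.now(timezone.utc)
    sent = message_date(raw)
    return sent is not None and now - sent <= MAX_AGE[kind]
```

Messages without a parseable Date header are treated as too old here, which
errs on the side of leaving them out of the corpus.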
Please post a reply to this message (to the list) if you have any
questions.
------- end ----------------------------
--
Daniel Quinlan anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/ and open source consulting