[ 
https://issues.apache.org/jira/browse/JAMES-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017477#comment-13017477
 ] 

Eric Charles commented on JAMES-1216:
-------------------------------------

Vicki,
From your mail:
"It still needs to get the training dataset from manually judged data 
first, because this machine learning algorithm still needs to learn what kind
of email is spam, do the feature analysis and build the predictive model.
The new approach can share the spam/non-spam training dataset with naive 
Bayesian.
"
I would make that point clear in your application (it may be obvious to you 
or to people used to such matters, but better said than not).
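
To illustrate the dependency on manually judged data, here is a minimal 
sketch (not the James Bayesian mailet code and not Mahout): a tiny naive Bayes 
classifier in Python. The TRAINING_SET contents and the tokenize, train and 
classify names are all hypothetical, chosen only for this example. The same 
labelled set could be handed to any other learner, which is the sharing 
mentioned in the quote.

```python
import math
from collections import Counter

# Hypothetical, manually judged training data: (text, label) pairs.
# Any other classifier could be trained from this same list.
TRAINING_SET = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]


def tokenize(text):
    """Split a message into lowercase word tokens (deliberately naive)."""
    return text.lower().split()


def train(dataset):
    """Count word occurrences per label: the 'feature analysis' step."""
    word_counts = {}
    label_counts = Counter()
    for text, label in dataset:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, model):
    """Return the most probable label, using Laplace (add-one) smoothing."""
    word_counts, label_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior for the label, then add the log likelihood of each token.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Build the predictive model from the labelled data.
model = train(TRAINING_SET)
```

Calling classify("buy cheap pills", model) returns a label learned from the 
judged examples; without TRAINING_SET there is nothing to build the model from.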

Also, please state directly in the preamble the added value of your new 
solution compared to the existing implementation (e.g. better identification of 
spam? shorter learning period? fewer false positives? open to other 
categorization? ...).


> [gsoc2011] Design and implement machine learning filters and categorization 
> for mail
> ------------------------------------------------------------------------------------
>
>                 Key: JAMES-1216
>                 URL: https://issues.apache.org/jira/browse/JAMES-1216
>             Project: JAMES Server
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Eric Charles
>              Labels: gsoc2011
>
> Context: Anti-spam functionality based on SpamAssassin is available in James 
> (based on mailets http://james.apache.org/mailet). Bayesian mailets are also 
> available, but not completely integrated/documented. Nothing is available to 
> automatically categorize mail traffic per user.
> Task: We are willing to align the existing implementation with any modern 
> anti-spam solution based on a powerful machine learning implementation (such 
> as Apache Mahout). We are also willing to extend the machine learning usage 
> to some mail categorization (spam vs not-spam is a first category, we can 
> extend it to any additional category we can imagine). The implementation can 
> partially occur while spooling the mails and/or when mail is stored in 
> mailbox.
> Related discussions: See also discussions on intelligent mail mining on 
> http://markmail.org/message/2bodrwvdvtfq3f2v (mahout related) and 
> http://markmail.org/thread/pksl6csyvoeo27yh (hama related).
> Mentor: eric at apache dot org & [fill in mentor]
> Complexity: high 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
