Thanks Eric,

In the motivation section, I mention that this ML approach will improve the accuracy of the filtering and the efficiency of the predictive model building process.
I have already added the data preparation process to the proposal.

Vicki

On Fri, Apr 8, 2011 at 10:47 AM, Eric Charles (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/JAMES-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017477#comment-13017477 ]
>
> Eric Charles commented on JAMES-1216:
> -------------------------------------
>
> Vicki,
> From your mail
> "It still need to get the training dataset from manually judge data
> first.because this machine learning algorithm still need to learn what kind
> of email is spam, do the feature analysis and build the predictive model.
> The new approach can share the spam/non spam training dataset with naive
> Bayesian.
> "
> I would make that point clear, in your application (it may be obvious to your
> or to people used to such matter, but better said than not)
>
> Also please make clear directly in the preamble the added-value of your new
> solution compared the existing implementation (eg: better identification of
> spam?, shorter learning period?, less false-positive?, open to other
> categorization? ...)..
>
>
>> [gsoc2011] Design and implement machine learning filters and categorization
>> for mail
>> ------------------------------------------------------------------------------------
>>
>>                 Key: JAMES-1216
>>                 URL: https://issues.apache.org/jira/browse/JAMES-1216
>>             Project: JAMES Server
>>          Issue Type: New Feature
>>            Reporter: Eric Charles
>>            Assignee: Eric Charles
>>              Labels: gsoc2011
>>
>> Context: Anti-spam functionality based on SpamAssassin is available at James
>> (base on mailets http://james.apache.org/mailet). Bayesian mailets are also
>> available, but not completely integrated/documented. Nothing is available to
>> automatically categorize mail traffic per user.
>> Task: We are willing to align the existing implementation with any modern
>> anti-spam solution based on powerfull machine learning implementation (such
>> as apache mahout).
>> We are also willing to extend the machine learning usage
>> to some mail categorization (spam vs not-spam is a first category, we can
>> extend it to any additional category we can imagine). The implementation can
>> partially occur while spooling the mails and/or when mail is stored in
>> mailbox.
>> Related discussions: See also discussions on mail intelligent mining on
>> http://markmail.org/message/2bodrwvdvtfq3f2v (mahout related) and
>> http://markmail.org/thread/pksl6csyvoeo27yh (hama related).
>> Mentor: eric at apache dot org & [fill in mentor]
>> Complexity: high

--
Yu Fu
[email protected]
443-388-6654
