Re: [jira] [Issue Comment Edited] (JAMES-1216) [gsoc2011] Design and implement machine learning filters and categorization for mail

Vicki Fu Sun, 24 Apr 2011 21:23:03 -0700

Hi Robert and Eric,
I am sorry to say that I cannot participate in this GSoC, because my
mentor suggested that I focus on my research defense first.
I am sorry to give up this opportunity to contribute to JAMES. Since
the AI project is on track,
maybe I can help later with recommendation or keyword filters for
pre-reading mail or news.
Thank you for all of your suggestions on my proposal. I would be happy
for anyone to use my proposal to continue this project.
Thanks.
Vicki



On Wed, Apr 6, 2011 at 4:56 PM, Robert Burrell Donkin (JIRA)
<[email protected]> wrote:
>
>    [ 
> https://issues.apache.org/jira/browse/JAMES-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016521#comment-13016521
>  ]
>
> Robert Burrell Donkin edited comment on JAMES-1216 at 4/6/11 8:56 PM:
> ----------------------------------------------------------------------
>
> Feature Selection
> -------------------------
> Feature extraction from emails may potentially result in a large number of 
> features, and so high dimensionality.
>
> For some algorithms, this may have undesirable performance consequences. For 
> example, k-nearest neighbour implementations typically hold all training data 
> in memory during classification, and compute distances between the test 
> point and each training point. To understand this trade-off, it would be 
> useful to estimate how memory and computational complexity scale with the 
> number of features, and relate this to desired mail throughput.
>
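
A rough sketch of the estimate described above, in Java since JAMES and Mahout are both Java projects. It assumes a brute-force k-nearest-neighbour classifier over dense double-valued feature vectors; the class name, method names and the sizes used in main are hypothetical and chosen only for illustration. Under those assumptions, memory grows as n * d * 8 bytes and the work per classified mail grows as n * d, where n is the number of training mails and d the number of features.

```java
import java.util.Arrays;

/** Back-of-envelope cost model for brute-force k-NN (illustrative only). */
public class KnnCostSketch {

    /** Bytes needed to hold n dense training vectors of d double features. */
    public static long trainingSetBytes(long n, long d) {
        return n * d * Double.BYTES;
    }

    /** Feature-level operations per mail: one pass over every training vector. */
    public static long opsPerMail(long n, long d) {
        return n * d;
    }

    /** Squared Euclidean distance; cost is linear in the number of features. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Majority vote among the k nearest training points (labels are 0 or 1). */
    public static int classify(double[][] train, int[] labels, double[] query, int k) {
        double[] dist = new double[train.length];
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < train.length; i++) {
            dist[i] = squaredDistance(train[i], query); // touches all d features
            order[i] = i;
        }
        // Sort training indices by distance to the query point.
        Arrays.sort(order, (x, y) -> Double.compare(dist[x], dist[y]));
        int votes = 0;
        for (int i = 0; i < k; i++) {
            votes += labels[order[i]];
        }
        return 2 * votes > k ? 1 : 0;
    }

    public static void main(String[] args) {
        long n = 10_000;          // hypothetical number of training mails
        long d = 50_000;          // hypothetical number of extracted features
        long mailsPerSecond = 10; // hypothetical target throughput
        System.out.printf("training set: %.1f MB%n", trainingSetBytes(n, d) / 1e6);
        System.out.printf("operations per second: %d%n", opsPerMail(n, d) * mailsPerSecond);
    }
}
```

Both figures are linear in d, so halving the feature count through feature selection would halve them. A sparse vector representation would lower the numbers as well, so this dense model is best read as an upper bound.
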
> A strong GSOC application should probably consider feature selection, so that 
> it can be factored into the design even if time does not allow a full 
> implementation.
>
> "An Introduction to Variable and Feature Selection" by Guyon and Elisseeff; 
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.3593&rep=rep1&type=pdf
> "Fast Binary Feature Selection with Conditional Mutual Information" by 
> Fleuret; 
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.8398&rep=rep1&type=pdf
>
>      was (Author: robertburrelldonkin):
>    Feature Selection
> -------------------------
> Feature extraction from emails may potentially result in a large number of 
> features, and so high dimensionality. For some algorithms
>
>> [gsoc2011] Design and implement machine learning filters and categorization 
>> for mail
>> ------------------------------------------------------------------------------------
>>
>>                 Key: JAMES-1216
>>                 URL: https://issues.apache.org/jira/browse/JAMES-1216
>>             Project: JAMES Server
>>          Issue Type: New Feature
>>            Reporter: Eric Charles
>>            Assignee: Eric Charles
>>              Labels: gsoc2011
>>
>> Context: Anti-spam functionality based on SpamAssassin is available in James 
>> (based on mailets http://james.apache.org/mailet). Bayesian mailets are also 
>> available, but not completely integrated/documented. Nothing is available to 
>> automatically categorize mail traffic per user.
>> Task: We are willing to align the existing implementation with any modern 
>> anti-spam solution based on a powerful machine learning implementation (such 
>> as Apache Mahout). We are also willing to extend the machine learning usage 
>> to some mail categorization (spam vs not-spam is a first category, we can 
>> extend it to any additional category we can imagine). The implementation can 
>> partially occur while spooling the mails and/or when mail is stored in 
>> mailbox.
>> Related discussions: See also discussions on intelligent mail mining on 
>> http://markmail.org/message/2bodrwvdvtfq3f2v (mahout related) and 
>> http://markmail.org/thread/pksl6csyvoeo27yh (hama related).
>> Mentor: eric at apache dot org & [fill in mentor]
>> Complexity: high
>



-- 
Vicki Fu
