Dear Wiki user, You have subscribed to a wiki page or wiki category on "James Wiki" for change notification.
The "GSOC2011" page has been changed by EricCharles.
http://wiki.apache.org/james/GSOC2011?action=diff&rev1=2&rev2=3

--------------------------------------------------

== Design and implement a distributed mailbox using Hadoop ==
The mailbox subproject (http://james.apache.org/mailbox/) supports maildir, SQL databases (via JPA) and the Java Content Repository (JCR) as mail storage technologies. This flexibility comes from an API design that abstracts mail storage away from the protocols, so that further storage backends can be implemented. We need better support for distributed storage such as Hadoop HDFS. The James mailbox API will be used. The first step is to design how to interact with Hadoop (native API, the Apache Gora incubator project, ...). The second step is to implement the HDFS mailbox (the maildir mailbox is similar and can serve as inspiration). A single James server will still be deployed, because we do not yet have any distributed UID generation.
- == Distributed UID generation ==
+ == Design and implement Distributed UID generation ==
IMAP4 needs to generate incremental UIDs. In the James IMAP subproject (http://james.apache.org/imap) this is currently done through a UidProvider interface with an in-memory implementation, which does not allow the solution to run in a distributed setup. A DistributedUidProvider must be designed (based, for example, on a distributed memory cache such as Hazelcast, or on any other solution) and implemented.
== Design and implement machine learning filters and categorization for mail ==
Anti-spam functionality based on SpamAssassin is available in James (based on mailets, http://james.apache.org/mailet). Bayesian mailets are also available, but they are not completely integrated or documented. We are willing to align the existing implementation with a modern anti-spam solution built on a powerful machine learning implementation (such as Apache Mahout).
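Bayesian filtering, which the existing mailets rely on, comes down to comparing per-category word likelihoods. As an illustration only, here is a minimal multinomial naive Bayes classifier with add-one smoothing. It is not the code of the James Bayesian mailets and not the Apache Mahout API. The class name `NaiveBayesMailClassifier`, the tokenizer and the sample labels are hypothetical. Because categories are free-form labels, the same structure covers spam vs not-spam as well as any additional mail category.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a multinomial naive Bayes classifier for mail bodies
 * (not the James Bayesian mailets and not the Apache Mahout API).
 * Categories are free-form labels, so "spam"/"ham" is only one possible pair.
 */
public class NaiveBayesMailClassifier {
    // Per category: how often each token appeared in training mails (TreeMap gives a stable order).
    private final Map<String, Map<String, Integer>> tokenCounts = new TreeMap<>();
    // Per category: total number of tokens seen in training mails.
    private final Map<String, Integer> tokenTotals = new HashMap<>();
    // Per category: number of training mails, used for the prior.
    private final Map<String, Integer> mailCounts = new HashMap<>();
    // Every distinct token seen in any category; its size drives add-one smoothing.
    private final Set<String> vocabulary = new HashSet<>();
    private int totalMails = 0;

    // Naive tokenizer (an assumption): lowercase, split on non-alphanumeric characters.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Records one mail body labelled with the given category. */
    public void train(String category, String body) {
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(category, c -> new HashMap<>());
        mailCounts.merge(category, 1, Integer::sum);
        totalMails++;
        for (String token : tokenize(body)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            tokenTotals.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the category with the highest log-posterior, or null before any training. */
    public String classify(String body) {
        String[] tokens = tokenize(body);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> entry : tokenCounts.entrySet()) {
            String category = entry.getKey();
            // Log prior: share of training mails carrying this label.
            double score = Math.log((double) mailCounts.get(category) / totalMails);
            double denominator = tokenTotals.getOrDefault(category, 0) + vocabulary.size();
            for (String token : tokens) {
                // Empty fragments and tokens never seen during training are ignored.
                if (token.isEmpty() || !vocabulary.contains(token)) {
                    continue;
                }
                int count = entry.getValue().getOrDefault(token, 0);
                // Add-one (Laplace) smoothed log likelihood of the token given the category.
                score += Math.log((count + 1) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesMailClassifier classifier = new NaiveBayesMailClassifier();
        classifier.train("spam", "cheap pills buy now limited offer");
        classifier.train("spam", "win money now click here");
        classifier.train("ham", "meeting notes for the james release");
        classifier.train("ham", "please review the mailbox patch");
        classifier.train("newsletter", "monthly apache newsletter and project news");
        System.out.println(classifier.classify("buy cheap pills now"));    // spam
        System.out.println(classifier.classify("review the james patch")); // ham
        System.out.println(classifier.classify("apache project news"));    // newsletter
    }
}
```

Scores are summed in log space so that multiplying many small token probabilities does not underflow. Supporting a new category only requires training mails with a new label.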
We are also willing to extend the use of machine learning to mail categorization (spam vs not-spam is a first category; it can be extended to any additional category we can imagine).
+
+ The implementation can take place partly while the mails are spooled and/or when mail is stored in the mailbox.
+
+ See also the discussions on intelligent mail mining at http://markmail.org/message/2bodrwvdvtfq3f2v (Mahout related) and http://markmail.org/thread/pksl6csyvoeo27yh (Hama related).
== Implement additional SIEVE RFCs ==
[fill in description]