[James Wiki] Update of "GSOC2011" by EricCharles

Apache Wiki Tue, 29 Mar 2011 08:30:37 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "James Wiki" for change 
notification.


The "GSOC2011" page has been changed by EricCharles.
http://wiki.apache.org/james/GSOC2011?action=diff&rev1=4&rev2=5

--------------------------------------------------

  = Ideas for James at Google Summer of Code 2011 =
- '''Context''': James (Java Application Mail Server) is a set of mail-related 
libraries bundled in a server (http://james.apache.org). It supports standard 
mails protocols (smtp, pop3, imap4) and can store the mails in different 
technologies (maildir, database, jcr). We are looking for Students to help add 
more distributed storages (hadoop and nosql). We are also looking for Students 
to bring end-user with more functionality such as mails filter/categorization 
and "out-of-office"
+ '''Context''': James (Java Application Mail Server) is a set of mail-related 
libraries bundled in a server (http://james.apache.org). It supports standard 
mail protocols (SMTP, POP3, IMAP4) and can store mail using different 
technologies (maildir, database, JCR).
+ 
+ '''Students''': We are looking for students to help add more distributed 
storage backends (Hadoop and NoSQL). We are also looking for students to give 
end users more functionality, such as mail filtering/categorization and 
"out-of-office". A good knowledge of the Java programming language is 
required. Knowledge of email protocols and NoSQL storage systems is welcome 
(but not required).
  
  ----
  == Design and implement a distributed mailbox using Hadoop ==
- '''Context:''' The mailbox subproject (http://james.apache.org/mailbox/) 
supports maildir, SQL database (via JPA) and Java Content Repository (JCR) as 
technology for mail storage. This flexibility is achieved thanks to a api 
design that allows to abstract the mail storage from the protocols, allowing to 
implement
+ '''Context:''' The mailbox subproject (http://james.apache.org/mailbox/) 
supports maildir, SQL database (via JPA) and Java Content Repository (JCR) as 
technologies for mail storage. This flexibility is achieved thanks to an API 
design that abstracts mail storage from the mail protocols.
  
- '''Mentor''': eric at apache dot org
+ '''Task''': We need to implement mailbox storage as a distributed system on 
top of Hadoop HDFS. The James mailbox API will be used. A first step is to 
design how to interact with Hadoop (native API, Gora from the Apache 
Incubator, ...) and to deal with specific performance questions related to 
mail loading/parsing in a distributed system (use map/reduce or not, use 
existing local Lucene indexes for search, ...). The second step is to 
implement the HDFS mailbox (the maildir mailbox is similar because it stores 
each mail as a file, and can be an inspiration). A single James server will 
still be deployed because we don't have any distributed UID generation.
  
- '''Task''': We need to better support distributed storage such as Hadoop 
HDFS. The James mailbox API will be used. A first step will be to design how to 
interact with Hadoop (native api, gora incubator at apache,...). The second 
step will be to implement the HDFS mailbox (maildir mailbox is similar and can 
be an inspiration). A single james server will still be deployed because we 
don't have any distributed UID generation.
+ '''Mentor''': eric at apache dot org & [fill in mentor]
+ 
+ '''Complexity''': medium
  
  ----
  == Design and implement Distributed UID generation ==
- '''Context''': IMAP4 need to generate incremental UID. This is now achieved 
in James IMAP subproject (http://james.apache.org/imap) with a UidProvider 
interface which is implemented in memory, but which does not allow distributed 
working of the solution.
+ '''Context''': IMAP4 needs to generate incremental UIDs. This is currently 
achieved in the James IMAP subproject (http://james.apache.org/imap) with a 
UidProvider interface implemented in memory. This implementation does not 
allow the solution to work in a distributed setup.
  
- '''Mentor''': eric at apache dot org
+ '''Task''': A DistributedUidProvider must be designed and implemented. The 
design can rely on a distributed memory cache such as Hazelcast, or any other 
solution (Hadoop, HBase, Cassandra, ...).
  
- '''Task''': A DistributedUidProvider must be designed (based on distributed 
memory cache such as hazelcast for example, or any other solution), and 
implemented.
+ '''Mentor''': eric at apache dot org & [fill in mentor]
+ 
+ '''Complexity''': medium
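
As a rough illustration of the task above, here is a minimal Java sketch of a UID provider that delegates to a shared counter store. Everything in it is an assumption made for illustration: `SharedCounterStore`, `InMemoryCounterStore` and the method signatures are hypothetical and do not reproduce the actual James UidProvider interface. The in-memory store only stands in for a cluster-wide backend (a distributed cache or a NoSQL store) so that the example runs on its own.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class DistributedUidSketch {

    /** Hypothetical abstraction over a cluster-wide atomic counter store. */
    interface SharedCounterStore {
        long incrementAndGet(String key);
        long get(String key);
    }

    /**
     * In-memory stand-in for the shared store. A real deployment would
     * back this interface with a distributed cache or database instead.
     */
    static final class InMemoryCounterStore implements SharedCounterStore {
        private final ConcurrentMap<String, AtomicLong> counters =
                new ConcurrentHashMap<>();

        @Override
        public long incrementAndGet(String key) {
            // Create the counter on first use, then increment atomically.
            return counters.computeIfAbsent(key, k -> new AtomicLong())
                           .incrementAndGet();
        }

        @Override
        public long get(String key) {
            AtomicLong counter = counters.get(key);
            return counter == null ? 0L : counter.get();
        }
    }

    /** Hypothetical provider: one strictly increasing counter per mailbox. */
    static final class DistributedUidProvider {
        private final SharedCounterStore store;

        DistributedUidProvider(SharedCounterStore store) {
            this.store = store;
        }

        long nextUid(String mailboxId) {
            return store.incrementAndGet("uid:" + mailboxId);
        }

        long lastUid(String mailboxId) {
            return store.get("uid:" + mailboxId);
        }
    }

    public static void main(String[] args) {
        // Two providers sharing one store simulate two James nodes.
        SharedCounterStore shared = new InMemoryCounterStore();
        DistributedUidProvider nodeA = new DistributedUidProvider(shared);
        DistributedUidProvider nodeB = new DistributedUidProvider(shared);

        System.out.println(nodeA.nextUid("INBOX")); // prints 1
        System.out.println(nodeB.nextUid("INBOX")); // prints 2
        System.out.println(nodeA.nextUid("Sent"));  // prints 1
        System.out.println(nodeB.lastUid("INBOX")); // prints 2
    }
}
```

The sketch takes one UID per call from the shared counter rather than reserving blocks per node, because IMAP UIDs within a mailbox have to be assigned in strictly ascending order. The trade-off is one round trip to the shared store for every stored message.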
  
  ----
  == Design and implement machine learning filters and categorization for mail 
==
- Context: Anti-spam functionality based on SpamAssassin is available at James 
(base on mailets http://james.apache.org/mailet). Bayesian mailets are also 
available, but not completely integrated/documented. We are willing to align 
the existing implementation with any modern anti-spam solution based on 
powerfull machine learning implementation (such as apache mahout). We are also 
willing to extend the machine learning usage to some mail categorization (spam 
vs not-spam is a first category, we can extend it to any additional category we 
can imagine).
+ '''Context''': Anti-spam functionality based on SpamAssassin is available in 
James (based on mailets, http://james.apache.org/mailet). Bayesian mailets are 
also available, but not completely integrated/documented. Nothing is available 
to automatically categorize mail traffic per user.
  
- '''Mentor''': eric at apache dot org
+ '''Task''': We want to align the existing implementation with a modern 
anti-spam solution based on a powerful machine learning implementation (such 
as Apache Mahout). We also want to extend the use of machine learning to mail 
categorization (spam vs. not-spam is a first category; it can be extended to 
any additional category we can imagine). The implementation can partially 
occur while spooling the mails and/or when mail is stored in the mailbox.
  
- '''Task''': The implementation can partially occur while spooling the mails 
and/or when mail is stored in mailbox. See also discussions on mail intelligent 
mining on http://markmail.org/message/2bodrwvdvtfq3f2v (mahout related) and 
http://markmail.org/thread/pksl6csyvoeo27yh (hama related).
+ '''Related discussions''': See also the discussions on intelligent mail 
mining at http://markmail.org/message/2bodrwvdvtfq3f2v (Mahout related) and 
http://markmail.org/thread/pksl6csyvoeo27yh (Hama related).
+ 
+ '''Mentor''': eric at apache dot org & [fill in mentor]
+ 
+ '''Complexity''': high
  
  ----
  == Implement additional SIEVE RFCs ==
@@ -39, +49 @@

  == Add "out-of-office" functionality ==
  '''Context''': A frequently requested feature is the ability to set a 
per-user "out-of-office" reply. In that case, the sender automatically 
receives a default mail saying the recipient is not there.
  
- '''Mentor''': ...
+ '''Mentor''': [fill in mentor]
  
- '''Task''': The API and implementation must be defined (based on a mailet or 
not).  The way the end-user will set/unset his "out-of-office" as the message  
that will be send must also be imagined (via hupa webmail for example).
+ '''Complexity''': medium
  
+ '''Task''': The API and implementation must be defined (based on a mailet or 
not). How the end user sets/unsets the "out-of-office" status, and the message 
that will be sent, must also be designed (via the James HUPA webmail, for 
example).
+ 
+ ----
+
