[ 
https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016477#comment-13016477
 ] 

Robert Burrell Donkin commented on MAILBOX-44:
----------------------------------------------

The Structure Of A Mail
-----------------------
Numerous RFCs describe the structure that emails should have. Variations do 
turn up in the wild, wild web, but it's still important to read these 
standards to start to understand the data structure used by mail. 
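
As a rough illustration of the most basic layer of that structure (RFC 5322: 
header fields, then an empty line, then the body), here is a minimal sketch 
in plain Java. MailSketch, Field, Message and parse are hypothetical names 
made up for this example; they are not part of James or Mime4J. The sketch 
only unfolds header lines and ignores MIME, encoded words and most malformed 
input, which is exactly what a real parser has to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of James or Mime4J.
final class MailSketch {

    /** One header field with its unfolded value. */
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** A message split into header fields and the raw, undecoded body. */
    static final class Message {
        final List<Field> headers;
        final String body;

        Message(List<Field> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    static Message parse(String raw) {
        // Accept both CRLF and bare LF; -1 keeps trailing empty strings.
        String[] lines = raw.split("\r?\n", -1);
        List<Field> headers = new ArrayList<>();
        String name = null;
        StringBuilder value = new StringBuilder();
        int i = 0;
        while (i < lines.length) {
            String line = lines[i++];
            if (line.isEmpty()) {
                break; // the first empty line ends the header section
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Folded continuation line: unfolding just drops the line break.
                if (name != null) {
                    value.append(line);
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // assumption: silently skip lines that are not fields
            }
            if (name != null) {
                headers.add(new Field(name, value.toString().trim()));
            }
            name = line.substring(0, colon).trim();
            value = new StringBuilder(line.substring(colon + 1));
        }
        if (name != null) {
            headers.add(new Field(name, value.toString().trim()));
        }
        // Everything after the empty line is the body, kept as-is.
        String body = String.join("\r\n", Arrays.copyOfRange(lines, i, lines.length));
        return new Message(headers, body);
    }
}
```

Calling MailSketch.parse on a message whose Subject field is folded over two 
lines yields a single Subject field with the unfolded value, and the body 
starts right after the first empty line.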

Take a look at the Mime4J mail parser 
(http://james.apache.org/mime4j/index.html). Here's a selection of RFCs to 
skim:

http://tools.ietf.org/html/rfc5322
http://tools.ietf.org/html/rfc5335
(and for historical reasons also: 
 http://tools.ietf.org/html/rfc2822
 http://tools.ietf.org/html/rfc822)

http://tools.ietf.org/html/rfc2045
http://tools.ietf.org/html/rfc2184
http://tools.ietf.org/html/rfc2231
http://tools.ietf.org/html/rfc2046
http://tools.ietf.org/html/rfc2646
http://tools.ietf.org/html/rfc3676
http://tools.ietf.org/html/rfc3798
http://tools.ietf.org/html/rfc5147

> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports 
> maildir, SQL database (via JPA) and Java Content Repository (JCR) as 
> technologies for mail storage. This flexibility is achieved thanks to an API 
> design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of 
> Hadoop HDFS. The James mailbox API will be used. A first step is to design 
> how to interact with Hadoop (native API, Gora incubator at Apache,...) and 
> deal with specific performance questions related to mail loading/parsing in a 
> distributed system (use map/reduce or not, use existing local Lucene indexes 
> for search,...). The second step is to implement the HDFS mailbox (the maildir 
> mailbox is similar because it stores each mail as a file, and can serve as 
> inspiration). A single James server will still be deployed because we don't 
> have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

