[jira] [Commented] (MAILBOX-44) [gsoc2011] Design and implement a distributed mailbox using Hadoop

Norman Maurer (JIRA) Wed, 15 Jun 2011 11:41:57 -0700

    [ https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049949#comment-13049949 ]


Norman Maurer commented on MAILBOX-44:
--------------------------------------

@Stack:

Hope this makes it more clear:

messagesMetaData(CF): {
  mailboxId/uid: {
    uid: 1,
    mailboxId: 184e-ske1-igk2-gj71,
    flags.recent: true,
    flags.deleted: true,
    flags.seen: true,
    flags.flagged: true,
    bodyOctets: 19484,
    fullContentOctets: 10304,
    properties: namespace::localname::value;;namespace2::localname2::value2,
    headers: byte[],
    mediaType: text,
    subType: plain,
    textualLineCount: 24
  }
}

messagesContent(CF): {
  mailboxId/uid: {
    1: byte[],
    2: byte[],
    3: byte[]
  }
}
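
A minimal sketch of this layout, using plain Python dicts in place of a real store. The helper names (row_key, store_message, load_content) and the chunk size are hypothetical, and treating the numbered columns 1, 2, 3 as consecutive chunks of the message content is an assumption; the comment above does not say what they hold.

```python
# Hypothetical in-memory model of the two column families (CFs).
# Plain dicts stand in for the store; no real client API is used.

def row_key(mailbox_id, uid):
    """Build the 'mailboxId/uid' row key shared by both CFs."""
    return f"{mailbox_id}/{uid}"


def store_message(meta_cf, content_cf, mailbox_id, uid, metadata, content,
                  chunk_size=4):
    """Write one metadata row and one content row under the same key."""
    key = row_key(mailbox_id, uid)
    row = {"uid": uid, "mailboxId": mailbox_id}
    row.update(metadata)
    meta_cf[key] = row
    # Assumption: columns 1, 2, 3, ... hold consecutive chunks of the content.
    chunks = [content[i:i + chunk_size]
              for i in range(0, len(content), chunk_size)]
    content_cf[key] = {n: chunk for n, chunk in enumerate(chunks, start=1)}
    return key


def load_content(content_cf, key):
    """Reassemble the content by reading the columns in numeric order."""
    columns = content_cf[key]
    return b"".join(columns[n] for n in sorted(columns))


messages_meta_data = {}
messages_content = {}
key = store_message(
    messages_meta_data, messages_content,
    "184e-ske1-igk2-gj71", 1,
    {"flags.seen": True, "mediaType": "text", "subType": "plain"},
    b"Hello, world",
)
```

Keeping metadata and content in separate rows under the same key means that flag updates and listings only touch the small metadata row.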

Then I have secondary indexes on the messagesMetaData CF so that I can get all
messages which belong to mailbox X and have the deleted flag set, etc.

I used RP and used the secondary indexes to filter for the right messages.
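
The filtering idea can be sketched as follows. This is an illustrative in-memory model, not the API of any particular store: the class IndexedColumnFamily and its methods are hypothetical, and only equality lookups on indexed columns are modelled.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Hypothetical CF with equality-only secondary indexes on some columns."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexed_columns = set(indexed_columns)
        self.index = defaultdict(set)  # (column, value) -> set of row keys

    def put(self, key, columns):
        """Replace the row and keep the index entries in sync."""
        old = self.rows.get(key, {})
        for col in self.indexed_columns:
            if col in old:
                self.index[(col, old[col])].discard(key)
        self.rows[key] = dict(columns)
        for col in self.indexed_columns:
            if col in columns:
                self.index[(col, columns[col])].add(key)

    def find(self, criteria):
        """Return the sorted row keys matching every column == value pair."""
        keys = None
        for col, value in criteria.items():
            if col not in self.indexed_columns:
                raise KeyError(f"column {col!r} is not indexed")
            matches = self.index.get((col, value), set())
            keys = set(matches) if keys is None else keys & matches
        return sorted(keys or ())


meta = IndexedColumnFamily(["mailboxId", "flags.deleted"])
meta.put("X/1", {"uid": 1, "mailboxId": "X", "flags.deleted": True})
meta.put("X/2", {"uid": 2, "mailboxId": "X", "flags.deleted": False})
meta.put("Y/1", {"uid": 1, "mailboxId": "Y", "flags.deleted": True})
meta.put("X/3", {"uid": 3, "mailboxId": "X", "flags.deleted": True})

deleted_in_x = meta.find({"mailboxId": "X", "flags.deleted": True})
```

Because the lookup goes through the index rather than the row key order, it does not depend on how the rows themselves are distributed.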

Does that explain it a bit more?

> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>             Fix For: 0.3
>
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports 
> maildir, SQL database (via JPA) and Java Content Repository (JCR) as 
> technology for mail storage. This flexibility is achieved thanks to an API
> design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of 
> Hadoop HDFS. The James mailbox API will be used. A first step is to design 
> how to interact with Hadoop (native api, gora incubator at apache,...) and 
> deal with specific performance questions related to mail loading/parsing in a 
> distributed system (use map/reduce or not, use existing local lucene indexes 
> for search,...). The second step is to implement the HDFS mailbox (maildir 
> mailbox is similar because it stores mails as files and can be an
> inspiration). A single James server will still be deployed because we don't 
> have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
