Re: Questions on the Mail and MailRepository interfaces

robert burrell donkin Wed, 16 May 2007 14:28:34 -0700

On 5/16/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:

Hi,

On 5/16/07, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
> Jukka Zitting wrote:
> > A message consists of an envelope and the contained message. In JCR
> > this is represented as the james:mail subclass of the standard nt:file
> > node type (see http://wiki.apache.org/jackrabbit/nt%3afile):
> >
> >    [james:mail] > nt:file
> >    - james:state (STRING)
> >    - james:error (STRING)
> >    - james:sender (STRING)
> >    - james:recipients (STRING) multiple
> >    - james:remotehost (STRING)
> >    - james:remoteaddr (STRING)
> >    - jamesattr:* (UNDEFINED)
>
> If we move to MessageRepository (JCR based) + EnvelopeRepository (JMS
> based) model then we don't need the state, error, sender, recipients,
> remotehost, remoteaddr, attributes stuff in the message repository.

OK. Currently I'm just trying to store everything specified by the
Mail interface, but modifying the content model won't be a problem. In
fact I placed the envelope information on the nt:file parent node on
purpose to avoid having them mixed with the message stuff in the
content node.

> Instead we may need some IMAP stuff in the MessageRepository (for the
> IMAP stuff you may be interested in this document written by Joachim
> months ago: http://www.joachim-draeger.de/JamesImap/drafts.html )

I'll give it a look...

> > [..]
> > Normal mail messages are represented as a tree of MIME entities or
> > parts. Each entity is individually referenceable (for easy linking and
> > quick access) and contains the associated mail headers as string
> > attributes:
> > [...]
> > I'm still undecided on how deep I should go in pre-parsing the message
> > contents. For example should I parse Date headers and store them as
> > JCR DATE properties to enable efficient date-based queries? Another
> > complex question is how to best handle encryption and digital
> > signature mechanisms like S/MIME...
>
> I'm not sure at all that the backend should be aware of the
> content/structure of the message.

I guess that depends on the requirements. If you're only interested in
having a dumb message store that just passes messages back and forth
as-is, then not parsing them is a good idea. But if you want to be
able to efficiently search, manage, and manipulate the messages inside
the repository, then understanding the content structure makes a lot
of sense. One concrete requirement that I'm trying to meet is the IMAP
feature of selectively downloading parts of a multipart message. I
wouldn't want to have to parse the entire multipart message over and
over again to serve such client requests.

More generally, I guess the question is whether you see the James mail
repository as just a transient space where the message resides for a
while until it is either forwarded via SMTP or retrieved over POP.
What I'm trying (at least for now) to achieve is a more persistent
mail storage that is actually used as the *endpoint* of the email
delivery and accessed in-place through interfaces like IMAP or a
webmail client. Perhaps there's some reasonable common ground?

> The MOST IMPORTANT thing of all is that if I store a message and later
> retrieve it, every single space, every single header, everything is
> exactly as I wrote it. Even if it was malformed.

Is this a hard requirement? If yes, then I could just model the entire
mime message as a normal nt:resource node, in which case the JCR
repository would act just like an advanced file system with
transactions and some search features.


IMAP is *VERY* sensitive about malformed messages: a MIME message
*MUST* be well formed. it's all too easy to crash modern IMAP clients
with malformed emails.

one approach would be to take advantage of the typing available in
JCRs to help the server understand mail. malformed MIME could
gracefully degrade to RFC822 and malformed RFC822 to a general mail
type.
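
A minimal sketch of that fallback chain, under loud assumptions: the
MailTypeClassifier class and its MailType names are hypothetical (not
James or JCR API), and the checks are deliberately simplistic. It only
looks at the header block and treats the presence of a MIME-Version
header as "MIME"; it does not validate multipart boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical classifier; names are illustrative, not James or JCR API. */
final class MailTypeClassifier {

    enum MailType { MIME, RFC822, GENERAL }

    /** Returns the most specific type whose (simplistic) check passes. */
    static MailType classify(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);

        // The header block ends at the first empty line (CRLF or bare LF).
        int crlf = text.indexOf("\r\n\r\n");
        int lf = text.indexOf("\n\n");
        int end = crlf < 0 ? lf : (lf < 0 ? crlf : Math.min(crlf, lf));
        String headerBlock = end < 0 ? text : text.substring(0, end);

        int headerCount = 0;
        boolean hasMimeVersion = false;
        for (String line : headerBlock.split("\r?\n")) {
            // Folded continuation lines are only valid after a header line.
            boolean continuation = line.startsWith(" ") || line.startsWith("\t");
            if (continuation && headerCount > 0) {
                continue;
            }
            // Field name: printable ASCII without space or colon, then ':'.
            if (!line.matches("[\\x21-\\x39\\x3B-\\x7E]+:.*")) {
                return MailType.GENERAL; // not even valid RFC822 headers
            }
            headerCount++;
            if (line.toLowerCase(Locale.ROOT).startsWith("mime-version:")) {
                hasMimeVersion = true;
            }
        }
        if (headerCount == 0) {
            return MailType.GENERAL;
        }
        return hasMimeVersion ? MailType.MIME : MailType.RFC822;
    }
}
```

The point of the design is that a message is never rejected: the worst
case is the general type, which a server could still store and serve
as an opaque stream.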

Personally I don't see the exact storage requirement as essential, as
the mail specs explicitly allow all sorts of intermediate nodes to
perform various types of reformattings on messages while in transit.
Things should be fine as long as the original intended content is
preserved.


it's important to be able to get the raw as well as the processed form

it's good to have smooth access to a parsed set of addresses, but the
raw header also needs to be preserved
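
As a rough illustration of keeping both forms side by side, here is a
sketch of a value object that stores the raw header value untouched
next to the addresses extracted from it. The AddressHeader class is
hypothetical, and the parsing is a simplification: it splits on commas
outside double quotes and takes the text inside angle brackets, but
ignores RFC822 comments, groups, and escaped quotes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for a raw address header plus its parsed addresses. */
final class AddressHeader {
    private final String raw;
    private final List<String> addresses;

    private AddressHeader(String raw, List<String> addresses) {
        this.raw = raw;
        this.addresses = addresses;
    }

    /** Parses a header value such as the content of a To: or Cc: field. */
    static AddressHeader parse(String rawValue) {
        List<String> found = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < rawValue.length(); i++) {
            char c = rawValue.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                // A comma outside quotes ends one mailbox.
                addIfPresent(found, current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfPresent(found, current.toString());
        return new AddressHeader(rawValue, Collections.unmodifiableList(found));
    }

    private static void addIfPresent(List<String> out, String mailbox) {
        int open = mailbox.lastIndexOf('<');
        int close = mailbox.lastIndexOf('>');
        String addr = (open >= 0 && close > open)
                ? mailbox.substring(open + 1, close)
                : mailbox;
        addr = addr.trim();
        if (!addr.isEmpty()) {
            out.add(addr);
        }
    }

    /** The header value exactly as it was received. */
    String raw() {
        return raw;
    }

    /** The extracted addresses, in order of appearance. */
    List<String> addresses() {
        return addresses;
    }
}
```

Because raw() returns the original string unchanged, a consumer that
needs byte-exact output can ignore the parsed view entirely.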

> To achieve performance we'll probably have to avoid parsing the mime
> structure at all: we don't need this for most SMTP/POP3 operations. Some
> IMAP operations need this, but this should probably be done on demand
> and not when writing the message to the repository.

One possible approach, at the expense of storing potentially redundant
data, is that the original message source is stored as a verbatim
binary stream and the message content is automatically "exploded" the
first time a client actually needs the parsed message.


i really like this idea :-)

there is no need to wait until the first client with SEDA: exploding
would just be another task to execute

IMAP is write rarely and read regularly. unless MIME messages are
parsed and stored as separate parts, performance will be very poor in
normal operation.

but again, the key is to be able to access the original, raw data when needed
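
One way to picture the combination of verbatim storage and exploding
as a separate task is the sketch below. Everything in it is an
assumption made for illustration: ExplodingStore is a hypothetical
in-memory class, not a James or JCR interface, and the actual MIME
parsing is left to a caller-supplied function rather than implemented
here.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Hypothetical store: raw bytes kept verbatim, parts exploded as a task. */
final class ExplodingStore {
    private final Map<String, byte[]> raw = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<List<String>>> parts =
            new ConcurrentHashMap<>();
    private final Executor executor;
    private final Function<byte[], List<String>> exploder;

    /**
     * @param executor runs the exploding tasks (a stage, in SEDA terms)
     * @param exploder caller-supplied parser that turns raw bytes into parts
     */
    ExplodingStore(Executor executor, Function<byte[], List<String>> exploder) {
        this.executor = executor;
        this.exploder = exploder;
    }

    /** Stores the message verbatim and schedules exploding right away. */
    void store(String id, byte[] message) {
        byte[] copy = message.clone(); // defensive copy, never modified
        raw.put(id, copy);
        parts.put(id, CompletableFuture.supplyAsync(() -> exploder.apply(copy), executor));
    }

    /** Returns the original bytes exactly as stored, or null if unknown. */
    byte[] rawMessage(String id) {
        byte[] stored = raw.get(id);
        return stored == null ? null : stored.clone();
    }

    /** Returns the exploded parts, waiting if the task has not finished. */
    List<String> parts(String id) {
        CompletableFuture<List<String>> pending = parts.get(id);
        if (pending == null) {
            throw new NoSuchElementException(id);
        }
        return pending.join();
    }
}
```

A reader that only needs the original message never touches the
parsed view, and a reader that needs parts pays the parsing cost at
most once per message.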

- robert

