On 5/17/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
Hi,

On 5/17/07, robert burrell donkin <[EMAIL PROTECTED]> wrote:
> On 5/16/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> > One possible approach, at the expense of storing potentially redundant
> > duplicate data, is that the original message source is stored as a
> > verbatim binary stream and the message content is automatically
> > "exploded" the first time a client actually needs to parse the
> > message.
>
> i really like this idea :-)

There's one major caveat with this approach: redundant information and
the performance cost of maintaining that.

yes

Maintaining updates in the raw message stream is in some (many?) cases
much more expensive than in a fully parsed representation. Consider
for example a mailet that wants to modify a subject line or add a
footer to all messages. Such operations would require that we update
the original message content as well as the individual header property
or body part in question. Updating the raw message source can in such
cases easily take an order of magnitude more time than updating the
parsed representation.

this cost is only required if we choose to update the original

Note that I believe that it is possible to parse an incoming message
into a JCR node tree and recreate it back into a byte stream in the
same O(n) time and O(1) memory as is required to stream the raw
message source to a traditional spool file.

i suspect that nio -> file will be quicker but let's save this
argument and let the numbers decide
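
For reference, the baseline being compared against (streaming the raw source to a spool in O(n) time and O(1) memory) could be sketched roughly as below. This is only an illustration: the class name SpoolCopy and the 8 KB buffer size are assumptions, not anything taken from the JAMES code base, and it uses plain java.io streams rather than nio.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a raw message stream with a fixed-size buffer. */
final class SpoolCopy {

    private SpoolCopy() {
    }

    /**
     * Streams every byte from in to out and returns the byte count.
     * Time grows linearly with the message size; memory use stays at
     * one fixed buffer no matter how large the message is.
     */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // assumed size, tune by measurement
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Timing this copy against a parse-and-recreate round trip on the same messages would be one way to get the numbers.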

i didn't mean that intermediary spooling would be the only way but an
architecture that could support it would be worthwhile. being able to
use an intermediary spool file enables some designs which would not be
otherwise possible. for example, splitting the processing between two
instances. this would allow the email parsing and processing to be
done as non-root.

Perhaps we should have two modes for the JCR mail repository
implementation: one for pure relaying and one for more complex
processing. The former satisfies the relaying requirements of the SMTP
spec, while the latter is optimized for message transformations and
complex access patterns like in IMAP or webmail clients.

i don't think that two modes are necessary and it would be good if
this could be avoided. there is a danger that JAMES is drifting
towards becoming just a collection of unrelated protocol implementations
unless the data set is held together.

there is a case for retaining the original raw contents of a mail even
for rich patterns. this would allow better auditing and error
recovery.

exploding would work well when coupled with a changed flag. the
original message would be retained unaltered whatever the processing.
on demand, the original could be parsed and stored in a rich
representation. if the original cannot be parsed then the mail would
be marked.

if the mail is altered then a flag would be marked and the mail would
be reconstructed on demand from the rich representation. if the
message has not been transformed then the raw original can be used
directly.
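
A minimal sketch of the changed-flag idea, assuming a deliberately naive header parser (no folded headers, no MIME, case-sensitive header names) and plain in-memory fields instead of JCR nodes. The class StoredMail and all of its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mail holder: raw original kept verbatim, rich form built on demand. */
final class StoredMail {
    private final byte[] raw;            // original source, never altered
    private Map<String, String> headers; // rich representation, null until exploded
    private String body;
    private boolean changed;             // set once the rich form is altered
    private boolean unparseable;         // the "mark" for mails that cannot be parsed

    StoredMail(byte[] raw) {
        this.raw = raw.clone();
    }

    /** Parses the original on first use only; later calls are no-ops. */
    private void explode() {
        if (headers != null || unparseable) {
            return;
        }
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        int split = text.indexOf("\r\n\r\n");
        if (split < 0) {
            unparseable = true;
            return;
        }
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String line : text.substring(0, split).split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                unparseable = true;
                return;
            }
            parsed.put(line.substring(0, colon), line.substring(colon + 1).trim());
        }
        headers = parsed;
        body = text.substring(split + 4);
    }

    boolean isUnparseable() {
        explode();
        return unparseable;
    }

    boolean isChanged() {
        return changed;
    }

    void setHeader(String name, String value) {
        explode();
        if (headers == null) {
            throw new IllegalStateException("mail could not be parsed");
        }
        headers.put(name, value);
        changed = true;
    }

    /** The unaltered original, always available for auditing and recovery. */
    byte[] getOriginal() {
        return raw.clone();
    }

    /** Raw bytes if untouched, otherwise rebuilt from the rich form. */
    byte[] getCurrent() {
        if (!changed) {
            return raw.clone();
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        sb.append("\r\n").append(body);
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

In this sketch a pure relay never pays for parsing, since getCurrent() returns the raw bytes until something calls setHeader(). A real implementation would also have to decide whether the rebuilt form must reproduce untouched headers byte for byte, which this sketch does not attempt.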

- robert
