On Fri, Jun 27, 2008 at 9:19 AM, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
> Robert Burrell Donkin ha scritto:
>>
>> On Tue, Jun 24, 2008 at 7:40 PM, Stefano Bagnara <[EMAIL PROTECTED]> wrote:

<snip>

>>> Sometimes my JAMES instance deals with around a million emails a day,
>>> and I had to "hack" some JAMES code to avoid writing to disk/JDBC on
>>> each status change (a custom processorClass for the spoolmanager).
>>>
>>> A performance-oriented approach would be to use the KAHA engine [1]
>>> directly (or maybe the AMQ Message Stores [2]), or to write stuff only
>>> for reliability but keep using memory operations unless the queue grows
>>> too much. I think we should keep this in mind when we refactor the
>>> repository APIs (let's not close this door when we change the API).
>>
>> should be easy enough if we go down the MOM route: messaging interfaces
>> tend to be concise, which allows multiple implementations
>
> Issues:
> It would be cool to be able to declare some processing to be done before
> finally declaring the mail as accepted. The specification says we should
> reply as fast as possible after the "CRLF.CRLF" that ends DATA, to avoid
> duplicate sends, but I think 0.1 seconds is often enough to run some
> processing and "fast fail" instead of spooling.
> It would be even better to say: start the spooling process, track the
> status for 0.1 seconds, and then reply to the DATA.

IMHO the simple spool-accept-fail paradigm needs to be reconsidered.
MOM would allow more flexible and fine-grained processing.
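to make that concrete, here is a rough sketch of what a concise spool
interface might look like; MailSpool and SpoolItem are invented names for
illustration, not existing JAMES classes:

```java
// Hypothetical sketch of a concise spool interface; MailSpool and
// SpoolItem are illustrative names, not existing JAMES classes. A
// JMS- or Kaha-backed store could implement the same small interface.
import java.util.ArrayDeque;
import java.util.Queue;

interface MailSpool {
    void enqueue(String mailId, byte[] message); // accept a message into the spool
    SpoolItem dequeue();                         // take the next message, or null
}

final class SpoolItem {
    final String mailId;
    final byte[] message;
    SpoolItem(String mailId, byte[] message) { this.mailId = mailId; this.message = message; }
}

// Simple in-memory implementation; a persistent store would sit behind
// the very same interface in an alternative implementation.
final class InMemorySpool implements MailSpool {
    private final Queue<SpoolItem> queue = new ArrayDeque<>();

    public synchronized void enqueue(String mailId, byte[] message) {
        queue.add(new SpoolItem(mailId, message));
    }

    public synchronized SpoolItem dequeue() {
        return queue.poll(); // null when the spool is empty
    }
}
```

because the interface is so small, swapping the in-memory queue for a
durable one is a configuration choice, not an API change.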

a lot of processing could be done using only the information in the
headers. once the headers have been received, processing could start
whilst the rest of the message is being read.
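as a hedged illustration of that header-first idea: the helper below stops
reading at the blank line that ends the headers, so header-only checks can
run while the body is still arriving on the same stream. parseHeaders and
fastFail are made-up names, not JAMES API, and header folding is ignored
for brevity:

```java
// Illustration of header-first processing: stop reading at the blank line
// that terminates the headers, so header-only checks can run while the
// message body is still arriving. parseHeaders/fastFail are invented
// helper names, not JAMES API. Folded (multi-line) headers are ignored
// for brevity.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderFirstCheck {
    static Map<String, String> parseHeaders(BufferedReader in) {
        Map<String, String> headers = new LinkedHashMap<>();
        try {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim().toLowerCase(),
                                line.substring(colon + 1).trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers; // the body is still unread on 'in' at this point
    }

    // Example header-only "fast fail" rule: reject mail with no From: header.
    static boolean fastFail(Map<String, String> headers) {
        return !headers.containsKey("from");
    }
}
```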

> I say this because I think we need message tracking during the spool in
> order to support delivery status notifications at the container level.
> JAMES currently supports a limited part of DSNs (bounces) at the mailet
> level, but this is not enough (IMHO).
>
> The container should be made aware of messages being delivered, expanded,
> delayed, failed, or aliased, in order to provide better tracking (both
> for DSN and logging purposes).
>
> I've been thinking about this from time to time for 3 years, but I've
> never found a satisfying solution yet.

there are some basic verbs associated with mail processing. some of
these, i think, would be better represented by queues. for example,
remote delivery or local delivery could be managed by queues rather
than mailets directly. i think this approach should allow high-level
tracking without sacrificing the flexibility that mailets bring.
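a rough sketch of what container-level verbs and tracking could look like;
Verb, MailEvent, and MailTracker are invented names for illustration, not a
proposal for a concrete API:

```java
// Sketch of container-level tracking of the basic mail-processing verbs,
// so DSN generation and logging can observe every status change. The
// names (Verb, MailEvent, MailTracker) are invented for illustration.
import java.util.ArrayList;
import java.util.List;

enum Verb { ACCEPTED, EXPANDED, ALIASED, DELAYED, DELIVERED, FAILED }

final class MailEvent {
    final String mailId;
    final Verb verb;
    MailEvent(String mailId, Verb verb) { this.mailId = mailId; this.verb = verb; }
}

final class MailTracker {
    private final List<MailEvent> log = new ArrayList<>();

    // Queue consumers (remote/local delivery, aliasing) would call this.
    void record(String mailId, Verb verb) { log.add(new MailEvent(mailId, verb)); }

    // A DSN generator could scan for DELAYED/FAILED events per message.
    long count(String mailId, Verb verb) {
        return log.stream()
                  .filter(e -> e.mailId.equals(mailId) && e.verb == verb)
                  .count();
    }
}
```

with something like this at the container level, bounces stop being a
mailet-only concern.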

>>> It would be cool if a standard SMTP=>MultipleMailets=>RemoteDelivery
>>> processing could be done with a single write to disk when the mail is
>>> accepted and no other reads/writes (unless memory/queues are full).
>>
>> unless you're willing to risk losing messages, i don't see how this
>> can be done. it should be possible to just write twice: once when the
>> message arrives and once when processing is done.
>
> "done marks" could be applied in batches when the traffic is really high.
> But maybe this is premature optimization.

yes

> If we can get it writing twice it would be already a big improvement.

i think two writes would be a realistic goal to target
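for illustration, a minimal sketch of the two-write pattern, assuming a
simple append-only journal; the class name and file layout are made up:

```java
// Minimal sketch of the two-write pattern: one durable append when a
// message is accepted, one when processing completes. The journal file
// layout and class names are illustrative, not JAMES code.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class TwoWriteJournal {
    private final Path journal;

    TwoWriteJournal(Path journal) { this.journal = journal; }

    // Write 1: persist the fact of acceptance before replying 250 to DATA.
    void accepted(String mailId) { append("ACCEPTED " + mailId); }

    // Write 2: mark processing complete; intermediate state stays in memory.
    void done(String mailId) { append("DONE " + mailId); }

    private void append(String record) {
        try {
            Files.writeString(journal, record + "\n",
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-contained demo: journal one message through both writes.
    static List<String> demo() {
        try {
            Path p = Files.createTempFile("journal", ".log");
            p.toFile().deleteOnExit();
            TwoWriteJournal j = new TwoWriteJournal(p);
            j.accepted("m1");
            j.done("m1");
            return Files.readAllLines(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

everything between the two appends can stay in memory, which is what keeps
the per-message disk cost bounded.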

- robert
