Noel J. Bergman wrote:
> The following is a summary of the problem:
>
> 1) It occurs ONLY when using JDBCSpoolRepository for RemoteDelivery.
> 2) If there are more items in the spool than fit in the cache, it is
>    possible to delay delivery for messages that ought to be delivered.
> 3) If iterating through the cache takes more than one second, it is
>    possible to spinloop.
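For anyone reading along without the code handy, the failure mode looks
roughly like the sketch below. The class and member names are invented
for illustration; the real logic lives in JDBCSpoolRepository's
accept(), and only the shape of the loop matters here.

    import java.util.Map;
    import java.util.TreeMap;

    // Illustrative only -- NOT the actual JDBCSpoolRepository source.
    class SpoolCacheSketch {
        // At most maxcache (eligible-time -> key) entries loaded from
        // the spool table; anything beyond that is invisible here.
        private final TreeMap<Long, String> cache =
            new TreeMap<Long, String>();

        synchronized String accept() throws InterruptedException {
            while (true) {
                long cutoff = System.currentTimeMillis();
                for (Map.Entry<Long, String> e : cache.entrySet()) {
                    if (e.getKey() <= cutoff) { // message is due now
                        cache.remove(e.getKey());
                        return e.getValue();
                    }
                }
                // Point 2: a message that is due but did not fit in the
                // cache is never seen here, so it sits in the database
                // for at least another timeout.
                // Point 3: if the scan above takes more than a second,
                // entries keep becoming due while we iterate, and the
                // threads keep re-scanning instead of really waiting.
                wait(1000L); // the hardcoded timeout mentioned below
            }
        }
    }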
I'm investigating the problem further. It happened again today, even
though I had raised maxcache to 10000 and had fewer than 10000 messages
in the spool, so something odd is going on and I need to look into it
more closely.
Furthermore, I'm under the impression that I have a similar issue with
the main spool manager as well... Maybe there are multiple problems, so
I'll have to fix some of them before I can check for the others.
> There are a variety of approaches. One is to fix it. So far neither
> Stefano nor I (not that I've had much time to look, but he spent all
> day on it) have come up with a trivial fix. The kinds of fixes this
> code needs would push the release back by weeks. At that point I might
> as well implement the right long-term change, planned for the next
> release, rather than a one-off band-aid for v2.3.
The long-term change needs a DB change, and we decided to keep the DB
structure unchanged until 3.0, so IMHO we need a fix for 2.3 and 2.4
that doesn't involve altering the DB to replace last_updated with
next_processing_time or something similar.
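To make the idea concrete: with such a column the database itself would
do the scheduling, roughly along the lines below. This is only a sketch
of the 3.0-era idea; the table and column names other than
next_processing_time are placeholders, not the real schema.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    // Sketch of the long-term idea, not something proposed for
    // 2.3/2.4: store the scheduler's decision in the row itself so
    // the query returns only messages that are actually due.
    class NextProcessingTimeSketch {
        String nextDueKey(Connection conn) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT message_name FROM spool "
                + "WHERE next_processing_time <= ? "
                + "ORDER BY next_processing_time");
            try {
                ps.setTimestamp(1,
                    new Timestamp(System.currentTimeMillis()));
                ResultSet rs = ps.executeQuery();
                return rs.next() ? rs.getString(1) : null;
            } finally {
                ps.close(); // also closes the ResultSet
            }
        }
    }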
> Alternatively, we could add a configuration parameter for the
> hardcoded timeout value (there is already one for the cache size),
> document the potential problem, and release JAMES v2.3.
IMHO the problem is not the timeout: the timeout is there to prevent
all the threads from running the same query against the repository when
there are no messages. Without the timeout you would need 50 queries
just to decide that there is nothing to do; with the timeout that is
avoided. Increasing the timeout is a hack, and it would only work
because we already have another hack whereby our threads wake up every
60 seconds (we don't do this for the file repositories, which behave
better with respect to this issue).
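In miniature, the pattern the timeout protects looks something like the
sketch below. The names are invented, the pending list stands in for
the spool table, and 50 is just our delivery thread count.

    import java.util.LinkedList;

    // Illustrative consumer/producer pairing, not the real repository.
    class TimeoutSketch {
        private final LinkedList<String> pending =
            new LinkedList<String>();

        synchronized String accept() throws InterruptedException {
            while (pending.isEmpty()) {
                // All 50 delivery threads park here instead of each
                // re-running the same empty query in a tight loop; the
                // timeout is a safety net in case a notify is missed
                // (cf. the 60-second wake-up hack mentioned above).
                wait(1000L);
            }
            return pending.removeFirst();
        }

        synchronized void store(String key) {
            pending.add(key);
            notifyAll(); // wake a parked delivery thread immediately
        }
    }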
> I do not want to just remove the cache, which is one of Stefano's
> suggestions. The cache prevents JAMES from crashing when messages
> arrive faster than it can process them. Throwing OOMs, and possibly
> discarding messages in the process, is not acceptable.
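For context, the cache here is essentially a bounded buffer of message
keys: however large the spool table grows, only up to maxcache keys are
materialized per load. A rough sketch, with placeholder names:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.LinkedList;

    // Illustrative only: why the cache bounds memory use.
    class BoundedLoadSketch {
        private final LinkedList<String> cache =
            new LinkedList<String>();
        private final int maxCache = 1000; // configurable cache size

        void loadCache(Connection conn) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT message_name FROM spool ORDER BY last_updated");
            try {
                ps.setMaxRows(maxCache); // never pull more than fit
                ResultSet rs = ps.executeQuery();
                cache.clear();
                while (rs.next()) {
                    // keys only; message bodies stay in the DB
                    cache.add(rs.getString(1));
                }
            } finally {
                ps.close();
            }
        }
    }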
I think the behaviour we have now is buggy and difficult both to
understand and to fix. I want it fixed on my own system before deciding
what to do with 2.3.0.
And my preferred solution, at the moment, is not removing the cache but
a complete rewrite of the caching algorithm and the accept mechanism,
without changing the DB.
> Recognize that part of the problem is the conflation of the
> RemoteDelivery spool and the main pipeline spool, which have different
> requirements, since the former applies scheduling on top of the spool.
> Again, that's on the roadmap to change, but wasn't planned for v2.3.
>
> --- Noel
Well, we have a bug, and we may need to change the original plan.
I still think there is more to this issue to be discovered, so I will
talk about possible solutions later, once I have investigated this
hard problem a little more.
Stefano