Hello Jean, I do support merging https://github.com/apache/james-project/pull/282 along with https://github.com/apache/james-project/pull/286 as soon as we get a green build.
I did finally succeed to run an SMTP benchmark against a simple workload mixing small and big messages, the outcome was good, eventually answering people wanting Gatling performance tests to complete JMH benchmarks: https://github.com/apache/james-project/pull/286#issuecomment-780221455 Cheers, Benoit Le 15/02/2021 à 14:58, Jean Helou a écrit : > Hi everyone, > > In light of benoit's latest comments on the related PRs, I want to report > that I spent some time setting up an infra to do benchmarks against a full > server (mx only) I'm not an ops at core, I don't know james all that well > and spare time is limited and I got sidetracked quite a few times, so I > still haven't run the benchmarks but I'm getting there. > > I am strongly in favor of integrating at least #282. > > #285 is nice to have for me but I won't have time to investigate the config > file option rapidly. > > Jean > > Le jeu. 7 janv. 2021 à 18:04, Raphaël Ouazana <[email protected]> a > écrit : > >> Hi, >> >> I completely agree with what Matthieu said. >> >> Correctness is needed, so we have to fix this issue. >> >> But before merging on master and/or deploying this version I would >> expect some Gatling runs to show which potential performance decrease we >> can expect on real use cases. >> >> Thanks, >> >> Raphaël. >> >> Le 04/01/2021 à 14:42, Matthieu Baechler a écrit : >>> Hi there, >>> >>> Thank you for bringing this topic to the mailing-list. >>> >>> To me safety and correctness is much more important than raw >>> performance so I would like the always-copy implementation to replace >>> the COW version. >>> >>> However, keep in mind that the JMH benchmark figures did not told the >>> full story about the consequences of this change and be ready to >>> experience slower real-world performances. >>> >>> Cheers, >>> >>> -- Matthieu Baechler >>> >>> On Tue, 2020-12-29 at 12:54 +0700, Tellier Benoit wrote: >>>> Hello there! >>>> >>>> We had been discussing on GitHub recently about an optimization in >>>> james >>>> core around the usage of MimeMessage. >>>> >>>> Javax.mail MimeMessage is currently used to represent a message of an >>>> email as part of the mail processing in James. It is part of the Mail >>>> interface (mailet-api). >>>> >>>> As Mail envelope is composed of several recipients, mail related >>>> operations are performed once for all these recipients (we enqueue >>>> the >>>> mail one time, we strip bcc one time etc...). Troubles arise when we >>>> need different behaviors as part of mail processing across recipients >>>> (think remote recipients, that needs there mail to be relayed, versus >>>> local recipients that needs to be locally delivered). The email get's >>>> duplicated (in MatcherSplitter) and the processing will then be >>>> distinct >>>> for both entities. The underlying MimeMessage may - or may not be >>>> modified. >>>> >>>> In order to prevent MimeMessage duplication in the event the >>>> underlying >>>> MimeMessage is not modified, a Copy On Write mechanism was introduced >>>> (I >>>> guess... Sorry, I was not there yet). >>>> >>>> Upon his CI effort, Jean Helou with the help of Matthieu Baechler >>>> made >>>> he unpleasant finding that this was not thread safe, that was leading >>>> to >>>> build instabilities. The mailet processing happens in Camel, which is >>>> multi-threaded. Concurrency issues arised between modifications, and >>>> message disposal, when a same MimeMessage instance was shared. [1] >>>> >>>> A first effort was to try to achieve thread-safety, which leaded to a >>>> brittle double reetrant read-write locks in order to govern data >>>> access. >>>> However, another performance enhancement bypassed these lock >>>> mechanism >>>> (MimeMessageWrapper allows accessing the data as an InputStream >>>> instead >>>> of requiring to copy it). The effort seemed overwhelming, not to >>>> metion >>>> possible risks of dead-locks. [2] >>>> >>>> We then came up with an always copy implementation [3]. Simpler, >>>> safer... The underlying logic is to avoid trying being smarter than >>>> mutability, and leverage immutability to achieve thread safety, which >>>> is >>>> a classic functional programming idiom. >>>> >>>> JMH benchmarks were conducted. We highlighter little performance >>>> difference for small messages, in the percent realm for both memory >>>> allocation and compute time. Differences are however higher for >>>> bigger >>>> messages (~10%) for both metrics. >>>> >>>> Please note that above 100KB the MimeMessage would be stored on disk, >>>> thus limiting memory impact (see MimeMessageInputStreamSource). Maybe >>>> we >>>> should make the threshold configurable, via a system property for >>>> instance? >>>> >>>> I just want to further mentioned I encountered that very issue on a >>>> production instance: the underlying email had been corrupted by the >>>> above mentioned COW bug and kept throwing NullPointerExceptions every >>>> time the content was accessed. This resulted (on top of >>>> distributed-james) in a RabbitMQ nack of the message, that ended up >>>> in a >>>> dead-letter queue. Replaying its processing required admin >>>> intervention >>>> and had been interpreted by the user as an email loss... >>>> >>>> To conclude this effort we (Jean an I) would like to merge the >>>> "Always >>>> copy" pull request. >>>> >>>> Also, would it be beneficial to write an ADR about this topic? >>>> >>>> Thoughts? >>>> >>>> Cheers, >>>> >>>> Benoit >>>> >>>> [1] The unfamous COW bug: >>>> >> https://github.com/apache/james-project/pull/280/commits/09b5554bbcbbb98757910d59bac54f97ee1f8b4f >>>> >>>> [2] The double nested reetrant read-write lock fix attempt: >>>> https://github.com/apache/james-project/pull/280 >>>> [3] The always copy fix: >>>> https://github.com/apache/james-project/pull/282 >>>> [4] Benchmarks: >>>> https://github.com/apache/james-project/pull/280#issuecomment-745211736 >>>> & >>>> https://github.com/apache/james-project/pull/280#issuecomment-745701937 >>>> [5] The JIRA ticket: https://issues.apache.org/jira/browse/JAMES-3477 >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: [email protected] >>>> For additional commands, e-mail: [email protected] >>>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [email protected] >>> For additional commands, e-mail: [email protected] >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
