+1 for having this in James. POP3 does have some niche uses due to its simplicity, e.g. integrating with third-party mail-processing systems. In such cases you really want bulk throughput, so every bit of performance improvement helps.
Cheers,
Karsten

On 03.08.21 06:11, [email protected] wrote:
> Hello all,
>
> For some of my customers, we developed a multi-datacenter friendly
> POP3 server as a derivative of the James distributed server.
>
> It fully avoids lightweight transactions (LWTs) and is thus efficient in
> a multi-datacenter setup.
>
> The regular James distributed server was a limiting factor: we
> encountered multiple errors linked to lightweight transactions, namely
> read timeouts at consistency level SERIAL.
>
> We thus proposed an alternative implementation of the POP3 mailbox,
> based on a messageId backed by a TimeUUID. TimeUUIDs have extremely
> low collision chances and their generation does not require any
> synchronisation. Also, since only POP3 is supported, monotonic generation
> of UIDs and MODSEQs is not necessary, and collisions can be tolerated:
> we can thus rely on a random generation strategy. Along with the options
> introduced in JAMES-3435, this avoids LWTs entirely.
>
> We needed to introduce an additional Cassandra table: given a mailbox,
> it lists all the messages it contains by their messageId. The size is
> added to the projection for efficiency. This table is maintained via a
> mailbox listener. The messageId is then used for content retrieval and
> deletion.
>
> This POP3 implementation has been functionally tested with Thunderbird.
>
> We furthermore conducted performance tests across two datacenters.
> See [1] below as a reference.
>
> Given that there is traction for such a server (in the medical field a
> lot of people still use POP3),
> given the minimal amount of code written,
> given that we might have one of the first multi-DC friendly MDAs on the
> market (POP3 only),
> I propose to create a new distributed-pop3-server leveraging the above
> design.
>
> I will write an ADR to further express the needs and the design, as well
> as open a Proof Of Concept pull request.
> It will be based on [2].
>
> Best regards
>
> Benoit TELLIER
>
> [2] POC developed @linagora:
> https://github.com/linagora/james-project/pull/4321
> ---------------------------
>
> [1] Performance test exercising the distributed multi-DC POP3 server
>
> Infrastructure:
>  - 2 DCs of 3 Cassandra nodes each, linked via VPN on a link with
>    latencies of ~1 ms. 2 cores and 8 GB for each Cassandra node.
>  - 4 James nodes with 4 cores and 16 GB each
>
> Testing:
>  - Send 100 mails per second during 10 minutes to 80 users
>  - Then STAT the mails in POP3
>  - Then clean them up (DELE + QUIT)
>
> Before:
>  - The mail processing speed was 73 mails/second
>  - We noticed 476 SERIAL read timeouts in the logs
>  - UID / MODSEQ generation are the top queries upon LocalDelivery
>    (40 ms+ !)
>
> After:
>  - The mail processing speed improved to 85 mails/second
>  - We did not notice any SERIAL read timeouts in the logs
>  - Other Cassandra queries did benefit from not co-existing with LWT
>    queries (~10% faster)
>  - This performance test was conducted without the random generation
>    strategy for UIDs and MODSEQs. Further enhancements would be expected
>    with the exact design proposed above.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
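
For readers who want to picture the per-mailbox listing table described in the quoted proposal, here is a minimal sketch. It is not the James implementation: an in-memory dict stands in for the Cassandra table, Python's uuid.uuid1() stands in for a Cassandra TimeUUID, and the class and method names (Pop3MailboxView, on_message_added, on_message_deleted, stat) are hypothetical. It only illustrates that message ids can be generated without any coordination, and that keeping the size in the projection lets STAT be answered from this table alone.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Pop3MailboxView:
    """In-memory stand-in for the per-mailbox listing table (hypothetical)."""

    # mailbox id -> {message id (time-based UUID) -> size in bytes}
    rows: dict = field(default_factory=dict)

    def on_message_added(self, mailbox_id: str, size: int) -> uuid.UUID:
        # Listener callback (assumed name): uuid1() is time-based and is
        # generated locally, so no lightweight transaction is involved.
        message_id = uuid.uuid1()
        self.rows.setdefault(mailbox_id, {})[message_id] = size
        return message_id

    def on_message_deleted(self, mailbox_id: str, message_id: uuid.UUID) -> None:
        # Listener callback (assumed name): deleting an unknown id is a no-op.
        self.rows.get(mailbox_id, {}).pop(message_id, None)

    def stat(self, mailbox_id: str) -> tuple:
        # POP3 STAT needs the message count and the total size; both come
        # from the projection without reading any message content.
        sizes = self.rows.get(mailbox_id, {}).values()
        return len(sizes), sum(sizes)


view = Pop3MailboxView()
ids = [view.on_message_added("inbox", size) for size in (100, 250, 50)]
count, total = view.stat("inbox")
print(f"+OK {count} {total}")
```

In a real deployment the dict would be a Cassandra table partitioned by mailbox, and the two callbacks would be driven by the mailbox listener mentioned in the proposal.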
