+1 for having this in James. POP3 does have some niche uses due to its 
simplicity, e.g. integrating with third-party mail-processing systems. 
In such cases you really want bulk throughput, so every bit of 
performance improvement helps.

Cheers, Karsten

On 03.08.21 06:11, [email protected] wrote:
> Hello all,
>
> For some of my customers, we developed a multi-datacenter-friendly
> POP3 server as a derivative of the James distributed server.
>
> It fully avoids lightweight transactions (LWTs) and thus is efficient in
> a multi-datacenter setup.
>
> The regular James distributed server was a limiting factor: we
> encountered multiple errors linked to lightweight transactions, namely
> read timeouts at consistency level SERIAL.
>
> We thus proposed an alternative implementation of the POP3 mailbox,
> based on a messageId backed by a TimeUUID. TimeUUIDs have extremely
> low collision chances and their generation does not require any
> synchronisation. Also, since only POP3 is supported, monotonic
> generation of UIDs and MODSEQs is not necessary, and collisions can be
> tolerated: we can thus rely on a random generation strategy. Along with
> the options introduced in JAMES-3435, this avoids LWTs entirely.
>
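
To illustrate why TimeUUID generation needs no cross-node synchronisation,
here is a minimal sketch using only the JDK. It is not the James or
Cassandra driver code: the class name TimeUuidSketch and its methods are
hypothetical, and the node and clock sequence are simply random per JVM
(an assumption made for the example). The only shared state is a JVM-local
counter, so no LWT or other round trip is involved.

```java
import java.security.SecureRandom;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: version 1 (time-based) UUIDs built with the JDK only.
final class TimeUuidSketch {
    // Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();
    // Variant bits, random clock sequence and random node, chosen once per JVM.
    private static final long CLOCK_SEQ_AND_NODE = makeClockSeqAndNode();
    // Last timestamp handed out by this JVM; purely local state.
    private static final AtomicLong LAST_TIMESTAMP = new AtomicLong();

    private static long makeClockSeqAndNode() {
        long clockSeq = RANDOM.nextInt(1 << 14);              // 14-bit clock sequence
        long node = (RANDOM.nextLong() & 0xFFFFFFFFFFFFL)     // 48-bit node
                | 0x010000000000L;                            // multicast bit: random node, not a MAC
        return 0x8000000000000000L | (clockSeq << 48) | node; // variant bits "10"
    }

    // Strictly increasing within this JVM, even for calls in the same millisecond.
    private static long nextTimestamp() {
        long now = System.currentTimeMillis() * 10_000 + UUID_EPOCH_OFFSET;
        return LAST_TIMESTAMP.updateAndGet(last -> Math.max(last + 1, now));
    }

    static UUID timeBased() {
        long ts = nextTimestamp();
        long msb = ((ts & 0xFFFFFFFFL) << 32)          // time_low
                | (((ts >>> 32) & 0xFFFFL) << 16)      // time_mid
                | 0x1000L                              // version 1
                | ((ts >>> 48) & 0x0FFFL);             // time_hi
        return new UUID(msb, CLOCK_SEQ_AND_NODE);
    }

    public static void main(String[] args) {
        System.out.println(timeBased());
        System.out.println(timeBased());
    }
}
```

Two James nodes running this would only collide if they drew the same
random node and clock sequence and produced the same 100 ns timestamp,
which is the "extremely low collision chance" mentioned above.
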
> We needed to introduce an additional Cassandra table: given a mailbox,
> list all the messages contained in it by their messageId - size is added
> to the projection for efficiency. This table is maintained via a mailbox
> listener. MessageId is then used for content retrieval and deletion.
>
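
As a rough illustration of the extra table described above, the sketch
below models it in memory: for each mailbox, a map from messageId to size,
kept up to date by listener callbacks. All names here
(Pop3ProjectionSketch, onMessageAdded, onMessageDeleted, stat) are
hypothetical and do not come from the James code base; the real table
lives in Cassandra. It shows why storing the size helps: POP3 STAT, which
reports the message count and total size in octets, can be answered from
the projection alone.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the per-mailbox projection table.
final class Pop3ProjectionSketch {
    // mailboxId -> (messageId -> size in octets)
    private final Map<String, Map<UUID, Long>> table = new ConcurrentHashMap<>();

    // Called by the (hypothetical) mailbox listener when a message is delivered.
    void onMessageAdded(String mailboxId, UUID messageId, long size) {
        table.computeIfAbsent(mailboxId, id -> new ConcurrentHashMap<>())
             .put(messageId, size);
    }

    // Called by the listener when a message is deleted (e.g. after DELE + QUIT).
    void onMessageDeleted(String mailboxId, UUID messageId) {
        Map<UUID, Long> messages = table.get(mailboxId);
        if (messages != null) {
            messages.remove(messageId);
        }
    }

    // Returns {message count, total size in octets}, as POP3 STAT needs.
    long[] stat(String mailboxId) {
        Map<UUID, Long> messages = table.getOrDefault(mailboxId, Map.of());
        long total = messages.values().stream().mapToLong(Long::longValue).sum();
        return new long[] {messages.size(), total};
    }
}
```

Content retrieval (RETR) and deletion would then go through the messageId
found in this listing, with no UID lookup involved.
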
> This POP3 implementation has been functionally tested with Thunderbird.
>
> We also ran performance tests across two datacenters.
> See [1] below for reference.
>
> Given that there is traction for such a server (in the medical field a
> lot of people still use POP3),
> Given the minimal amount of code written,
> Given that we might have one of the first multi-DC-friendly MDAs on the
> market (POP3 only),
> I propose to create a new distributed-pop3-server leveraging the above
> design.
>
> I will write an ADR to further describe the needs and the design, and
> open a proof-of-concept pull request. It will be based on [2].
>
> Best regards
>
> Benoit TELLIER
>
> [2] POC developed at Linagora:
> https://github.com/linagora/james-project/pull/4321
> ---------------------------
>
> [1] Performance test exercising the distributed multiDC POP3 server
>
> Infrastructure:
>  - 2 DCs of 3 Cassandra nodes each, linked via VPN on a link with
> latencies of ~1 ms. 2 cores and 8 GB for each Cassandra node.
>  - 4 James nodes of 4 cores and 16 GB each
>
> Testing:
>  - Send 100 mails per second for 10 minutes to 80 users
>  - Then STAT the mails in POP3
>  - Then clean them up (DELE + QUIT)
>
> Before:
>  - The mail processing speed was 73 mails/second
>  - We noticed 476 SERIAL read timeouts in the logs
>  - UID / MODSEQ generation were the top queries upon LocalDelivery (40 ms+ !)
>
> After:
>  - The mail processing speed improved to 85 mails/second
>  - We did not notice any SERIAL read timeouts in the logs
>  - Other Cassandra queries benefited from not co-existing with LWT
> queries (~ 10% faster)
>  - This performance test was conducted without the random generation
> strategy for UIDs and MODSEQs. Further improvements are expected with
> the exact design proposed above.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:[email protected]
> For additional commands, e-mail:[email protected]
>
>

