[
https://issues.apache.org/jira/browse/JAMES-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483524#comment-17483524
]
Benoit Tellier commented on JAMES-3710:
---------------------------------------
Hello, sadly there is no silver bullet for this as the underlying data store do
not support transactions.
For completness, you should include the pop3metadata table in the analyse.
Here is how the deletes proceeds:
(Delete pointers pointing to the data)
1. Delete data in the table of truth : ImapUidTable
2. Delete data in the de denormalisation table : MessageIdTable. This is
retried with exponential backoffs in an attempt to match content of the table
of truth in a best attempt effort
3. A couple of mailbox related metadata projections are updated too. Retries if
idenpotent.
4. From there the deletion is broadcasted to the mailbox listeners, in
RabbitMQ, events are retried accross the cluster
5. the POP3 listener removes the entry from pop3metadata
There are additional mechanisms (applicative read repairs, consistency tasks)
to further limit that.
The POP3 server detects such inconsistencies and fix them (emmitting a log in
the process as you pointed out).
So there is little more that can be done to limit denormalisation
inconsistencies.
Having a look at the POP3 server
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()
is being called during graceful shutdowns which is supposed to let underlying
tasks the time to complete as far as I am aware of. I'm not an expert here, and
I would be happy to have an other pair of eyes looking at this.
> Restarting James while deleting using POP3 causes inconsistency
> ---------------------------------------------------------------
>
> Key: JAMES-3710
> URL: https://issues.apache.org/jira/browse/JAMES-3710
> Project: James Server
> Issue Type: Bug
> Affects Versions: master
> Reporter: Ilja Weis
> Priority: Major
> Attachments: logs.zip
>
>
> Running James (distributed-pop3-app) and restarting it while
> it is processing a lot of POP3 sessions that are deleting messages
> causes the tables messagev3, pop3metadata and imapuidtable to be
> inconsistent, leading to the problems discussed in JAMES-3709.
> The effect after applying the changes from
> https://github.com/apache/james-project/pull/861 is that POP3 sessions TOPing
> messages that no longer exist
> but still are in pop3metadata now cause "-ERR Message (12) does not exist."
> as expected but no longer destroy the whole session.
> The interesting part is why the data becomes inconsistent. This is probably
> an edge case because we're probably going to restart James all the time
> but perhaps there's a problem somewhere that's just more likely to happen
> when restarting?
> My setup: 4 James instances in Kubernetes, S3 storage, Cassandra
> cluster, 2 datacenters with 3 nodes each, replication DC1:3,DC2:3. Relevant
> cassandra properties:
> cassandra.consistency_level.regular=QUORUM
> cassandra.consistency_level.lightweight_transaction=SERIAL
> message.read.strong.consistency=false
> message.write.strong.consistency.unsafe=false
> mailbox.read.strong.consistency=false
> mailbox.read.repair.chance=0.00
> mailbox.counters.read.repair.chance.max=0.000
> mailbox.counters.read.repair.chance.one.hundred=0.000
> What I did:
> - Send a number of mails to 200 mailboxes. After this:
> > select count(*) from messagev3;
> count
> -------
> 16837
> > select count(*) from imapuidtable;
> count
> -------
> 16837
> > select count(*) from pop3metadata ;
> count
> -------
> 16837
> Then, start deleting all those messages with 40 parallel sessions.
> There are no concurrent sessions to the same account.
> While the deletes are running, restart all the james instances.
> After a moment, we have:
> > select count(*) from messagev3;
> count
> -------
> 14669
> > select count(*) from imapuidtable;
> count
> -------
> 14194
> > select count(*) from pop3metadata ;
> count
> -------
> 14669
> Not sure if messagev3 is relevant here, just adding it for completeness.
> Now if I'm accessing the mailboxes, this will touch some of the
> messages that are no longer in imapuidtable which of course leads to
> -ERR Message (12) does not exist.
> and
> 16:32:48.175 [WARN ] o.a.j.p.m.DistributedMailboxAdapter - Removing
> cc9b9f70-7f8d-11ec-b659-0f643f8b11e7 from
> 8870f760-7f87-11ec-9109-5fc36fc8b18d POP3 projection for user
> [email protected] at it is not backed by a MailboxMessage
> Running the fixPop3Inconsistencies task in such a situation would then
> clean up all the pop3metadata messages as "stale".
> I have attached the James log while deleting/restarting and after
> restarting for all 4 instances.
> What do we make of this? Is this something relevant or more like something
> that just can happen if we restart James at the wrong time?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]