[ 
https://issues.apache.org/jira/browse/JAMES-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484622#comment-17484622
 ] 

Ilja Weis commented on JAMES-3710:
----------------------------------

We've looked at the source a bit to find out what actually happens.... with 
POP3 deletes the pop3 metadata is removed synchronously AFTER the other 
operations so this might indeed not happen if the James processes are shutdown 
at the wrong time. Changing this looks like quite some work and is probably not 
worth it considering how rarely this happens and the fact that we don't lose 
any data and apart from the "ERR no such message" which is repaired 
immediately, it causes no problems for clients.

So I'm closing this issue. Thanks!

> Restarting James while deleting using POP3 causes inconsistency
> ---------------------------------------------------------------
>
>                 Key: JAMES-3710
>                 URL: https://issues.apache.org/jira/browse/JAMES-3710
>             Project: James Server
>          Issue Type: Bug
>    Affects Versions: master
>            Reporter: Ilja Weis
>            Priority: Major
>         Attachments: logs.zip
>
>
> Running James (distributed-pop3-app) and restarting it while
> it is processing a lot of POP3 sessions that are deleting messages
> causes the tables messagev3, pop3metadata and imapuidtable to be
> inconsistent, leading to the problems discussed in JAMES-3709.
> The effect after applying the changes from 
> https://github.com/apache/james-project/pull/861 is that POP3 sessions TOPing 
> messages that no longer exist
> but still are in pop3metadata now cause "-ERR Message (12) does not exist."
> as expected but no longer destroy the whole session.
> The interesting part is why the data becomes inconsistent. This is probably
> an edge case because we're probably going to restart James all the time
> but perhaps there's a problem somewhere that's just more likely to happen
> when restarting?
> My setup: 4 James instances in Kubernetes, S3 storage, Cassandra
> cluster, 2 datacenters with 3 nodes each, replication DC1:3,DC2:3. Relevant
> cassandra properties:
> cassandra.consistency_level.regular=QUORUM
> cassandra.consistency_level.lightweight_transaction=SERIAL
> message.read.strong.consistency=false
> message.write.strong.consistency.unsafe=false
> mailbox.read.strong.consistency=false
> mailbox.read.repair.chance=0.00
> mailbox.counters.read.repair.chance.max=0.000
> mailbox.counters.read.repair.chance.one.hundred=0.000
> What I did:
> - Send a number of mails to 200 mailboxes. After this:
> > select count(*) from messagev3;
>  count
> -------
>  16837
> > select count(*) from imapuidtable;
>  count
> -------
>  16837
> > select count(*) from pop3metadata ;
>  count
> -------
>  16837
> Then, start deleting all those messages with 40 parallel sessions.
> There are no concurrent sessions to the same account.
> While the deletes are running, restart all the james instances.
> After a moment, we have:
> > select count(*) from messagev3;
>  count
> -------
>  14669
> > select count(*) from imapuidtable;
>  count
> -------
>  14194
> > select count(*) from pop3metadata ;
>  count
> -------
>  14669
> Not sure if messagev3 is relevant here, just adding it for completeness.
> Now if I'm accessing the mailboxes, this will touch some of the
> messages that are no longer in imapuidtable which of course leads to
> -ERR Message (12) does not exist.
> and
> 16:32:48.175 [WARN ] o.a.j.p.m.DistributedMailboxAdapter - Removing 
> cc9b9f70-7f8d-11ec-b659-0f643f8b11e7 from 
> 8870f760-7f87-11ec-9109-5fc36fc8b18d POP3 projection for user 
> [email protected] at it is not backed by a MailboxMessage
> Running the fixPop3Inconsistencies task in such a situation would then
> clean up all the pop3metadata messages as "stale".
> I have attached the James log while deleting/restarting and after
> restarting for all 4 instances.
> What do we make of this? Is this something relevant or more like something
> that just can happen if we restart James at the wrong time?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to