On Tue, 2020-12-08 at 10:12 +0700, Tellier Benoit wrote:
> Hello Matthieu,
> 
> Sadly, I'm unable to see what you wrote in the email you sent due to
> the absence of quoting.
> 
> Could you review your email client settings, in order to get a
> readable output that we can start discussing?
> 
> This time, I made the effort, but I would greatly appreciate a better
> display.
> 

I don't know what happened. I have been using the same mailer for years
and never had this issue before.

This morning, replying to the original mail with the same mailer and
the same settings quoted things properly.

I guess it's a bug.


> Best regards,
> 
> Benoit
> 
> On 07/12/2020 at 14:47, Matthieu Baechler wrote:
> > Hi Benoit,
> > 
> > On Fri, 2020-12-04 at 14:22 +0700, btell...@linagora.com (OpenPaaS)
> > wrote:
> > Hi,
> > 
> > I'm currently trying to increase the overall efficiency of the
> > Distributed James server.
> > 
> > As such, I'm poking around for areas of improvement and found a huge
> > topic around LWT.
> > 
> > My conclusions so far are that we should keep LWT and the SERIAL
> > consistency level out of the most common use cases.
> > 
> > I know that this is a massive change with regard to the way the
> > project has been working with Cassandra in the past few years. In the
> > medium term, I would definitely like to reach LWT-free reads on the
> > Cassandra Mailbox, to scale the deployments I am responsible for as
> > part of my Linagora job (my long-term goal being to decrease the total
> > cost of ownership of a "Distributed James" based solution). While I am
> > not opposed to diverging from the Apache James project on this point,
> > if needed, I do believe an efficient distributed server (with the
> > consequences it implies in terms of eventual consistency) might be a
> > strong asset for the Apache project as well, and I would prefer to see
> > this work land in the James project.
> > 
> > I've been ambitious in writing the ADR, especially in the
> > "Complementary work" section. Let's see what common ground we find on
> > that! (The ML version below serves as a public, immutable reference
> > of my thinking!)
> > 
> > 
> > I doubt we can model IMAP without serializability somewhere but
> > let's
> > read your proposal as I have LWT as much as you are.
> 
> s/have/hate/ ?

Yes, typo

> 
> > 
> > 
> > -------------------------------------------------------------------
> > 
> > ## Context
> > 
> > Like any kind of server, James needs to provide some level of
> > consistency.
> > 
> > Strong consistency can be achieved with Cassandra by relying on
> > lightweight transactions (LWT). These enable optimistic transactions
> > on a single partition key.
> > 
> > Under the hood, Cassandra relies on the Paxos algorithm to achieve
> > consensus across replicas, allowing us to achieve linearizable
> > consistency at the entry level. To do so, Cassandra tracks consensus
> > in the `system.paxos` table. This table needs to be checked upon reads
> > as well, in order to ensure the latest state of the ongoing consensus
> > is known. This is achieved by using the SERIAL consistency level.
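To make the LWT contract concrete: from the client's point of view, a lightweight transaction behaves like a compare-and-set on a single partition. The Java sketch below models only that client-visible contract, using a `ConcurrentHashMap` as a hypothetical stand-in for a table; it does not model Paxos, replicas or `system.paxos`, and the class and method names are illustrative, not James or driver APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for one Cassandra table keyed by partition key.
// Only the client-visible LWT contract is modeled: no Paxos, no replicas.
final class LwtTableSketch {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    // Models: INSERT INTO t (key, value) VALUES (?, ?) IF NOT EXISTS
    // Returns true when the write was applied ("[applied]" = true).
    boolean insertIfNotExists(String key, String value) {
        return rows.putIfAbsent(key, value) == null;
    }

    // Models: UPDATE t SET value = ? WHERE key = ? IF value = ?
    // Applied only if the current value equals the expected one.
    boolean updateIf(String key, String expected, String newValue) {
        return rows.replace(key, expected, newValue);
    }

    // Models a read at SERIAL consistency: it must observe the latest
    // committed value, which in Cassandra means consulting system.paxos
    // as well. Here it is a plain lookup.
    String serialRead(String key) {
        return rows.get(key);
    }
}
```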
> > 
> > Experiments on a Distributed James cluster (4 James nodes with 4 CPUs
> > and 8 GB of RAM each, and a 3-node Cassandra cluster of 32 GB of RAM,
> > 8 CPUs and SSD disks) demonstrated that the `system.paxos` table was
> > by far the most read and compacted table (ratio 5). The table
> > triggering the most reads on the `system.paxos` table was the `acl`
> > table. Deactivating LWT on this table alone (lightweight transactions
> > and the SERIAL consistency level) yielded an immediate 80% throughput
> > increase and lower latencies, as well as softer degradation when the
> > load breaking point is exceeded.
> > 
> > 
> > Do you mean that Cassandra is the bottleneck in this setup?
> > What is the effect of having more Cassandra nodes?
> 
> Yes, it is.
> 
> Adding more Cassandra nodes means more costs.

You didn't answer the question I asked, did you?

> Our ownership cost is so far 5€/user/year, which is around 25 times
> more than our competitors'. The goal is to lower such costs in order
> to have a viable commercial solution built on top of James.

Do you have any source regarding competitor costs?

BTW, I don't disagree that we could make better use of resources.


> 
> > 
> > ## Decision
> > 
> > Rely on `event sourcing` to maintain a projection of ACLs that does
> > not rely on LWT or the SERIAL consistency level.
> > 
> > Event sourcing is thus responsible for handling concurrency and race
> > conditions, as well as for governing denormalization for ACLs. It can
> > be used as a source of truth to rebuild ACL projections.
> > 
> > Note that the ACL projection tables can end up out of sync with the
> > aggregate, but we still have an unquestionable source of truth handled
> > via event sourcing.
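A minimal sketch of this decision, assuming a hypothetical in-memory event store and ACL projection (none of these classes exist in James). The conditional append on the expected version is where transactionality remains, on the rare write path. Reads hit a plain projection that can drift, but can be rebuilt by replaying the events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: 'user' now holds 'rights' on 'mailbox'.
final class AclUpdated {
    final String mailbox;
    final String user;
    final String rights;

    AclUpdated(String mailbox, String user, String rights) {
        this.mailbox = mailbox;
        this.user = user;
        this.rights = rights;
    }
}

// Hypothetical event store. The conditional append (expected version) is the
// only transactional step; a Cassandra-backed store would use LWT for it.
final class AclEventStore {
    private final Map<String, List<AclUpdated>> history = new HashMap<>();

    synchronized List<AclUpdated> load(String mailbox) {
        return List.copyOf(history.getOrDefault(mailbox, List.of()));
    }

    synchronized boolean append(String mailbox, int expectedVersion, AclUpdated event) {
        List<AclUpdated> events = history.computeIfAbsent(mailbox, k -> new ArrayList<>());
        if (events.size() != expectedVersion) {
            return false; // concurrent write: reload the aggregate and retry
        }
        events.add(event);
        return true;
    }
}

// Read-side projection: a plain table, read without LWT or SERIAL.
final class AclProjection {
    private final Map<String, Map<String, String>> acls = new HashMap<>();

    void apply(AclUpdated event) {
        acls.computeIfAbsent(event.mailbox, k -> new HashMap<>()).put(event.user, event.rights);
    }

    Map<String, String> read(String mailbox) {
        return acls.getOrDefault(mailbox, Map.of());
    }

    // The event history is the source of truth: a corrective task can
    // rebuild a projection that drifted out of sync.
    static AclProjection rebuild(AclEventStore store, String mailbox) {
        AclProjection projection = new AclProjection();
        store.load(mailbox).forEach(projection::apply);
        return projection;
    }
}
```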
> > 
> > ## Consequences
> > 
> > We expect better load handling, better response times, and cheaper
> > operating costs for Distributed James, without compromising the data
> > safety of ACL operations.
> > 
> > ACL updates being rare operations, we do not expect significant
> > degradation of write performance from relying on `eventSourcing`.
> > 
> > We need to implement a corrective task to fix the ACL denormalization
> > projections. Applicative read repairs could be implemented as well,
> > offering both diagnostics and on-the-fly corrections without admin
> > action (they should however be triggered with a low probability, as
> > loading an event sourcing aggregate is not cheap).
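A possible shape for the applicative read repair, as a sketch only. `loadAggregate` is a hypothetical stand-in for replaying the event-sourced ACL aggregate. The repair probability and the random source are injected so that the expensive path runs on only a small fraction of reads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;
import java.util.function.Function;

// Hypothetical read path over an ACL projection with probabilistic read repair.
final class ReadRepairingAclReader {
    private final Map<String, Map<String, String>> projection;
    // Stand-in for replaying the event-sourced aggregate: expensive.
    private final Function<String, Map<String, String>> loadAggregate;
    private final double repairProbability;
    private final DoubleSupplier random; // expected to return values in [0, 1)
    private int repairs = 0;

    ReadRepairingAclReader(Map<String, Map<String, String>> projection,
                           Function<String, Map<String, String>> loadAggregate,
                           double repairProbability,
                           DoubleSupplier random) {
        this.projection = projection;
        this.loadAggregate = loadAggregate;
        this.repairProbability = repairProbability;
        this.random = random;
    }

    Map<String, String> read(String mailbox) {
        Map<String, String> fromProjection = projection.getOrDefault(mailbox, Map.of());
        if (random.getAsDouble() >= repairProbability) {
            return fromProjection; // common case: projection only, no aggregate load
        }
        Map<String, String> truth = loadAggregate.apply(mailbox);
        if (!truth.equals(fromProjection)) {
            projection.put(mailbox, new HashMap<>(truth)); // on-the-fly correction
            repairs++; // diagnostic: how often the projection had drifted
        }
        return truth;
    }

    int repairs() {
        return repairs;
    }
}
```

Wired with `Math::random` as the random source and a small probability (0.01 is an arbitrary example), roughly one read in a hundred would pay for an aggregate load.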
> > 
> > 
> > What implementation are you using for Event Sourcing? AFAIK, James on
> > Cassandra uses LWT + batches for the Event Store.
> 
> I answered Raphael that we were moving transactionality out of the
> read path. Writes being rare, keeping some sort of transactions like
> event sourcing on the write path is not an issue.

I don't see this question/answer in this mail thread.

> > 
> > ## Complementary work
> > 
> > There are several other places in the Cassandra code base where we
> > rely on lightweight transactions and that we might want to challenge:
> > 
> >  - `users`: we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> >    are likely unnecessary, as the webadmin presentation layer offers an
> >    idempotent API (and silences the AlreadyExist exceptions). Only the
> >    CLI (soon to be deprecated for Guice products) makes this
> >    distinction. Discussions have started on the topic and a proof of
> >    concept is available.
> >  - `domains`: we rely on LWT for throwing "AlreadyExist" exceptions.
> >    LWT are likely unnecessary, as the webadmin presentation layer
> >    offers an idempotent API (and silences the AlreadyExist exceptions).
> >    Only the CLI (soon to be deprecated for Guice products) makes this
> >    distinction. Discussions have started on the topic and a proof of
> >    concept is available.
> >  - `mailboxes` relies on LWT to enforce name uniqueness. We hit the
> >    same pitfalls as for ACLs, as this is a very frequently read table
> >    (however, mailboxes of a given user being grouped together, primary
> >    key reads are more limited, hence this is less critical). Similar
> >    results could be expected. Discussions on this topic have not
> >    started yet. Further performance impact studies need to be
> >    conducted.
> > 
> > Well, lagging on ACLs is not really a problem, but for mailboxes,
> > don't you fear race conditions and thus mailbox name collisions?
> 
> Since the eventSourcing source of truth is queried upon writes,
> conflicts will be resolved?

You propose to use Event Sourcing to handle mailbox operations?

> 
> > 
> >  - `messages`: flags updates are so far transactional. However, by
> >    better relying on the table structure used to store flags, we could
> >    rely on Cassandra to solve data race issues for us. Note also that
> >    the IMAP CONDSTORE extension is not implemented, and might be a
> >    non-viable option performance-wise. We might choose to favor
> >    performance over transactionality on this topic. Discussions on
> >    this topic have not started yet.
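To illustrate the `messages` point, assume flags are stored as independent cells, for instance elements of a CQL set updated with `flags = flags + {...}`, which Cassandra applies without a read. The sketch contrasts a read-modify-write of the whole flag set, which loses a concurrent update, with per-element writes that commute. It simulates one interleaving sequentially and is not James's actual flag storage code. Concurrent add and remove of the same flag would still be resolved by Cassandra's last-write-wins timestamps.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Sequential simulation of two clients updating the flags of one message
// concurrently. Illustrative only; not James's actual flag storage.
final class FlagsUpdateSketch {

    // Whole-value model: each client reads the full flag set, modifies it
    // locally and writes the whole set back (read-modify-write).
    static Set<String> readModifyWrite() {
        Set<String> stored = new HashSet<>();
        Set<String> seenByA = new HashSet<>(stored); // client A reads
        Set<String> seenByB = new HashSet<>(stored); // client B reads before A writes
        seenByA.add("\\Seen");
        seenByB.add("\\Flagged");
        stored = seenByA; // A writes the whole set
        stored = seenByB; // B overwrites it: \Seen is lost
        return new TreeSet<>(stored);
    }

    // Per-element model: each flag is an independent cell, as with
    // "UPDATE ... SET flags = flags + {...}" on a CQL set column.
    // No read is needed and adds of different flags commute.
    static Set<String> perElementWrites() {
        Set<String> cells = ConcurrentHashMap.newKeySet();
        cells.add("\\Seen");    // client A
        cells.add("\\Flagged"); // client B
        return new TreeSet<>(cells);
    }
}
```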
> > 
> > I think that modern IMAP extensions are important for the user
> > experience: they can make email handling faster by themselves. I
> > would not make a choice that prevents implementing such extensions in
> > the future.
> 
> My opinion is that IMAP belongs to the past. It is an inefficient,
> complicated protocol and our implementation of it is clearly not in
> good shape.
> 
> My strategy (at least for Linagora products) is to convert as many
> clients as possible to JMAP.
> 
> I understand this point is controversial.

Standard protocols are always a thing of the past, and that's why there
are actual widespread implementations of them.

How do you expect to make iOS support JMAP natively? macOS? Outlook?

People won't change their mailers just because IMAP is inefficient.
GMail implements IMAP, and it's no coincidence.

I would not agree to let James deprecate IMAP usage for the time being.

> 
> > 
> > LWT are required for `eventSourcing`. As event sourcing is limited to
> > low-usage use cases, the performance degradations are not an issue.
> > 
> > I think I understand, but I'll ask anyway: the performance gain does
> > not really come from the removal of LWT but from the CQRS nature of
> > Event Sourcing: you'll read from a view that doesn't use LWT.
> 
> Yes.
> 
> > Can't you achieve the same with a
> > "simpler" CQRS architecture without using Event Sourcing?
> 
> Define a "simpler" CQRS architecture; I don't understand what you
> mean.

CQRS means having a write path and one or several read paths. It allows
having different constraints for each path, like transactional ACL
writes and eventually consistent, fast, non-transactional ACL reads.

Event Sourcing is fine in this regard, but its purpose is to make
decisions, model around user intentions, etc. It may not be exactly what
you want for ACLs.

You could, for example, keep the existing code and build a view with a
listener for the read path.
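A sketch of that "simpler" CQRS alternative. It assumes a hypothetical synchronous listener registry rather than James's real event bus or `ACLDiff` types. The write model stays transactional (a `synchronized` method stands in for LWT), and a listener maintains a denormalized view that reads hit without LWT or SERIAL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical event carrying the new ACL of a mailbox (not James's ACLDiff).
final class AclChanged {
    final String mailbox;
    final Map<String, String> acl;

    AclChanged(String mailbox, Map<String, String> acl) {
        this.mailbox = mailbox;
        this.acl = Map.copyOf(acl); // immutable snapshot
    }
}

// Write path: kept as is and transactional; 'synchronized' stands in for LWT.
final class AclWriteModel {
    private final Map<String, Map<String, String>> acls = new HashMap<>();
    private final List<Consumer<AclChanged>> listeners = new ArrayList<>();

    synchronized void register(Consumer<AclChanged> listener) {
        listeners.add(listener);
    }

    synchronized void setRights(String mailbox, String user, String rights) {
        Map<String, String> acl = acls.computeIfAbsent(mailbox, k -> new HashMap<>());
        acl.put(user, rights);
        AclChanged event = new AclChanged(mailbox, acl);
        // Dispatched in commit order, so the view never goes backwards.
        listeners.forEach(listener -> listener.accept(event));
    }
}

// Read path: denormalized view fed by the listener, read without LWT/SERIAL.
final class AclReadView implements Consumer<AclChanged> {
    private final Map<String, Map<String, String>> view = new ConcurrentHashMap<>();

    @Override
    public void accept(AclChanged event) {
        view.put(event.mailbox, event.acl);
    }

    Map<String, String> read(String mailbox) {
        return view.getOrDefault(mailbox, Map.of());
    }
}
```

With an asynchronous listener, as a real event bus would likely use, the view becomes eventually consistent, which is the trade-off discussed above for ACL reads.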

> 
> Also, as explained to Jean, since ACLDiff is fired and used in the
> mailbox system, some sort of transactionality is enforced by the
> current code; that's expensive to change, and I don't intend to change
> it.

Good point.

> > 
> > 
> > LWT usage is required to generate `UIDs`. As message append
> > operations tend to be limited compared to message update operations,
> > this is likely less critical. UID generation could be handled via
> > alternative systems; past implementations have been conducted on
> > ZooKeeper.
> > 
> > If we do not implement IMAP CONDSTORE, generating IMAP `MODSEQ`
> > likely no longer makes sense. As such, the fate of `MODSEQ` is linked
> > to decisions on the `message` topic.
> > 
> > 
> > Oh, here we are: we need yet another system. Note that I'm in favor of
> > it, but that's the reason why we use LWT in the first place: to avoid
> > this additional dependency. It's either LWT or some other
> > transactional system, as we can't find a way to work around the need
> > for a monotonic distributed counter (for example).
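For reference, the monotonic counter that LWT currently provides for UID generation boils down to a compare-and-set retry loop. In this sketch an in-memory map stands in for a per-mailbox counter table. The CQL in the comments is illustrative, and the table and column names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for a hypothetical per-mailbox "last UID" table.
final class UidAllocatorSketch {
    private final ConcurrentMap<String, Long> lastUid = new ConcurrentHashMap<>();

    long nextUid(String mailboxId) {
        while (true) {
            Long current = lastUid.get(mailboxId); // plain read of the current value
            if (current == null) {
                // Models: INSERT INTO last_uid (mailbox_id, uid) VALUES (?, 1) IF NOT EXISTS
                if (lastUid.putIfAbsent(mailboxId, 1L) == null) {
                    return 1L;
                }
            } else {
                long next = current + 1;
                // Models: UPDATE last_uid SET uid = ? WHERE mailbox_id = ? IF uid = ?
                if (lastUid.replace(mailboxId, current, next)) {
                    return next;
                }
            }
            // Not applied: another writer won the race, so re-read and retry.
        }
    }
}
```

Under contention, each losing writer retries and pays another round trip. With LWT, each attempt is also a full Paxos round.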
> > 
> > You listed several problems and, in my opinion, each one may have a
> > different solution. What about debating each one separately?
> 
> Please.
> 
> > 
> > Could we start from here: what's the best solution to implement a
> > monotonic distributed counter?
> 
> Here are ideas off the top of my head:
> 
>  - 1. Not implementing it in the first place, because we don't need to.

Always the best option. I haven't found how so far.

>  - 2. Zookeeper ?
>  - 3. ?

Having James instances share some data among themselves instead of
relying on an external system (for example using https://atomix.io/).

Cheers,

-- Matthieu Baechler


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
