Though Cassandra supports multi-DC, cross-availability-zone deployments well,

this doesn't mean all Cassandra-based implementations do.

And James doesn't:
 - IMAP's reliance on incremental monotonic counters requires strong consistency, which
doesn't play well with high latencies (2-4 round trips)
 - multiple levels of metadata make it prone to inconsistencies if not operated
with quorum consistency, and quorum consistency means cross-availability-zone reads
and writes, which is a latency and throughput showstopper.
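
To make the quorum point concrete, here is a rough arithmetic sketch. The
data center names, replication factors, and the 150 ms round-trip time are
illustrative assumptions, not measurements from any James deployment; the
only fact relied on is that a quorum is a majority of the replicas.

```python
# Illustrative arithmetic only: all names and numbers below are assumptions.

def quorum(replicas):
    """A quorum is a majority of the replicas."""
    return replicas // 2 + 1

def needs_remote_dc(rf_per_dc, local_dc):
    """True if a cluster-wide quorum cannot be met by local replicas alone."""
    total = sum(rf_per_dc.values())
    return quorum(total) > rf_per_dc[local_dc]

def operation_latency_ms(round_trips, rtt_ms):
    """Latency of an operation that needs sequential round trips."""
    return round_trips * rtt_ms

# Hypothetical two-DC cluster with 3 replicas in each data center.
rf = {"dc_a": 3, "dc_b": 3}

print(quorum(sum(rf.values())))         # replicas that must answer
print(needs_remote_dc(rf, "dc_a"))      # must we cross the DC boundary?
print(operation_latency_ms(2, 150.0))   # 2 round trips at an assumed 150 ms
print(operation_latency_ms(4, 150.0))   # 4 round trips at an assumed 150 ms
```

With these assumed numbers, a quorum always needs at least one replica in
the other data center, so every such read or write pays the inter-DC
round-trip time, multiplied by the number of round trips.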

TL;DR: the James distributed server can work across multiple DCs, but with significant
shortcomings, and only with a region-wide setup, not a worldwide one.

-- 

Best regards,

Benoit TELLIER

General manager of Linagora VIETNAM.
Product owner for Team-Mail product.
Chairman of the Apache James project.

Mail: btell...@linagora.com
Tel: (0033) 6 77 26 04 58 (WhatsApp, Signal)


On Mar 8, 2025 10:48 AM, Jean Helou <jhe...@apache.org> wrote:

Hi Matt,

This has turned into a rather long answer. The first part is about
James in general, the second is about your specific setup :)


As far as I'm aware, James itself is stateless. I don't think you lose
counter values when you restart your main server.

Thus, you should be able to spin up as many James instances as you want and
point them to the same storage without issues. Even if there are some
asynchronous state updates, the state should eventually converge.

The difficulty is distributed storage, not distributed processing.

For instance, if you spin up a MariaDB on one of your new VPSs and reload a
backup from your main MariaDB, the states of both databases will immediately
start to diverge, as they are unaware of each other: new messages delivered
to your main server since the backup will not be visible on the VPS, and
messages read on the VPS will still appear unread on the main server.
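
The divergence described above can be shown with a toy model. This is only
a sketch: plain dictionaries stand in for the two databases, and the helper
names are hypothetical, not part of James or MariaDB.

```python
import copy

# Toy model: a "database" is a dict mapping message UID -> message state.
# All helper names here are hypothetical, for illustration only.

def restore_from_backup(db):
    """Clone the database, as a backup/restore would."""
    return copy.deepcopy(db)

def deliver(db, uid, subject):
    """Store a newly delivered, unread message."""
    db[uid] = {"subject": subject, "seen": False}

def mark_read(db, uid):
    """Flag a message as read."""
    db[uid]["seen"] = True

def differences(a, b):
    """UIDs whose state differs between the two databases."""
    return sorted(uid for uid in a.keys() | b.keys() if a.get(uid) != b.get(uid))

main = {}
deliver(main, 1, "welcome")

replica = restore_from_backup(main)   # identical at this instant
deliver(main, 2, "new mail")          # arrives on the main server only
mark_read(replica, 1)                 # read on the VPS only

print(differences(main, replica))     # both messages now differ
```

Neither side ever learns about the other's change, which is why some form
of replication or synchronization is needed on top of the backup.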

From there you will want to look into replication, but simple
primary/secondary replication will throw errors on writes to the secondary,
making your secondary James instance fill its error logs with failed writes.

The next step is multi-master replication, which is something I have never tried.

The distributed James app demonstrates a fully distributed system,
including a distributed database (Cassandra), a distributed message broker
(RabbitMQ, IIRC), a distributed search engine (OpenSearch), etc.

This allows you to have as many James nodes as you want, all talking to as
many messaging/storage nodes as you want, all fully synced and with write
semantics that offer reasonable consistency. This is a setup that makes
sense for massive deployments, for example if you wanted to build the next
Google Mail.

The use of blob storage (S3-like) to store message contents is an
orthogonal concern. Database storage is fairly expensive compared to blob
storage, and storing large blobs in databases, while doable, is usually not
recommended, at least not without specific table design. The same is true
for message brokers.

The alternatives are storing on the file system, which is not distributed,
or using a blob store.

I'm almost certain you can configure the distributed app (or build a
variant of it) so that it does not use blob storage, but I wouldn't recommend it.

Now, how all this applies to your setup :)

My understanding is that for now you have a single, rather powerful machine
hosting both James and MariaDB. The James instance handles both SMTP and
IMAP or POP.

I'll also assume that you don't intend to start operating a multi-DC
Cassandra cluster :)

Finally, I'll assume the VPSs are rather small at this price :)

If they are large enough to host a clone of your main MariaDB and its data,
you can use one for MariaDB and another for James. Start from a backup of
the main MariaDB, then use IMAP sync to get eventual consistency between
the mailboxes on your main server and the replica.

You can go further and spread the workload of the main server too.

Start a James instance configured for IMAP/POP on a couple of VPS
instances, keeping the DB config pointed at the main MariaDB. Change your
clients' config, and eventually you can drop the corresponding listeners on
the main server if you want.

Do the same for SMTP and put the new instances at a higher priority than the
instance running on the main server; after a while you can even stop the
main server's James process entirely :)
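
Assuming "priority" here refers to MX record preference (an interpretation,
not something stated above), recall that a lower preference number is tried
first. A minimal sketch with hypothetical hostnames and preference values:

```python
# Sketch only: hostnames and preference values are made up for illustration.

def delivery_order(mx_records):
    """Return hosts in the order a sending server would try them.

    mx_records is a list of (preference, hostname) tuples; a lower
    preference number means the host is tried first. Ties are broken by
    hostname here for simplicity, whereas real senders typically pick
    randomly among hosts of equal preference.
    """
    return [host for _preference, host in sorted(mx_records)]

records = [
    (20, "main.example.org"),   # existing main server, now the fallback
    (10, "vps1.example.org"),   # new VPS instances, preferred
    (10, "vps2.example.org"),
]

print(delivery_order(records))
```

Once incoming mail reliably reaches the preferred hosts, the fallback record
can be removed before the main server's SMTP listener is stopped.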

The downside, of course, is increased latency, both from client to VPS and
from VPS to VPS or to the main database server.

I hope that opens avenues for exploration :)


Have fun

On Sat, Mar 8, 2025 at 03:06, cryptearth <cryptea...@cryptearth.de.invalid>
wrote:

> Hello there dear James devs and fellow James users,
>
> my hoster OVH currently offers me a great deal on VPSs for less than 12
> bucks a year (less than 1 buck per month) in several datacenters around
> the world. I'm really tempted to take that deal as I have some ideas for
> using multiple servers - having them around the world, like in
> Australia and Canada, is just a bonus.
> One thing I plan to do is set up James on each of the servers.
> But then the question came up: How to synchronize them?
> Currently I use my home server only as a backup without any
> synchronization with my main root server. In fact: It's currently not
> running due to some issues with my home server that I have to fix
> first before getting James running again.
> Now when scaling up to several servers around the world it would be cool
> to take advantage of that by combining them with synchronization. But as
> the additional systems are only VPSs, I'd like to set up a master-slave
> setup with each slave James on the VPSs syncing up to the master James on
> my powerful root server.
> First I thought about fetchmail to at least pull in mails from the
> slaves to the master - but fetchmail is only part of the deprecated
> spring build. As I like to have my mail storage in a database I would
> like to keep using the guice-jpa build instead of switching to
> guice-distributed, which doesn't use jpa and seems to be meant for use
> with AWS S3 buckets.
> I could also write some java code using the java mail api that works in a
> fetchmail-like way itself - but I'm unsure how to inject mails from other
> servers properly into the main server so they look as if they were
> received by the master server itself.
> Could it be done by just synchronizing the MariaDB databases in the
> background, or would fiddling with the database while James is running
> screw things up, like the several counters for mails and mailboxes?
> If James 3.x isn't suited for such a use case maybe that's something to
> be considered for 4.0? Or is that too late into the current development
> now and would delay a 4.0 release?
>
> I would like to explore this idea further to see if and how James can be
> used in a distributed cluster like other mailers can. Building a James
> mail server cluster just sounds cool - and seen from "well, big
> companies like google have several hundreds to thousands of mail servers
> deployed around the globe all working together" it surely has to be
> possible with James as well - as, broken down, it's just some listeners on
> some server sockets with some database backend synchronized by a message
> bus. This should be extendable across multiple servers.
>
> Have a nice weekend everyone.
>
> Greetings from Germany,
>
> Matt
>