Re: synchronize multiple James instances across servers

cryptearth Sun, 09 Mar 2025 12:16:10 -0700

Hey Benoit,
hey Jean,

thank you both for your replies.


Let me add some missing details:

1) How do I plan to use James distributed:

I only want to run SMTP on the VPSs, for receiving mail and forwarding/synchronizing it up to the main instance on my dedicated root server. There's no plan to use them for sending mail or for accessing the mailboxes on them. Hence there's no need for downstream sync from the master back to the slaves. All the slaves are supposed to do is act as lower-priority alternatives that receive mail when the main instance is down and deliver anything cached to it when it comes back.
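
To illustrate the "lower priority alternatives" idea: sending servers try a domain's MX hosts in ascending order of preference value, so backup hosts only see mail when the preferred host is unreachable. A minimal Python sketch of that ordering follows; the host names and preference values are made up for illustration and are not taken from any real zone.

```python
import random


def delivery_order(mx_records, rng=random):
    """Return MX hosts in the order a sending server would try them.

    mx_records: list of (preference, hostname) tuples. A lower preference
    value means a more preferred host; hosts sharing a value are shuffled.
    """
    by_pref = {}
    for preference, host in mx_records:
        by_pref.setdefault(preference, []).append(host)
    ordered = []
    for preference in sorted(by_pref):
        group = by_pref[preference]
        rng.shuffle(group)  # spread load across equally preferred hosts
        ordered.extend(group)
    return ordered


# Hypothetical zone: main server first, VPS backups only as fallback.
EXAMPLE_MX = [
    (20, "vps-sydney.example.org"),
    (10, "main.example.org"),
    (20, "vps-canada.example.org"),
]

if __name__ == "__main__":
    print(delivery_order(EXAMPLE_MX))
```

With records like these, the main server is always tried first and the two VPSs are tried afterwards in random order.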

So both imapsync and two-way database sync go way beyond my goals.

2) How can OVH infrastructure help?

According to the page the VPSs are quite limited, maybe not even full KVMs but rather just para-virtualized or even just containers. The OVH page says:

Host CPU: Intel or AMD (they don't even specify any model or series)
Number of CPU cores: 1 vCPU core
RAM: 2GB
Storage: 20GB SSD
OS: (select from Ubuntu, Debian, Fedora, Rocky and some others)

So from that I doubt I'm even able to install Arch or OpenSuSE; I'm probably stuck with whatever image OVH provides. I guess I'll go for a clean Debian 12, maybe Ubuntu 24.10 - but I'll see if I can install my own OS.

Also I guess the VMs are on an over-provisioned beefy host system with an entire subnet assigned via virtual MACs and a SAN as storage backend.

In addition to this OVH provides what they call a "vRack": My root server has two physical NICs: one assigned public, the other assigned private. From what I understand it's a virtualized VPN/VLAN provided by the fancy network switching gear they have in their datacenters. The VPSs are also supposed to be compatible with this, so instead of having to use SSH or OpenVPN to span my own VPN between the servers I can just use what OVH already provides.


3) Multiple James instances connected to a single MariaDB server?

So, where/how are the IDs used to identify the mails, mailboxes, users, etc. generated? I didn't find the classic UNIQUE PRIMARY_KEY AUTO_INCREMENT as often shown in beginner books.

Is it even possible to connect several James instances to one database at once? Or will this cause synchronization issues? Can this be solved by each James running its own MariaDB server, with just the databases synced via database replication?

Overall this is no hard requirement. If, like currently, there's no built-in automation I could either continue to have the slaves just run with a lower priority and check them regularly for mail - or just write a few lines to implement fetchmail myself: use POP to fetch the mails from the slaves and drop them into the postmaster or an additional collector mailbox.
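
A rough sketch of what such a "few lines of fetchmail" could look like, here in Python with only the standard library (`poplib`, `smtplib`). The function names, host names, and the choice to relay everything to a single collector address are assumptions for illustration; the original envelope recipient is not preserved, and nothing here is James-specific.

```python
import poplib
import smtplib


def drain_mailbox(pop, smtp, envelope_from, collector_rcpt):
    """Relay every message from a POP3 session to the master via SMTP.

    pop:  object offering the poplib.POP3 methods stat(), retr(), dele()
    smtp: object offering the smtplib.SMTP method sendmail()
    Returns the number of messages relayed.
    """
    count, _size = pop.stat()
    relayed = 0
    for number in range(1, count + 1):
        _resp, lines, _octets = pop.retr(number)
        raw = b"\r\n".join(lines) + b"\r\n"
        # Relay the raw bytes unchanged so the original headers stay intact.
        smtp.sendmail(envelope_from, [collector_rcpt], raw)
        # Mark for deletion only after the master has accepted the message.
        pop.dele(number)
        relayed += 1
    return relayed


def fetch_from_slave(slave_host, user, password,
                     master_host, envelope_from, collector_rcpt):
    """Connect to one slave over POP3S and drain it into the master."""
    pop = poplib.POP3_SSL(slave_host)
    try:
        pop.user(user)
        pop.pass_(password)
        with smtplib.SMTP(master_host) as smtp:
            return drain_mailbox(pop, smtp, envelope_from, collector_rcpt)
    finally:
        pop.quit()  # QUIT commits deletion of the messages relayed so far
```

A cron job could then call something like `fetch_from_slave("vps-sydney.example.org", "collector", "secret", "main.example.org", "relay@example.org", "postmaster@example.org")` for each slave (all values hypothetical). If the master rejects a message, the exception stops the loop and that message stays on the slave for the next run.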

I guess I'll spin up a few VMs - both to familiarize myself with Debian (it's better to learn properly instead of just copy'n'paste) and to see what happens when I connect multiple instances to one database or how to implement a fetchmail.

As for the mentioned "slow performance": That's no issue, as all the slaves are supposed to do is drop any received mails back at the master anyway, which I use to send and retrieve via IMAP. If it takes some time for the VPS in Sydney to drop a mail back in Frankfurt - so be it. I guess the additional round trips and a few 100 ms of latency don't matter in my use case.


Anyway - thanks again for your input. I'll see how I proceed from here.

Have a good one.

Matt

On 08.03.25 at 16:12, Benoit TELLIER wrote:

though Cassandra supports multi-DC, cross-availability-zone deployments well,

this doesn't mean all Cassandra-based implementations do.

And James doesn't:
  - IMAP relying on incremental monotonic counters means strong consistency, which
doesn't play well with high latencies (2-4 round trips)
  - multiple levels of metadata make it inconsistency-prone if not operated
with quorum consistency - and quorum consistency means cross-availability-zone reads
and writes, which is a latency and throughput show-stopper.
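
To make the first point above concrete: IMAP requires message UIDs within a mailbox to be assigned in strictly ascending order, so every delivery has to agree on "the next UID". The toy Python sketch below is not James's actual code; `UidStore` and its methods are invented for illustration. It shows two nodes that allocate from a stale read handing out the same UID, while a compare-and-set loop avoids this at the cost of extra round trips to the shared store.

```python
class UidStore:
    """Stands in for the shared storage holding a mailbox's last UID."""

    def __init__(self):
        self.last_uid = 0
        self.round_trips = 0

    def read(self):
        self.round_trips += 1
        return self.last_uid

    def write(self, value):
        self.round_trips += 1
        self.last_uid = value

    def compare_and_set(self, expected, value):
        """Update only if nobody else advanced the counter in between."""
        self.round_trips += 1
        if self.last_uid != expected:
            return False
        self.last_uid = value
        return True


def naive_allocate(store, stale_read):
    """Unsafe: trusts a value that was read before another node wrote."""
    uid = stale_read + 1
    store.write(uid)
    return uid


def safe_allocate(store):
    """Retry until the compare-and-set succeeds."""
    while True:
        current = store.read()
        if store.compare_and_set(current, current + 1):
            return current + 1


store = UidStore()
seen_by_a = store.read()
seen_by_b = store.read()      # both nodes read 0 before either one writes
uid_a = naive_allocate(store, seen_by_a)
uid_b = naive_allocate(store, seen_by_b)
print(uid_a, uid_b)           # prints: 1 1  (the same UID handed out twice)

store = UidStore()
print(safe_allocate(store), safe_allocate(store), store.round_trips)
# prints: 1 2 4
```

In this sketch each safe allocation costs at least two round trips to the store, and each of those pays the full cross-region latency when the store is far away.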

TL;DR: James distributed server can work on multi-DC, but with significant
shortcomings, and only with a region-wide setup, not a worldwide setup

--

Best regards,

Benoit TELLIER

General manager of Linagora VIETNAM.
Product owner for Team-Mail product.
Chairman of the Apache James project.

Mail: btell...@linagora.com
Tel: (0033) 6 77 26 04 58 (WhatsApp, Signal)


On Mar 8, 2025 10:48 AM, Jean Helou <jhe...@apache.org> wrote:

Hi Matt,

This has turned into a rather long answer. The first part is more about
James in general, the second is more about your specific setup :)


As far as I'm aware James itself is stateless. I don't think you lose
counter values when you restart your main server.

Thus, you should be able to spin up as many James instances as you want and
point them to the same storage without issues. Even if there are some
asynchronous state updates, the state should eventually converge.

The difficulty is distributed storage, not distributed processing.

For instance, if you spin up a MariaDB on one of your new VPSs and reload a
backup from your main MariaDB, the states of both databases will immediately
start to diverge as they are unaware of each other: new messages delivered
to your main server since the backup will not be visible to the VPS, and
messages read on the VPS will still appear unread on the main server.

From there you will want to look into replication, but simple
primary/secondary replication will throw errors on writes to the secondary,
making your secondary James instance fill its error logs with failed writes.

The next step is multi-master replication, which is something I never tried.

The distributed James app demonstrates a fully distributed system,
including a distributed database (Cassandra), a distributed message broker
(RabbitMQ iirc), a distributed search engine (OpenSearch), etc.

This allows you to have as many James nodes as you want, all talking to as
many messaging/storage nodes as you want. All fully synced and with write
semantics that offer a reasonable consistency. This is a setup that makes
sense for massive deployments. If you wanted to build the next google mail
for example.

The use of blob storage (S3-like) to store message contents is an
orthogonal concern. Database storage is fairly expensive compared to blob
storage. And storing large blobs in databases, while doable, is usually not
recommended, at least not without specific table design. The same is true
for message brokers.

The alternatives are storing on the file system, which is not distributed,
or using a blob store.

I'm almost certain you can configure the distributed app (or build a
variant of it) so that it does not use blob storage, but I wouldn't recommend it.

Now, how all this applies to your setup :)

My understanding is that for now you have a single rather powerful machine
hosting both James and mariadb. The james instance handles both SMTP and
IMAP or POP.

I'll also assume that you don't intend to start operating a multi DC
Cassandra cluster :)

Finally I'll assume the VPS are rather small at this price :)

If they are large enough to host a clone of your main MariaDB and its data,
you can use one for MariaDB and another for James. Start from a backup of
the main MariaDB, then use IMAP sync to have eventual consistency between
mailboxes on your main server and the replica.

You can go further and spread the workload of the main server too.

You start a James instance configured for IMAP/POP on a couple of VPS
instances and keep the db config pointing at the main MariaDB. Change your
clients' config and eventually you can drop the corresponding listeners on
the main server if you want.

Do the same for SMTP and put the new ones at a higher priority than the
instance running on the main server; after a while you can even stop the
main server James process entirely :)

The downside of course is increased latency, both from client to VPS and
from VPS to VPS or to the main database server.

I hope that opens avenues for exploration :)


Have fun

On Sat, 8 Mar 2025 at 03:06, cryptearth <cryptea...@cryptearth.de.invalid>
wrote:

Hello there dear James devs and fellow James users,

my hoster OVH currently offers me a great deal on VPSs for less than 12
bucks a year (less than 1 buck per month) in several datacenters around
the world. I'm really tempted to get that deal as I have some ideas to
utilize multiple servers - having them around the world like in
Australia and Canada is just a bonus.
One thing I plan to implement is to set up James on each of the servers.
But then the question came up: How to synchronize them?
Currently I use my home server only as a backup without any
synchronization with my main root server. In fact: It's currently not
running due to some issues with my home server that I have to fix
first before getting James running again.
Now when scaling up to several servers around the world it would be cool
to take advantage of that by combining them with synchronization. But as
the additional systems are VPSs only I'd like to set up a master-slave
setup with each slave James on the VPSs syncing up to the master James on
my powerful root server.
First I thought about fetchmail to at least pull in mails from the
slaves to the master - but fetchmail is only part of the deprecated
spring build. As I like to have my mail storage in a database I would
like to keep using the guice-jpa build instead of switching to the
guice-distributed build, which doesn't use JPA and seems to be meant for
use with AWS S3 buckets.
I also could write some Java code using the Java Mail API, working in a
fetchmail way itself - but I'm unsure how to inject mails from other
servers properly into the main server so they look as if they were
received by the master server itself.
Could it be done by just synchronizing the MariaDB databases in the
background, or would fiddling with the database while James is running
screw it up, like the several counters for mails and mailboxes?
If James 3.x isn't suited for such a use case maybe that's something to
be considered for 4.0? Or is that too late into the current development
now and would delay a 4.0 release?

I would like to explore this idea further to see if and how James can be
used in a distributed cluster like other mailers can. Building a James
mail server cluster sounds just cool - and seen from "well, big
companies like google have several hundreds to thousands of mail servers
deployed around the globe all working together" it sure has to be
possible with James as well - as broken down it's just some listeners on
some server sockets with some database backend synchronized by a message
bus. This should be extendable across multiple servers.

Have a nice weekend everyone.

Greetings from Germany,

Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
For additional commands, e-mail: server-user-h...@james.apache.org


