Hi Janos
I think we are more or less on the same page:
> my point is that piler somehow has to know whether the current email is
> for client1 or client2, ... The easiest way to do this is to use a
> custom email address on piler's side, eg.
> for client1 or client2, ... The easiest way to do this is to use a
> custom email address on piler's side, eg.
> clie...@archive1.synaq.com
> clie...@archive2.synaq.com
That is kinda what I had in mind, yes - as long as the SMTP cluster knows how to route to a specific piler server for archiving. Your suggestion above will do the trick.
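Just to make the routing idea concrete, here's a rough sketch of how the client could be derived from the archive address the SMTP cluster delivers to. The <client>@archiveN.synaq.com pattern and the concrete address are my own hypothetical examples, not piler code:

# Illustration only: derive the client (and thus database/storage name) from
# the envelope recipient the SMTP cluster uses when handing mail to piler.
# The address format <client>@archiveN.synaq.com is an assumption.
def client_from_recipient(rcpt_to: str) -> str:
    local_part, _, domain = rcpt_to.lower().partition("@")
    if not domain.endswith(".synaq.com"):
        raise ValueError(f"unexpected archive domain: {domain}")
    # the local part doubles as the per-customer database / storage name
    return local_part

print(client_from_recipient("client1@archive1.synaq.com"))  # hypothetical -> "client1"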
> And moreover the per customer database name can be "client1", "client2", ...
> so there's no need for a customer - database mapping.
Right again, that would work too - as long as the piler UI knows to provision and search individual databases for each client instance the server archives for. The new database could be provisioned when a new customer is created in the SaaS customer branding UI. Maybe the client database names could be derived from the SaaS client name, so piler knows where to provision the client DB and where to search and store client-specific data.
Example:
/var/lib/mysql/mailpiler/client1
/var/lib/mysql/mailpiler/client2
etc
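To show what the provisioning step behind the branding UI might boil down to, here is a minimal sketch (my illustration only; the host, credentials and naming rule are assumptions, not piler's actual schema or code):

# Sketch: create a per-client database when a new SaaS customer is added.
# Connection details and the naming convention are placeholder assumptions.
import re
import mysql.connector

def provision_client_db(saas_client_name: str) -> str:
    # derive a safe database name from the SaaS client name
    db_name = re.sub(r"[^a-z0-9_]", "_", saas_client_name.lower())
    conn = mysql.connector.connect(host="localhost", user="piler", password="secret")
    try:
        cur = conn.cursor()
        # identifiers can't be bound as parameters, hence the sanitised format
        cur.execute(f"CREATE DATABASE IF NOT EXISTS `{db_name}` CHARACTER SET utf8mb4")
        # the per-customer tables (metadata, rcpt, attachment, ...) would then
        # be created from piler's SQL schema
    finally:
        conn.close()
    return db_name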
> I think there can be a global database where piler stores policies, etc.
> The per customer databases contain the following tables: metadata+rcpt,
> attachment, domain, sph_*, tag, note, tag, option, search, group*, and
> audit.
> Not sure if there should be any local account for clients, and I
> wouldn't put any administrator account in the per user related
> databases. It also means that customers can't introduce any
> archiving or retention policies, I believe that they should be
> maintained by you (ie. the provider).
Whatever gets configured in the UI (customer, LDAP, etc.) should be capable of replication - maybe this is the global policy database you meant.
Whatever entries are added automatically by the piler server on archiving should stay in the local database - no replication.
I agree on account data. If a client can't provide their own LDAP, then we would need to run our own LDAP server for shared auth. The DB is not meant for authentication; rather it is for defining authentication (LDAP servers, domains, groups, etc.) and policy configuration, and that information, as well as the defined policies, should be capable of replication. This way, when clients are provisioned, this configuration data is also provisioned on cluster peers in the DR location. But message data is not shared, since each server generates that data itself when it receives and archives email. The question though is how to provision the DB on the cluster peer - maybe a call to the cluster peer(s) via pilergetd?
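Just to sketch what "provision on the peer too" could look like - this is purely hypothetical; the peer hostnames and the HTTP endpoint are made up for illustration and are not an existing piler or pilergetd API:

# Hypothetical: push a provisioning request to each cluster peer so the
# per-client config/policy database exists in the DR location as well.
# PEERS and the /api/provision endpoint are assumptions, not real piler APIs.
import requests

PEERS = ["https://piler-dr1.synaq.com", "https://piler-dr2.synaq.com"]

def provision_on_peers(client_db: str) -> None:
    for peer in PEERS:
        resp = requests.post(f"{peer}/api/provision", json={"db": client_db}, timeout=10)
        resp.raise_for_status()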
> 5) Piler servers provide per client storage location
> (/var/piler/store/00/{client1,client2,client3}/yearmonth?/...) - this
> makes storage deprov easier since client email is in a given location
> rather than shared.
> got it. I had /var/piler/store/{client1,client2,client3}/00/ in mind.
> The yearmonth thing is more or less implemented, it's called "12 days".
Perfect, although "12 days" sounds misleading - did you mean "12 months"?
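For reference, the two directory schemes we compared could be expressed roughly like this (just an illustration of the layouts from this thread, not piler's actual path logic):

# Illustration of the two store layouts discussed above; the "00" level and
# the year/month component follow the examples in this thread, piler's real
# layout may differ.
from datetime import date

def store_path_client_first(client: str, when: date) -> str:
    # /var/piler/store/{client}/00/yearmonth/...
    return f"/var/piler/store/{client}/00/{when:%Y%m}"

def store_path_shared_first(client: str, when: date) -> str:
    # /var/piler/store/00/{client}/yearmonth/...
    return f"/var/piler/store/00/{client}/{when:%Y%m}"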
> I plan to use the current sphinx scheme (main, delta, delta-delta) per customer,
> so there can be /var/piler/sphinx/{client1,client2,...} directories. Probably there
> should be a single sphinx.conf file having listing all customers (on the given piler node/cluster).
Great! But this configuration needs to be automatable as part of the provisioning process. Perhaps an API could be called to handle it?
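Something along these lines is what I imagine the automation doing - a sketch only, with a placeholder directive rather than the real contents of piler's sphinx.conf, and the output path is an assumption:

# Sketch: regenerate a single sphinx.conf containing one block per client.
# The directive below is a placeholder; the real per-client blocks would come
# from piler's sphinx.conf template (main, delta, delta-delta indexes).
CLIENTS = ["client1", "client2"]

INDEX_TEMPLATE = """
index {client}_main1
{{
    path = /var/piler/sphinx/{client}/main1
}}
"""

def render_sphinx_conf(clients) -> str:
    return "\n".join(INDEX_TEMPLATE.format(client=c) for c in clients)

# the sphinx.conf location is an assumption for this example
with open("sphinx.conf", "w") as f:
    f.write(render_sphinx_conf(CLIENTS))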
> 7) Dedicated servers for clients determined by server url - possibly
> handled by Nginx backend proxy directives:
> http://archive.example.org/client1_to_50/ = cluster1
> http://archive.example.org/client51_to_100/ = cluster2
> How about http://archive.{client1,client2,....}.org/ ? The point could
> be to use a single piler gui installation for all clients on the given
> piler node.
> Or if you want the archive url to be http://archive.synaq.com/url,
> the you can still rewrite the url to
> http://archive.{client1,client2,....}.org/
> by nginx.
> /index.php is hardwired at a few places, and probably even in some css
> stuff.
I think I could have been clearer here:
On Nginx this would be http://archive.synaq.com/client1/
on the backend this would be proxied to http://piler_cluster1/archive/ (or whichever cluster serves that client).
The URL location on the piler side can always be the same, since piler doesn't really care, as long as it's served from a URL. When a URL is proxied to the server root, the piler server gets passed the URL path as well, which results in a 404. Instead of being hardwired, could the server rather honor the $config['SITE_URL'] directive?
So if I set $config['SITE_URL'] = 'http://arc-cluster1.synaq.com/archive/', is this URL applied everywhere? That should then not require any special rewrites in Nginx, since http://archive.synaq.com/blah/ will always be cleanly mapped to http://{cluster-server}.synaq.com/archive/.
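To spell out the mapping I have in mind (an illustration of the routing logic only; the client-to-cluster assignment and hostnames are my assumptions):

# Illustration: map the public per-client URL to the backend archive URL.
# The client-to-cluster table and hostnames are assumptions for the example.
CLIENT_CLUSTER = {"client1": "arc-cluster1", "client2": "arc-cluster2"}

def backend_url(public_path: str) -> str:
    # public_path comes from http://archive.synaq.com/<client>/...
    client, _, rest = public_path.lstrip("/").partition("/")
    cluster = CLIENT_CLUSTER[client]
    # every cluster serves the UI from the same /archive/ location, so only
    # the hostname changes; SITE_URL on that cluster would match it
    return f"http://{cluster}.synaq.com/archive/{rest}"

print(backend_url("/client1/index.php"))  # -> http://arc-cluster1.synaq.com/archive/index.php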
> dropping a table or db on client deprov is definitely quicker and
> cleaner from a performance point of view than trying to run a delete
> across a whole shared database.
> indeed. Will you run an sql server on each piler node, or do you prefer
> to have a dedicated mysql cluster to hold all clients' piler related
> data?
I think if the configuration data can be split out from the message metadata,
the config or policy SQL data could be stored anywhere - local or central, as long
as it can be replicated with a peer. However, because in this model the archive
cluster servers archive independently of each other, message metadata should
be on each of the piler nodes. I specifically want the cluster servers to do
independent journalling and storage processing because I want to minimise the
risk of corruption being replicated. Storage data is changing all of the time,
whereas config and policy data is fairly static and could be safely backed up with
mysqldump or other backup tools.
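For the config/policy side, something as simple as this would probably do (the database name and destination path are assumptions for illustration):

# Sketch: back up only the fairly static config/policy database with
# mysqldump; per-node message metadata stays local and is not replicated.
# "piler_config" and the destination path are hypothetical.
import subprocess

def backup_config_db(db_name: str = "piler_config",
                     dest: str = "/var/backups/piler_config.sql") -> None:
    with open(dest, "w") as out:
        subprocess.run(["mysqldump", "--single-transaction", db_name],
                       stdout=out, check=True)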
Best regards,
Janos