Re: dict_mongodb

Hamid Maadani Tue, 21 Jun 2022 22:12:21 -0700

> Only if you provide code to handle list-valued result columns, and if
> such denormalised schemas are best-practice for MongoDB.
> 
> Well, this is an important design choice. What sort of schemas are
> best-practice in this space? Are joins needed to enable some data
> "normalisation"?


MongoDB is a NoSQL, non-relational database. That does not necessarily
mean you cannot model the data in a 'relational' schema, but it was
designed to be a 'document database':
https://www.mongodb.com/compare/relational-vs-non-relational-databases

If you go back to our earlier conversations, I provided an example of my
current mail database. Where in a traditional relational database you
would have two tables, one for mailbox and one for alias, and join them
using a key to get the goto email address for a certain alias, I have:
{
    "_id" : ObjectId(REDACTED),
    "username" : "ha...@dexo.tech",
    "password" : "REDACTED",
    "name" : "Hamid Maadani",
    "maildir" : "ha...@dexo.tech/",
    "quota" : REDACTED,
    "local_part" : "hamid",
    "domain" : "dexo.tech",
    "created" : ISODate("2016-11-07T21:07:21.000Z"),
    "modified" : ISODate("2017-05-02T22:10:00.000Z"),
    "active" : 1,
    "alias" : [ 
        {
            "address" : "ab...@dexo.tech",
            "created" : ISODate("2016-11-07T21:04:16.000Z"),
            "modified" : ISODate("2016-11-07T21:04:16.000Z"),
            "active" : 1
        }, 
        {
            "address" : "hostmas...@dexo.tech",
            "created" : ISODate("2016-11-07T21:04:16.000Z"),
            "modified" : ISODate("2016-11-07T21:04:16.000Z"),
            "active" : 1
        }
    ]
}

So each real mailbox is represented as a JSON document, and it contains
all the aliases nested as an array under the "alias" key.
This means that, to find the goto email for an alias, all you need is:
filter = { "alias.address": "%s", "active": 1 }
result_attribute = username
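
To illustrate why this lookup needs no join, here is a minimal sketch in
plain Python (not libmongoc, and not the dict_mongodb code) that models
how MongoDB treats a dotted path such as "alias.address": it descends
into the "alias" array and matches when any element carries that
address. The helper names (values_at, matches, lookup) and the
example.com addresses are made up for the illustration.

```python
# Hypothetical sample data, shaped like the mailbox document above.
MAILBOXES = [
    {
        "username": "user@example.com",
        "active": 1,
        "alias": [
            {"address": "abuse@example.com", "active": 1},
            {"address": "postmaster@example.com", "active": 1},
        ],
    },
    {
        "username": "other@example.com",
        "active": 0,
        "alias": [{"address": "sales@example.com", "active": 1}],
    },
]


def values_at(doc, path):
    """Yield every value reachable via a dotted path, descending into arrays."""
    if isinstance(doc, list):
        # An array on the path is searched element by element.
        for item in doc:
            yield from values_at(item, path)
        return
    head, _, rest = path.partition(".")
    if not isinstance(doc, dict) or head not in doc:
        return
    value = doc[head]
    if rest:
        yield from values_at(value, rest)
    elif isinstance(value, list):
        yield from value
    else:
        yield value


def matches(doc, flt):
    """True if every filter key has at least one equal value in the document."""
    return all(
        any(found == wanted for found in values_at(doc, path))
        for path, wanted in flt.items()
    )


def lookup(collection, flt, result_attribute):
    """Return result_attribute from each document that satisfies the filter."""
    return [
        doc[result_attribute]
        for doc in collection
        if result_attribute in doc and matches(doc, flt)
    ]
```

With this model, the filter { "alias.address": "%s", "active": 1 } with
%s replaced by postmaster@example.com selects the first mailbox and
yields its "username". Note that "active" here is the mailbox-level
field, not the per-alias one.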

No need for joins. Now, if one chooses to keep these separate and take a
more traditional approach, that can absolutely be done using two
collections (tables) with aggregations. It would just complicate things
a bit, and unnecessarily so in my opinion.

> They're not essential, but can be added as expert features. Let's get
> the basics right first, and talk about the expert features second.

Absolutely fair. I usually try to account for multiple requirements from
the beginning, but let's approach this your way. I will work on the
result_attribute + result_format design and we can go from there.

> Is there some prior art in this space? Has anyone used MongoDB for
> managing email users and lists with some other MTA?

Besides me, not that I know of.

> There should be no assumption that all tables use the same database.
> Each table designates its source database. The thing that need not
> be supported (and is likely impossible to express or difficult to
> implement) is joins or other operations that span multiple databases.

Understood. Is there any existing code in Postfix I can repurpose for
array management, to keep a static list of mongoc_client_t objects (one
per named dict)? Or should I write it within the module?
I am trying to avoid creating and destroying a client on every lookup
call, and to keep a persistent connection.
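
The idea of one persistent client per named dict can be sketched as
follows. This is a language-neutral illustration written in Python, not
Postfix or libmongoc code; the real module would be C. ClientCache and
the connect factory are hypothetical names, with connect standing in for
whatever creates a client (mongoc_client_new in libmongoc) and the
optional close callback standing in for mongoc_client_destroy.

```python
class ClientCache:
    """Keep one long-lived client per named table instead of one per lookup."""

    def __init__(self, connect, close=None):
        self._connect = connect   # hypothetical factory: uri -> client
        self._close = close       # hypothetical cleanup: client -> None
        self._clients = {}        # dict name -> client

    def get(self, dict_name, uri):
        """Return the cached client for dict_name, creating it on first use."""
        client = self._clients.get(dict_name)
        if client is None:
            client = self._connect(uri)
            self._clients[dict_name] = client
        return client

    def close_all(self):
        """Release every cached client, e.g. when the process shuts down."""
        for client in self._clients.values():
            if self._close is not None:
                self._close(client)
        self._clients.clear()
```

Because the cache is keyed by dict name and each entry carries its own
URI, nothing here assumes that all tables live in the same database.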

Regards
Hamid Maadani

June 21, 2022 9:32 PM, "Viktor Dukhovni" <postfix-us...@dukhovni.org> wrote:

> On Wed, Jun 22, 2022 at 04:13:40AM +0000, Hamid Maadani wrote:
> 
>> This sort of "concat" operation is a bad idea, because it is prone to 
>> collisions...
>> 
>> Those were just examples to discuss a point. You can find similar
>> types of concatenations in multiple guides written for setting up
>> postfix with a mysql backend. For example refer to
>> 'virtual_alias_domains.cf' mentioned in arch linux's wiki page:
>> 
>> https://wiki.archlinux.org/title/Virtual_user_mail_system_with_Postfix,_Dovecot_and_Roundcube
> 
> There are lots of Wikis giving dubious advice. Yes, in some corner
> cases one might actually want to compute some result elements as
> concatenations of multiple input fields, and perhaps this can be
> supported, but it should not be encouraged, and the simple cases where
> this is not used should be easy and natural to express.
> 
>> I was just trying to understand if these type operations (concat,
>> etc.) need to be supported in the projection. Am I correct in
>> understanding they are not?
> 
> They're not essential, but can be added as expert features. Let's get
> the basics right first, and talk about the expert features second.
> 
>> If the result_attribute + result_format design is the best practice,
>> I'm all for that. need to go look at the result_format and understand
>> how to use it with mongo..
> 
> It is the "basics right" approach, which avoids advanced MongoDB
> syntax.
> 
>> which would return:
>> maadani,ha...@dexo.tech/,dukhovni,vik...@postfix.org/
>> 
>> which makes no sense.
>> 
>> This is honestly confusing to me. This was meant to show we are
>> printing multiple multi-valued results as one comma separated string.
> 
> These *particular* results make no sense because you're mixing last
> names with directory paths. The list elements are from different
> semantic domains.
> 
>> When you say this makes no sense, are you referring to this result not
>> being useful to postfix because of multiple mail-paths in it? or the
>> comma separated string part!?
> 
> Neither, it is the disparate semantics of the elements. Had the
> elements all come from the same semantic domain, and not been compounded
> from multiple input columns, they would typically all have the same
> post-processing requirements, that could likely be handled with just
> "result_format".
> 
>> You do have to decide how mailing lists are modeled in MongoDB. Are
>> they one row per member? Is it a list of "_id" values? Or a list of
>> email addresses? If the former, how does list expansion work? Can
>> MongoDB do joins as well as projections? ...
>> 
>> I imagine each list as a JSON object with an array of addresses inside of
>> it. Something like:
>> { "createdAt": ISODate("<some date>"), "active": 1,
>>   "addresses": [ "ha...@dexo.tech", "vik...@postfix.org" ] }
>> 
>> Would that work?
> 
> Only if you provide code to handle list-valued result columns, and if
> such denormalised schemas are best-practice for MongoDB. A more typical
> database practice is to have a "member" table, which makes it easy to
> insert users into lists without modifying the list itself, to delete
> a user from all the lists a user is a member of, ...
> 
> Member tables work best when the database supports some form of "join"
> operation, though of course they could be as simple as:
> 
> { "list": "somel...@example.net", "member": "la...@example.net" }
> { "list": "somel...@example.net", "member": "cu...@example.net" }
> { "list": "somel...@example.net", "member": "m...@example.net" }
> 
> with both the list name and the member primary address stored by value,
> rather than by reference.
> 
> Is there some prior art in this space? Has anyone used MongoDB for
> managing email users and lists with some other MTA?
> 
>> MongoDB supports joins, but through "aggregation pipelines":
>> https://www.mongodb.com/docs/manual/aggregation
>> 
>> Here, we are using 'mongoc_collection_find_with_opts', which runs a 'find'
>> operation. If support for joins is necessary, we should switch to
>> 'mongoc_database_aggregate' and require 'filter' to be in the pipeline
>> format:
>> http://mongoc.org/libmongoc/current/mongoc_database_aggregate.html
> 
> Well, this is an important design choice. What sort of schemas are
> best-practice in this space? Are joins needed to enable some data
> "normalisation"? ...
> 
>> One more question, what's the policy regarding multiple databases? the
>> way that the module is now, it supports multiple collections (tables)
>> in only one database. Should I put any effort in supporting multiple?
>> For example, if mailboxes are in cluster 1 and mail lists in cluster 2
>> (separate URIs basically)?
> 
> There should be no assumption that all tables use the same database.
> Each table designates its source database. The thing that need not
> be supported (and is likely impossible to express or difficult to
> implement) is joins or other operations that span multiple databases.
> 
> --
> Viktor.
