[pfx-dev] Re: pfx 3.8.4 build noise: "warning: missing braces around initializer [-Wmissing-braces]"

2023-12-26 Thread Viktor Dukhovni via Postfix-devel
On Tue, Dec 26, 2023 at 04:16:18PM -0500, Viktor Dukhovni via Postfix-devel 
wrote:

> If I enable "-Wall", I get the noisy warnings, and they can all be
> disabled by adding:
> 
> -Wno-missing-braces
> -Wmaybe-uninitialized
> -Wunused-but-set-variable
> -Wunused-function
> 
> to CCARGS.

Sorry, make that:

  -Wno-missing-braces
  -Wno-maybe-uninitialized
  -Wno-unused-but-set-variable
  -Wno-unused-function

I neglected to add the "no-" prefixes to the last three.

-- 
Viktor
___
Postfix-devel mailing list -- postfix-devel@postfix.org
To unsubscribe send an email to postfix-devel-le...@postfix.org


[pfx-dev] Re: pfx 3.8.4 build noise: "warning: missing braces around initializer [-Wmissing-braces]"

2023-12-26 Thread Viktor Dukhovni via Postfix-devel
On Tue, Dec 26, 2023 at 02:33:35PM -0500, pgnd via Postfix-devel wrote:

> lots of noisy warnings,
> 
>   grep missing-braces tmp.txt
>   dict.c:627:38: warning: missing braces around initializer 
> [-Wmissing-braces]

All of these are about initialising arrays of structures without braces
around each structure's member list:

>   tls_misc.c:322:43: warning: missing braces around initializer 
> [-Wmissing-braces]

For example:

static const NAME_CODE protocol_table[] = {
SSL_TXT_SSLV2, TLS_PROTOCOL_SSLv2,
SSL_TXT_SSLV3, TLS_PROTOCOL_SSLv3,
SSL_TXT_TLSV1, TLS_PROTOCOL_TLSv1,
SSL_TXT_TLSV1_1, TLS_PROTOCOL_TLSv1_1,
SSL_TXT_TLSV1_2, TLS_PROTOCOL_TLSv1_2,
TLS_PROTOCOL_TXT_TLSV1_3, TLS_PROTOCOL_TLSv1_3,
0, TLS_PROTOCOL_INVALID,
};

The compiler apparently (modulo whitespace) prefers this to be:

static const NAME_CODE protocol_table[] = {
{ SSL_TXT_SSLV2, TLS_PROTOCOL_SSLv2 },
{ SSL_TXT_SSLV3, TLS_PROTOCOL_SSLv3 },
{ SSL_TXT_TLSV1, TLS_PROTOCOL_TLSv1 },
{ SSL_TXT_TLSV1_1, TLS_PROTOCOL_TLSv1_1 },
{ SSL_TXT_TLSV1_2, TLS_PROTOCOL_TLSv1_2 },
{ TLS_PROTOCOL_TXT_TLSV1_3, TLS_PROTOCOL_TLSv1_3},
{ 0, TLS_PROTOCOL_INVALID   },
};

These warnings don't show up with either clang or gcc on the systems I
use.  They can be safely ignored.  Perhaps some day we'll do something
about it, but for now, just turn them off by changing the compiler
warning flags.

If I enable "-Wall", I get the noisy warnings, and they can all be
disabled by adding:

-Wno-missing-braces
-Wmaybe-uninitialized
-Wunused-but-set-variable
-Wunused-function

to CCARGS.

[ The unused "cleanup_extract_internal" function could perhaps some day
  be dropped. Its call sites went away in postfix-2.2-20041019.  The
  remaining warnings are not productive. ]

-- 
Viktor.


[pfx-dev] Re: dict_mongodb (projections)

2023-12-06 Thread Viktor Dukhovni via Postfix-devel
On Thu, Dec 07, 2023 at 01:06:57AM +, Hamid Maadani wrote:

> >> However, I am concerned about the use of `bson_new_from_json()` and its
> >> need to quote the MongoDB operators. This feels completely unnatural.
> >> How is there then a distinction between:
> >> 
> >> $or: [...]
> >> 
> >> and
> >> 
> >> "$or": [...]
> >> 
> >> the latter should be a verbatim key called "$or", not a MongoDB
> >> operator. How do we avoid having issues with inputs that contain a
> >> leading "$", or are the leading "$" signs only special in the JSON
> >> object key, rather than the value? This needs to be understood and
> >> documented. As well as clarifying any potential confusion around
> >> projections...
> > ...
> > I am still uneasy about this. What if one really wanted a key that
> > starts with "$"? Ideally the API would have supported operators without
> > overloading already quoted strings.
> 
> Using 'bson_new_from_json' seems to be the easiest way to give admins
> flexibility on what queries/projections they want to have. I actually
> initially wanted to use aggregations, but decided against that to keep
> simplicity.
> 
> Mongo 5.0 and above support keys that start with dollar signs according to 
> this:
> https://www.mongodb.com/docs/manual/core/dot-dollar-considerations
> 

I am somewhat reassured by the fact that that document consistently only
talks about dollar-prefixed *keys*, and makes no mention of special
concerns for dollar-prefixed values.  So I guess the user will have to
know that despite the formal MongoDB syntax not needing quotes for $or,
the Postfix dictionary driver will require quotes, and the operator will
still work.

Provided "%s", "%u", and the like always appear on the *value* side of a
MongoDB query, there are no related issues.  Anyone using external input
to set a *key* in the JSON query would be asking for trouble...

We probably don't need to go as far as parsing the JSON query to ensure
that '%x' substitutions happen only in values and not in keys...

--
Viktor.


[pfx-dev] Re: dict_mongodb (projections)

2023-12-06 Thread Viktor Dukhovni via Postfix-devel
On Wed, Dec 06, 2023 at 07:31:41PM -0500, Viktor Dukhovni via Postfix-devel 
wrote:

> However, I am concerned about the use of `bson_new_from_json()` and its
> need to quote the MongoDB operators.  This feels completely unnatural.
> How is there then a distinction between:
> 
> $or: [...]
> 
> and
> 
> "$or": [...]
> 
> the latter should be a verbatim key called "$or", not a MongoDB
> operator.  How do we avoid having issues with inputs that contain a
> leading "$", or are the leading "$" signs only special in the JSON
> object key, rather than the value?  This needs to be understood and
> documented.  As well as clarifying any potential confusion around
> projections...

It does, however, look like overloading:

{ "$operator": ... }

to be the same as:

{ $operator:  ... }

is expected practice with MongoDB:


https://github.com/mongodb/mongo-c-driver/blob/54f737ea488caadac0cf9275c4be1fbb37cf5609/src/libmongoc/tests/test-mongoc-matcher.c#L222-L267

So the best we can hope for is that this overloading is restricted to
keys, and never applies to values in queries, so that in:

{ "$or": [ { "foo": "$bar" } ] }

only "$or" is special, while "$bar" is a literal.  Users will then have
to know not to let untrusted content leak into query keys, but that should
be obvious regardless of metacharacter issues.

I am still uneasy about this.  What if one really wanted a key that
starts with "$"?  Ideally the API would have supported operators without
overloading already quoted strings.

-- 
Viktor.


[pfx-dev] Re: dict_mongodb (projections)

2023-12-06 Thread Viktor Dukhovni via Postfix-devel
On Wed, Dec 06, 2023 at 07:06:30PM -0500, Wietse Venema via Postfix-devel wrote:

> I have been adding text to the mongodb_table that any text pasted
> in the place of a %letter directive in result_format will be subject
> to escaping, that is, Postfix inserts a backslash character before
> a double quote or backslash character.
> 
> This ensures that the result will have the same structure as
> result_format: each string in the result_format is still exactly
> one string in the result, and each special character {}[], etc. is
> still exactly one in the result. An attacker cannot 'control' how
> the result will be processed.
> 
> What about projections? Given
> 
> projection = { "_id":0, "mail_path": {"$concat": ["$domain", "/", 
> "$local_part"]} }
> 
> what if $domains contains 
> 
> foo"]}, nasty stuff...
> 

Here "$domain" is a *field name* from the JSON schema.  The `$concat`
operator will use the associated response element as part of
constructing the value of the "mail_path" element of the response.

I don't think there's a problem here as such.

However, I am concerned about the use of `bson_new_from_json()` and its
need to quote the MongoDB operators.  This feels completely unnatural.
How is there then a distinction between:

$or: [...]

and

"$or": [...]

the latter should be a verbatim key called "$or", not a MongoDB
operator.  How do we avoid having issues with inputs that contain a
leading "$", or are the leading "$" signs only special in the JSON
object key, rather than the value?  This needs to be understood and
documented.  As well as clarifying any potential confusion around
projections...

-- 
Viktor.


[pfx-dev] Re: dict_mongodb

2023-12-06 Thread Viktor Dukhovni via Postfix-devel
On Wed, Dec 06, 2023 at 08:10:22PM +, Hamid Maadani via Postfix-devel wrote:

> now, in my case, I'm using a docker container, and am using parameters
> in main.cf , a sample below:
> docker_va_uri = $docker_dburi
> docker_va_dbname = $docker_dbname
> docker_va_collection = mailbox
> docker_va_filter = {"$$or": [{"username":"%s"}, {"alias.address": "%s"}], 
> "active": 1}
> docker_va_result_attribute = username

You're mixing up the layers.  In the legacy flat "main.cf" SQL-like
table syntax, with "tablename_parameter_name" settings, yes "$" needs
to be "$$" to survive *main.cf* parameter expansion, but that is NOT
part of the underlying *table syntax*, which is what users would write
in:

main.cf:
mongodb = proxy:mongodb:${config_directory}/
virtual_alias_maps = ${mongodb}mongo-valias.cf

mongo-valias.cf:
...
filter = { $or: [ {"mail": "%s"}, { "alias": "%s" } ] }
...

And I am also rather puzzled by the double quotes you're putting around
`$or`.  These certainly don't appear in the MongoDB documentation.  I
would expect `"$or"` to be treated as a verbatim JSON key and not as a
MongoDB operator.  Otherwise, we potentially have deeper quoting issues
than just double-quote and backslash characters...

--
Viktor.


[pfx-dev] Re: dict_mongodb

2023-12-06 Thread Viktor Dukhovni via Postfix-devel
On Wed, Dec 06, 2023 at 02:25:39PM -0500, Wietse Venema via Postfix-devel wrote:

> > This is a good point. Honestly, I didn't think about escaping characters
> > because the queries are meant to be in JSON form and taken literally,

For a lookup key to be taken "literally" its metacharacters MUST be
escaped, so that it does not introduce unintended syntax!  The data
interpolated via '%s' and '%u' comes from untrusted sources and MUST NOT
be allowed to introduce a (no)SQL-injection attack:

https://xkcd.com/327/

The documentation should clearly state that all %s/%u/%d/%[1-9]
expansions MUST be enclosed in double quotes to ensure valid JSON
string syntax:

- { "anyaddr": "%s" }
- { "domainaddr": "%u@%d" }
- { "2ld": "%2.%1" }
- ...

There is no mechanism for non-string or structured compound inputs to
the Postfix table lookup layer, so the lookup key is always an
unstructured string, containing untrusted data, and will be escaped for
inclusion in a quoted string, but the enclosing quotes MUST be provided
by the Postfix administrator configuring the lookup table.

[ By the way, db_common_expand() assumes that domain names do not
  contain escaped "." characters in labels, and just performs a
  naïve split on "." rather than parsing a general presentation
  form domain, which might be "foo\.bar.example.com", with
  "foo.bar" as its logical first label.  I expect that's not
  a concern, since non-RFC1123 names are broadly rejected
  by Postfix at various layers. ]

> > > (minor) the database config file parser does not expand $name,
> > > ${name} etc. so '$$' is taken literally, not as '$'. I can remove
> > > that text from the mongodb_table file
> > 
> > I think in the mongodb_table file, the expansions like $$ are included for
> > query_filter and projection. "query_filter" is expanded in 
> > dict_mongodb_lookup 
> > (line 411), but projection is not. would be best to expand projection as 
> > well
> > (maybe around line 377?)
> 
> What code is supposed to pay attention to '$' characters? The Postfix client?
> The MongoC library?

I don't see any code that expands "$$" to just "$".  The referenced
db_common_expand() function called near line 411:


https://github.com/wietse-postfix/postfix-dukhovni/blob/c753d0a358fc6e02ca3bf8b25a2598aedea4dfb8/postfix/src/global/db_common.c#L408-L510

does nothing special with '$' characters.  If MongoDB expects "$or" as
an operator, then this is verbatim what needs to be in the query.

Has this code been tested?  I don't understand how the "$$or" ever
worked:

https://www.mongodb.com/docs/manual/reference/operator/query/or/

-- 
Viktor.


Re: Submission service lookup support

2023-02-14 Thread Viktor Dukhovni
On Tue, Feb 14, 2023 at 01:25:39PM -0500, Wietse Venema wrote:
> Viktor Dukhovni:
> > On Tue, Feb 14, 2023 at 01:01:05PM -0500, Wietse Venema wrote:
> > 
> > > > Fiction aside, the use-cases look reasonable to me.  I haven't thought
> > > > through of what downgrade (from e.g. DANE) are introduced by the various
> > > > (optional) fallback controls.  If they do introduce potential
> > > > downgrades, a brief note to that effect may be warranted in the docs.
> > > 
> > > There is no implied downgrade. SRV is really like MX, with weights
> > > and ports added. As long as the port info is propagated properly,
> > > TLSA will just work, and connection caching will maintain separation
> > > of traffic streams that should be distinct.
> > 
> > What I had in mind was (optionally?) ignoring SRV lookup failure, rather
> > than deferring delivery.  If there are TLSA records for the SRV targets,
> > but none for the fallback delivery method, then we possibly get a
> > downgrade by ignoring lookup failure...
> 
> But that problem already exists when a domain has some MX targets with
> TLSA records and some MX targets without TLSA?

There's a difference in my mind between operators not implementing
security across the entire service (as above), and clients switching to
an alternate service in response to transient impediments in the expected
one.

But the distinction there is indeed subtle.  As I mentioned, I haven't
analysed what issues if any arise, and what documentation may be needed.

-- 
Viktor.


Re: Submission service lookup support

2023-02-14 Thread Viktor Dukhovni
On Tue, Feb 14, 2023 at 01:01:05PM -0500, Wietse Venema wrote:

> > Fiction aside, the use-cases look reasonable to me.  I haven't thought
> > through of what downgrade (from e.g. DANE) are introduced by the various
> > (optional) fallback controls.  If they do introduce potential
> > downgrades, a brief note to that effect may be warranted in the docs.
> 
> There is no implied downgrade. SRV is really like MX, with weights
> and ports added. As long as the port info is propagated properly,
> TLSA will just work, and connection caching will maintain separation
> of traffic streams that should be distinct.

What I had in mind was (optionally?) ignoring SRV lookup failure, rather
than deferring delivery.  If there are TLSA records for the SRV targets,
but none for the fallback delivery method, then we possibly get a
downgrade by ignoring lookup failure...

-- 
Viktor.


Re: Submission service lookup support

2023-02-14 Thread Viktor Dukhovni
On Tue, Feb 14, 2023 at 09:43:33AM -0500, Wietse Venema wrote:

> While we're on the topic of DANE, is there any reason why TLSA info
> is never looked up for destinations specified as [domain-name]?

That's not what I see.

$ postmap -q dnssec-stats.ant.isi.edu cdb:transport
smtp:[dnssec-stats.ant.isi.edu]

$ sendmail -f $sender -bv ...@dnssec-stats.ant.isi.edu

which then logs:

Feb 14 09:59:54 amnesiac postfix/smtp[93858]:
Verified TLS connection established
to dnssec-stats.ant.isi.edu[128.9.29.254]:25:
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
server-digest SHA256
Feb 14 09:59:55 amnesiac postfix/smtp[93858]: 787821193A5:
to=<...@dnssec-stats.ant.isi.edu>,
relay=dnssec-stats.ant.isi.edu[128.9.29.254]:25, delay=0.67,
delays=0.01/0.03/0.53/0.11, dsn=2.1.5,
status=deliverable (250 2.1.5 Ok)

Ditto with "posttls-finger":

$ posttls-finger -c -Lsummary "[dnssec-stats.ant.isi.edu]"
posttls-finger: Verified TLS connection established to
dnssec-stats.ant.isi.edu[2001:1878:401::8009:1dfe]:25:
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519
server-signature RSA-PSS (2048 bit raw public key)
server-digest SHA256

-- 
Viktor.


Re: Submission service lookup support

2023-02-13 Thread Viktor Dukhovni
On Mon, Feb 13, 2023 at 07:33:35PM -0500, Wietse Venema wrote:

> There's a first implementation in postfix-3.8-20230213-nonprod.
> Docs: https://www.postfix.org/postconf.5.html#use_srv_lookup
> Code: http://ftp.porcupine.org/mirrors/postfix-release/index.html#non-prod
> 
> To see all SRV related changes, diff the code against postfix-3.8-20230213.
> Code: http://ftp.porcupine.org/mirrors/postfix-release/index.html#experimental

How does this interact with DANE?  If the SRV RRset is DNSSEC-signed, do
we get TLSA lookups for _._tcp. (possibly after secure
end-to-end CNAME expansion), just as with MX lookups?

-- 
Viktor.


Re: Submission service lookup support

2022-08-08 Thread Viktor Dukhovni
On Mon, Aug 08, 2022 at 05:06:22PM -0400, Viktor Dukhovni wrote:

> > We're discussing support for an MUA-specific feature, not high-volume
> > MTA-to-MTA support. Connection reuse is less important, as long as
> > Postfix does not mix traffic with different authentication properties,
> > and that is what SMTP_HOST_KEY is for. So if sharing is a concern,
> > just add a "comes from SRV lookup" flag to the connection cache
> > lookup key.
> > 
> > > Are keys along the lines of "domain:submission+srv" too clumsy?
> 
> I meant TLS policy lookup keys (smtp_tls_policy_maps).  The session and
> connection caches are already fine, since transport name is part of the
> cache key.

Also, for the caches, in addition to not getting false positives from
imprecise keys, we presumably actually want to get cache hits on the
logical destination for connection reuse, which is less likely to happen
if it splits into multiple separate nexthop values.

And perhaps reuse may not be appropriate when the logical nexthop
destinations have different TLS policies, or different SASL settings,
... and yet share underlying submission servers.

So I don't think that a naïve replacement of the nexthop with the
expanded list is semantically sound, if the delivery loop is unaware of
the expansion.  These do likely need to behave just like a set of MX
hosts for a single logical destination, only determined via SRV lookup,
with a fancier sorting algorithm and potentially variable per-server
port (variation across the SRV RRset should be rare in practice).

--
Viktor.


Re: Submission service lookup support

2022-08-08 Thread Viktor Dukhovni
On Mon, Aug 08, 2022 at 04:41:57PM -0400, Wietse Venema wrote:

> > Yes.  The main complication is that connection caching, TLS session
> > caching and TLS policy are perhaps not quite right if we're not aware
> > that the list of "[host]:port" pairs is actually a single logical
> > destination, so the code would need to be integrated into smtp(8), and
> > look mostly like MX resolution that returns "host:port" values for a
> > single logical nexthop.
> 
> We're discussing support for an MUA-specific feature, not high-volume
> MTA-to-MTA support. Connection reuse is less important, as long as
> Postfix does not mix traffic with different authentication properties,
> and that is what SMTP_HOST_KEY is for. So if sharing is a concern,
> just add a "comes from SRV lookup" flag to the connection cache
> lookup key.
> 
> > Are keys along the lines of "domain:submission+srv" too clumsy?

I mean TLS policy lookup keys (smtp_tls_policy_maps).  The session and
connection caches are already fine, since transport name is part of the
cache key.

> SMTP_HOST_KEY uses newlines if I am not mistaken. And it is 
> completely hidden from the user interface.

Yes, as noted.

-- 
Viktor.


Re: Submission service lookup support

2022-08-08 Thread Viktor Dukhovni
On Mon, Aug 08, 2022 at 04:07:39PM -0400, Wietse Venema wrote:
> Viktor Dukhovni:
> > On Mon, Aug 08, 2022 at 03:03:07PM -0400, Wietse Venema wrote:
> > 
> > > All we need is a small bit of code that transforms SRV lookup results
> > > into a list of [host]:port forms that the Postfix SMTP client already
> > > understands.
> > 
> > We have code to do MX lookups, it can be cloned to RFC6186 SRV lookups
> > instead and then correctly implement the preference and weight logic!
> > 
> > The transformation needs to sort by preference and randomise by weight
> > within a given preference, per the RFC.
> 
> The result of MX lookup and sorting by preference is a linear list.

Yes, precisely.  Just noting that with SRV records the random weight
sorting is slightly more tricky, but ultimately still an ordered list.

> The result of SRV lookup and sorting by weight is also a linear
> list. At this point I prefer a list in [host]:port form, simply
> because it maximizes the reuse of all existing code.

Yes.  The main complication is that connection caching, TLS session
caching and TLS policy are perhaps not quite right if we're not aware
that the list of "[host]:port" pairs is actually a single logical
destination, so the code would need to be integrated into smtp(8), and
look mostly like MX resolution that returns "host:port" values for a
single logical nexthop.

Since SRV-based server resolution would be a transport-wide behaviour,
and TLS session caching and connection reuse are already
transport-specific we'd just need to make sure that TLS policy
pertaining to port 25 delivery to a given domain is not confused with
submission to the same domain.  One way to handle that is of course
to configure custom "smtp_tls_policy_maps" for the transport, but
another may be to tweak the lookup key.  Thoughts?

Are keys along the lines of "domain:submission+srv" too clumsy?

-- 
Viktor.


Re: Submission service lookup support

2022-08-08 Thread Viktor Dukhovni
On Mon, Aug 08, 2022 at 03:03:07PM -0400, Wietse Venema wrote:

> All we need is a small bit of code that transforms SRV lookup results
> into a list of [host]:port forms that the Postfix SMTP client already
> understands.

We have code to do MX lookups, it can be cloned to RFC6186 SRV lookups
instead and then correctly implement the preference and weight logic!

The transformation needs to sort by preference and randomise by weight
within a given preference, per the RFC.

-- 
Viktor.


Re: Quarantine message using milter

2022-06-29 Thread Viktor Dukhovni
On Wed, Jun 29, 2022 at 06:50:36PM +, ran...@skurelabs.com wrote:

> Is there anyway, we can send commands through milter to get list of
> all quarantined emails(in hold queue) and release emails?  We are fine
> to support postfix code changes to enable these two use cases. I would
> appreciate your help on this.

The milter should not release messages it did not quarantine, as for
messages it did quarantine, it can choose to record their queue-ids,
obviating the need to "list" quarantined messages.

Note that a message released from quarantine will be delivered to *all*
its recipients.  If you need to be able to release a message to a subset
of its recipients, you need additional machinery to extract the message
content for delivery to just some of the pending recipients.

Please take this issue to postfix-users.  This is not a "devel" topic.

-- 
Viktor.


Re: dict_mongodb

2022-06-27 Thread Viktor Dukhovni
On Tue, Jun 28, 2022 at 01:32:52AM +, Hamid Maadani wrote:
> > The _README is a more verbose tutorial covering building the driver
> > and various use-cases and special considerations, leaving syntax
> > reference docs to the _table(5) document.
> 
> Should I create the html version in proto directory? or the text
> version in the README_FILES directory? or both?

The text version is generated from the HTML.  See the Makefile
in proto.  You only need to create the HTML version, using the
style of existing README files (essentially similar markup).

-- 
Viktor.


Re: dict_mongodb

2022-06-27 Thread Viktor Dukhovni
On Tue, Jun 28, 2022 at 01:03:43AM +, Hamid Maadani wrote:

> > - Are all the table features documented in mongodb_table(5)?
> 
> They are.

The _table(5) doc concisely covers all table syntax features.

> > - Is there a MONGODB_README that covers building the driver and
> > tutorial prose with usage examples, best-practices, and so on?
> 
> There was one, but Wietse asked for a mongodb_table so he can generate the 
> MONGODB_README off of that in response to my very first email. I removed the 
> README I created, but I can put that back in if need be. This is what I used 
> to build the module:
> make makefiles pie=yes shared=yes dynamicmaps=yes 'CCARGS=-DHAS_MONGODB 
> -I/usr/include/libmongoc-1.0 -I/usr/include/libbson-1.0' 
> 'AUXLIBS_MONGODB=-lmongoc-1.0 -lbson-1.0'
> obviously requires mongo-c-driver and mongo-c-driver-dev packages.

The _README is a more verbose tutorial covering building the driver
and various use-cases and special considerations, leaving syntax
reference docs to the _table(5) document.

> > - Have you tested the key features?
> 
> I have built and tested the module on Alpine-3.16 inside a container,
> using a MongoDB Atlas cluster for the backend database. Tried to cover
> as many scenarios as I could, but as always, single-person testing has
> limited reliability. Would be best if we had others test this as well.
> 

Thanks.  I think the major hurdles have been mostly cleared.  I would
encourage you to draft an initial _README that is similar in spirit
to those for PostgreSQL and/or LDAP.

After that, Wietse and I will have to find some time for code review.
This may take a bit of time, but should ideally happen in time for
3.8.0, and so naturally would need to be complete a few snapshots
earlier.

-- 
Viktor.


Re: dict_mongodb

2022-06-27 Thread Viktor Dukhovni
On Mon, Jun 27, 2022 at 11:53:54PM +, Hamid Maadani wrote:

> Fyi, I have added a second commit to the mongodb branch of my fork on
> github, which will enable mongo projections:
> https://github.com/21stcaveman/postfix/commits/mongodb
> 
> I have kept them separate in case it is chosen not to introduce
> projections for now.  The reason I go back to projections, is that I
> see operations like concatenation all over when it comes to SQL setups
> with postfix. Even looking at postfix-admin docs, I can see the same:
> https://github.com/postfixadmin/postfixadmin/blob/master/DOCUMENTS/POSTFIX_CONF.txt
> 
> I understand that it might not be good database design, but I'm not
> sure if we should limit users' choice when it comes to layout of the
> database. Also, without projections, the Mongo implementation would be
> incomplete compared to the SQL implementation.

- Are all the table features documented in mongodb_table(5)?

- Is there a MONGODB_README that covers building the driver and tutorial
  prose with usage examples, best-practices, and so on?

- Have you tested the key features?

If so, you may be ready, modulo code style/quality improvements, ...

-- 
Viktor.


Re: dict_mongodb

2022-06-24 Thread Viktor Dukhovni
On Thu, Jun 23, 2022 at 06:13:05PM +, Hamid Maadani wrote:

> The code is updated. Now:
> - It accounts for the 'domain' parameter
> - It requires a JSON formatted 'filter' parameter (no more search_key)

Good.

> - It uses comma-separated 'result_attribute' to return fields off of query 
> results.
> The result will be a comma-separated string if multiple results present.
> - It uses 'result_format' attribute to format the result

I think this will be the most natural and straightforward option for most users.

> - It uses 'expansion_limit' attribute to control number of results being 
> returned

Excellent, note that this should count each attribute value as one
result, whether from the same document, or multiple documents.

> - mongoc_client_pool_t has been converted to mongoc_client_t
> - Each dict has it's own mongoc_client_t, to account for different backend 
> databases.

Good.

> - advanced projections are commented out for now, and not supported.

Thanks.  This can be and may need to be considered later.

> I am still conflicted about the 'result_attribute' + 'result_format' 
> combination.
> I understand what 'result_format' is used for. However, this can be done 
> inside projections
> as well. I fail to see the advantage of the combo, since it will not "guide" 
> users to return
> fields of the same "type". What are we solving for here? simplicity? 
> backwards compatibility?

Projections are a much more complex "programming language", which should
not be required in the simplest use-cases where attribute values are used
as-is, or with minor static decorations.

Given your earlier answers about mailing list modelling, I should ask
how list-valued attributes are supported?  Do just expand to a
comma-separated list of the underlying values (I'd expect yes)?

-- 
Viktor.


Re: dict_mongodb

2022-06-21 Thread Viktor Dukhovni
On Wed, Jun 22, 2022 at 05:12:08AM +, Hamid Maadani wrote:

> Understood. Is there any prior code in postfix I can repurpose for array 
> management to keep an
> static list of mongoc_client_t objects (one per named dict)? Or should I 
> write it within the module?
> trying to avoid creation and destruction of clients per lookup call, and keep 
> a persistent connection.

You don't need this.  Postfix keeps tables open across queries, reusing
the database connection for multiple queries.  You'll have no static
objects.  All the objects you need should be referenced from the
MongoDB-specific table (dictionary) structure.

Do not confuse "lookup" with "open".  Tables stay open for the lifetime
of the process, but connections can be restarted from time to time,
after some time, or some number of queries, or when they're closed by
the server, and a query detects that the server is gone (in that case
the connection must be reopened and the query transparently retried,
once).

-- 
Viktor.


Re: dict_mongodb

2022-06-21 Thread Viktor Dukhovni
On Wed, Jun 22, 2022 at 04:13:40AM +, Hamid Maadani wrote:

> > This sort of "concat" operation is a bad idea, because it is prone to 
> > collisions...
>  
> Those were just examples to discuss a point. You can find similar
> types of concatenations in multiple guides written for setting up
> postfix with a mysql backend. For example refer to
> 'virtual_alias_domains.cf' mentioned in arch linux's wiki page:
>
> https://wiki.archlinux.org/title/Virtual_user_mail_system_with_Postfix,_Dovecot_and_Roundcube

There are lots of Wikis giving dubious advice.  Yes, in some corner
cases one might actually want to compute some result elements as
concatenations of multiple input fields, and perhaps this can be
supported, but it should not be encouraged, and the simple cases where
this is not used should be easy and natural to express.

> I was just trying to understand if these type operations (concat,
> etc.) need to be supported in the projection. Am I correct in
> understanding they are not?

They're not essential, but can be added as expert features.  Let's get
the basics right first, and talk about the expert features second.

> If the result_attribute + result_format design is the best practice,
> I'm all for that.  need to go look at the result_format and understand
> how to use it with mongo..

It is the "basics right" approach, which avoids advanced MongoDB
syntax.

> >> which would return:
> >> maadani,ha...@dexo.tech/,dukhovni,vik...@postfix.org/
> > 
> > which makes no sense.
>  
> This is honestly confusing to me. This was meant to show we are
> printing multiple multi-valued results as one comma separated string.

These *particular* results make no sense because you're mixing last
names with directory paths.  The list elements are from different
semantic domains.

> When you say this makes no sense, are you referring to this result not
> being useful to postfix because of multiple mail-paths in it? or the
> comma separated string part!?

Neither, it is the disparate semantics of the elements.  Had the
elements all come from the same semantic domain, and not been compounded
from multiple input columns, they would typically all have the same
post-processing requirements, which could likely be handled with just
"result_format".
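
The uniform post-processing in question can be sketched as follows; a
minimal illustration (a hypothetical helper, not the actual Postfix
code) of applying an LDAP-style "result_format" template to each value
and then comma-joining:

```python
def apply_result_format(values, result_format="%s"):
    # Apply a result_format-style template to each looked-up value,
    # then join the formatted values with commas, the Postfix
    # convention for multi-valued lookup results.  Only the "%s"
    # expansion is modeled here; real templates support more.
    return ",".join(result_format.replace("%s", v) for v in values)
```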

> > You do have to decide how mailing lists are modeled in MongoDB. Are
> > they one row per member? Is it a list of "_id" values? Or a list of
> > email addresses? If the former, how does list expansion work? Can
> > MongoDB do joins as well as projections? ...
> 
> I imagine each list as a JSON object with an array of addresses inside of it. 
> Something like:
> { "createdAt": ISODate(""), "active": 1, "addresses": [ 
> "ha...@dexo.tech",
> "vik...@postfix.org" ] }
>  
> Would that work?

Only if you provide code to handle list-valued result columns, and if
such denormalised schemas are best-practice for MongoDB.  A more typical
database practice is to have a "member" table, which makes it easy to
insert users into lists without modifying the list itself, to delete
a user from all the lists a user is a member of, ...

Member tables work best when the database supports some form of "join"
operation, though of course they could be as simple as:

{ "list": "somel...@example.net", "member": "la...@example.net" }
{ "list": "somel...@example.net", "member": "cu...@example.net" }
{ "list": "somel...@example.net", "member": "m...@example.net" }

with both the list name and the member primary address stored by value,
rather than by reference.
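
List expansion over such a by-value member table can be sketched like
this (hypothetical rows and helper, for illustration only):

```python
# Hypothetical by-value member table rows (not real addresses).
EXAMPLE_ROWS = [
    {"list": "devs@example.net", "member": "alice@example.net"},
    {"list": "devs@example.net", "member": "bob@example.net"},
    {"list": "ops@example.net", "member": "carol@example.net"},
]

def expand_list(rows, list_name):
    # Collect every "member" whose "list" column equals the lookup key;
    # a Postfix lookup returns them as one comma-separated string, or
    # "not found" (None here) when nothing matches.
    members = [r["member"] for r in rows if r["list"] == list_name]
    return ",".join(members) if members else None
```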

Is there some prior art in this space?  Has anyone used MongoDB for
managing email users and lists with some other MTA?

> MongoDB supports joins, but through "aggregation pipelines":
> https://www.mongodb.com/docs/manual/aggregation
> 
> here, we are using 'mongoc_collection_find_with_opts' which runs a 'find' 
> operation. If support for
> joins is necessary, we should switch to 'mongoc_database_aggregate' and 
> require 'filter' to be in the 
> pipeline format:
> http://mongoc.org/libmongoc/current/mongoc_database_aggregate.html

Well, this is an important design choice.  What sort of schemas are
best-practice in this space?  Are joins needed to enable some data
"normalisation"? ...

> One more question, what's the policy regarding multiple databases? the
> way that the module is now, it supports multiple collections (tables)
> in only one database. Should I put any effort in supporting multiple?
> For example, if mailboxes are in cluster 1 and mail lists in cluster 2
> (separate URIs basically)?

There should be no assumption that all tables use the same database.
Each table designates its source database.  The thing that need not
be supported (and is likely impossible to express or difficult to
implement) is joins or other operations that span multiple databases.

-- 
Viktor.


Re: dict_mongodb

2022-06-21 Thread Viktor Dukhovni
On Tue, Jun 21, 2022 at 07:26:53PM +, Hamid Maadani wrote:

> Want to discuss the 'result_attribute' before I go ahead with the 
> implementation though.
> It is kind of the same story as with 'filter' and 'search_key' attributes.

Not exactly, there are enough differences to warrant specific treatment.

> If we are to return multiple fields, each of which might need a custom
> projection.

Whatever the returned fields are, it only makes sense to combine values
of the same semantic "type".  Postfix dictionary lookups return either a
single value or a list of comma-separated values, with the list elements
interpreted in a uniform way.

Therefore, in the typical case where the result attributes are returned
"as-is", the "projection" is trivial (identity function) and the values
(if more than one) are returned comma-separated.

> So, We would need to have multiple result_attributes, and each one is
> either set to '1' (matching exact field in the document),
> or an optional JSON document for projection.

No.  The "result_attribute" list is a list of attributes whose values
are either used as-is or perhaps (in harmony with e.g. the LDAP table)
are subject to a "result_format" template, with each value expanded
into the specified template.

If instead the desired result elements are built from multiple MongoDB
columns, via some MongoDB "projection" template, then "result_attribute"
should not be used, and the user can specify the relevant projection
instead.

> Consider this data set:
> { "first": "hamid", "last": "maadani", "domain": "dexo.tech", active: 1, 
> "other": "some other field 1"}
> { "first": "viktor", "last": "dukhovni", "domain": "postfix.org", active: 1, 
> "other": "some other field 2"}
> { "first": "wietse", "last": "venema", "domain": "postfix.org", active: 0, 
> "other": "some other field 3"}

In practice you'd always want a mailbox address, and often also the
associated primary email address (for canonical_maps):

    filter = { "address": "ha...@dexo.tech"
             , "mail": "hamid.maad...@dexo.tech"
             , "maildrop": "ha...@imap1.dexo.tech"
             , "maildir": "dexo.tex/hamid/"
             , "active": 1 }

So that mail addressed to "address" is canonicalised to "mail" in
headers, but rewritten to "maildrop" for delivery to the right server,
where (on that server) "maildir" is the relative path to deliver to.

Each user would have one or more input "address" values that
canonicalise to the same primary address, and route to the same
maildrop and ultimately maildir.

> Let's say I need mail path for all active users, as well as their last name.

This makes no sense and is not useful.  The result elements are not of
the same semantic type.  The "last name" is not a maildir path.  You'd
never do this.

Furthermore, Postfix never needs/wants the whole database, it asks for
data matching a specific lookup key (recipient or sender address, a
domain or IP address for access(5) lookups ...) and wants just the
matching row(s), not a database dump.

> The format for mail path is 'first@domain/'. The filter would be
> simple: { "active": 1 }

This is not a useful filter.  Instead you'd use, for example:

{ "active": 1, "address": "%s" }

> We need to build this projection for the desired result:
> { "_id": 0, "last": 1, "mail_path": {"$concat": ["$first", "@", "$domain", 
> "/"]} }

A sensible "projection" would then be "maildrop" for virtual_alias_maps,
and "maildir" for "virtual_mailbox_maps", ...
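
Put together, a table definition along these lines would then back
virtual_alias_maps.  This is a sketch only: the parameter names
("query_filter", "result_attribute") are the ones proposed in this
thread, not a finalized interface, and the URI is a placeholder:

```
# /etc/postfix/mongo-aliases.cf (hypothetical sketch)
uri = mongodb://127.0.0.1:27017
dbname = mail
collection = mailbox
query_filter = { "active": 1, "address": "%s" }
result_attribute = maildrop
```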

> { "_id": 0, "last": 1, "mail_path": {"$concat": ["$first", "@", "$domain", 
> "/"]} }

This sort of "concat" operation is a bad idea, because it is prone to
collisions, and works poorly when the user's name changes, but the
mailbox stays the same, ...  It is just bad design.  Store the mailbox
directly as an explicit fixed string.

> which would return:
> maadani,ha...@dexo.tech/,dukhovni,vik...@postfix.org/

which makes no sense.

> 'result_attributes' would be need to be set to :
> result_attributes = last, mail_path

See above.

> and I guess the 'projection' attribute would need to be set to something like:
> projection = 1, {"$concat": ["$first", "@", "$domain", "/"]}

You're working too hard.  This sort of thing just encourages bad
design choices.  Just implement "result_attribute" (typically just
one), and "result_format" (ala LDAP) to pick this apart and inject
it into some template if need be.  Then combine multiple results
(if need be) by separating with commas.

You do have to decide how mailing lists are modeled in MongoDB.  Are
they one row per member?  Is it a list of "_id" values?  Or a list of
email addresses?  If the former, how does list expansion work?  Can
MongoDB do joins as well as projections? ...

> or something similar to this.
> Would it not be more beneficial/easier to leave the projection as a JSON 
> object and up to the user,
> instead of having 'result_attribute'?

No.

-- 
Viktor.


Re: dict_mongodb

2022-06-18 Thread Viktor Dukhovni
On Sun, Jun 19, 2022 at 12:47:39AM +, Hamid Maadani wrote:

> > perhaps typically querying a single underlying "database" with
> > different queries/results for each "table".
> 
> Isn't that the case, when we configure postfix with mysql for example,
> and create different tables for virtual domains, virtual users and
> virtual aliases?

Yes, a single database with multiple tables is common.  So reusing
connections across tables makes some sense, but multi-connection
thread pools and threads, ... all for a single connection are
way overkill.

IIRC the LDAP table code does attempt to share the same LDAP connection
for tables that differ essentially only in the query and result
attributes.

At this stage, best to keep it simple and make the code correct first.
Optimisations (whether thread pools or connection sharing) can come
later.

-- 
Viktor.


Re: dict_mongodb

2022-06-17 Thread Viktor Dukhovni
On Sat, Jun 18, 2022 at 02:45:26AM +, Hamid Maadani wrote:

> I usually use client pools, because of their thread safety (which is not 
> needed here) as well as
> the more aggressive cluster monitoring operations they have by default 
> compared to the single
> threaded mongoc_client_t. There is no need to manually monitor the connection 
> and make sure 
> server has closed it when using mongoc_client_pool_t. That being said, I have 
> nothing against the
> use of mongoc_client_t. Will convert to that.

The client pool model could be justified if you have a strong
understanding of how a client pool would work in a process with
potentially multiple open Postfix tables, perhaps typically querying a
single underlying "database" with different queries/results for each
"table".

Postfix processes are single-threaded, so serialise all the queries, and
if the database connection is reusable, and the pool is small, it could
work well.

Basically you want a "pool" of one connection per logical database,
regardless of the query filter or result attributes, with ideally some
reuse of that connection until it has been idle long enough to drop it
and start a new one.

> Question about the result_attributes, how should multi-columned
> multiple results be returned?

Comma-separate everything.  Whether from different rows, or different
columns.  The only time this is relevant in Postfix is email address
lists.

> For example, if 'result_attributes = username,password', for below
> result set (filter = {"active":1}): 
> {"username":"hamid","password":"blah","active":1}
> {"username":"wietse","password":"viktor","active":1}
> 
> Should it return:
> 
> hamid,blah,wietse,viktor

It can only return a single string.  So the above.

> or:
> 
> hamid,blah
> wietse,viktor

This makes no sense.  There are no structured result sets, just
single values or comma-separated lists of email addresses.

Don't forget to support expansion_limit (typically 1 if used),
which would result in an error if more than one result would
need to be output (comma-separated).  Unlike LDAP, there's
no recursion (no columns that are query URIs, triggering
recursion) so no need for a recursion limit.
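
The expansion_limit behaviour can be sketched as follows (a hypothetical
helper mirroring the ldap_table(5) semantics, where 0 means unlimited):

```python
def join_results(results, expansion_limit=0):
    # With a nonzero expansion_limit, a lookup yielding more results
    # than the limit is an error, not a truncated answer; otherwise
    # results are comma-joined as usual.
    if expansion_limit and len(results) > expansion_limit:
        raise ValueError("expansion_limit exceeded: %d > %d"
                         % (len(results), expansion_limit))
    return ",".join(results)
```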

-- 
Viktor.


Re: dict_mongodb

2022-06-17 Thread Viktor Dukhovni
On Fri, Jun 17, 2022 at 04:53:41PM -0400, Wietse Venema wrote:
> Viktor Dukhovni:
> > Also, your parsing of the search_keys is hand-rolled, but should be
> > using mystrtok(3) to split the list on commas/whitespace, and
> > split_nameval(3) to split "key = value" pairs on "=". 
> 
> If the result may contain quoted strings, then we need a smarter
> parser than wnat mystrtok() does now.

If multiple search columns are to be supported, with a mixture of
strings and numeric values, perhaps the search filter should remain
a JSON object:

query_filter = {user:"%s", active:1}

without support for:

search_keys = user:"%s", active:1

There's hardly much value in splitting these and performing separate
substitutions, only to add the missing "{" and "}" at front and back.
The real value is multiple result attributes, which are simple names:

result_attribute = foo, bar, ...

The key missing code is correct JSON quoting in expansion "%s".
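
To illustrate why that quoting matters, here is a sketch using Python's
json module as a stand-in for whatever escaping function the C driver
would implement: without escaping, a crafted lookup key rewrites the
filter; with it, the key remains an inert string:

```python
import json

def expand_filter(template, key):
    # JSON-escape the lookup key before substituting it for "%s", so
    # quotes and backslashes in the key cannot break out of the string
    # literal in the query filter (the xkcd 327 problem).
    escaped = json.dumps(key)[1:-1]  # strip the quotes dumps() adds
    return template.replace("%s", escaped)
```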

-- 
Viktor.


Re: dict_mongodb

2022-06-17 Thread Viktor Dukhovni
On Fri, Jun 17, 2022 at 06:59:29AM +, Hamid Maadani wrote:

> > You need to use a static variable to record whether you've already
> > initialised the library, and do it just once. No need to worry about
> > threads or locks. Postfix is single-threaded.
> 
> This is also done.

Also, the "client pool" is pointless: Postfix processes are not
multi-threaded.  Instead use a single dedicated (long-term) connection
per open table, but be prepared to retry if the connection was
unilaterally closed on the server side.

I'd rename "result_key" to "result_attribute" and allow it to be
multi-valued (multiple potential columns to retrieve from the returned
object, either alternatives, or sometimes many from the same object).

Also, your parsing of the search_keys is hand-rolled, but should be
using mystrtok(3) to split the list on commas/whitespace, and
split_nameval(3) to split "key = value" pairs on "=".  Also vstring
allocation is not an upper bound, but a guestimated size, and you
should be appending to them via vstring_sprintf_append(3) and the
like, not poking bytes directly into the buffer...

Also the docs have:

search_keys = username:"%s", active:1

Your code is responsible for providing a JSON-compatible quoting function
that encodes the string interpolated via the "%s".  This needs to be
passed to db_common_expand().  Otherwise the query can easily be
malformed/manipulated:

https://xkcd.com/327/

-- 
Viktor.


Re: dict_mongodb

2022-06-16 Thread Viktor Dukhovni
On Thu, Jun 16, 2022 at 08:48:00PM -0400, Viktor Dukhovni wrote:

> > Just Fyi, when compiling postfix, I keep running into <stdio.h> missing from
> > src/posttls-finger/posttls-finger.c
> > Adding the stdio header resolves the issue, easy fix.
> 
> Feel free to post a patch.  I don't see this on FreeBSD or MacOS.  What
> platform are you building on?

I guess since usage() references stderr, including <stdio.h> explicitly
is warranted.

--- src/posttls-finger/posttls-finger.c
+++ src/posttls-finger/posttls-finger.c
@@ -335,6 +335,7 @@
 #include 
 #include 
+#include <stdio.h>
 #include 
 #include 
 #include 
 #include 

-- 
Viktor.


Re: dict_mongodb

2022-06-16 Thread Viktor Dukhovni
On Fri, Jun 17, 2022 at 12:39:30AM +, Hamid Maadani wrote:
> > You need to read the mongodb documentation with care and make sure
> > that you honour their API contract. For example:
> > 
> > mongoc_init():
> 
> When it comes to 'mongoc_init' and 'mongoc_cleanup', they are supposed to be 
> run once per process.
> If I understand how postfix works correctly, it invokes 'dict_mongodb_open'
> when it loads the shared-object for the dict (postfix-mongodb.so).

No, this is invoked once per mongodb table.

> I did not
> Find an init function for the dict libraries, just the open method.
> That would mean that mongoc_init would be run every time that library is
> opened (which should be once for postmap and proxymap, unsure about postfix 
> master).

You need to use a static variable to record whether you've already
initialised the library, and do it just once.  No need to worry about
threads or locks.  Postfix is single-threaded.

> Please let me know if my understanding is incorrect.
> Would it be preferred to move this to postfix master process instead of a 
> loadable
> module? Even if the user does not want to use Mongo?

No.  That can't work.

> I guess that's a difference in our way of thinking. Personally, it would be 
> easier for me
> this way, but if you see more value in failing in case both are specified, 
> that's easy to
> implement. Just let me know.

I think that's better in this case.

> Question about using variables in main.cf: I'm getting 'unused
> parameter' errors for my mongo variables, where I know I am using
> them.  Is there something I need to do in the module to mark a
> variable as used while reading it from the config?

See "src/postconf/" for how this is handled for LDAP, ...

> Just Fyi, when compiling postfix, I keep running into <stdio.h> missing from
> src/posttls-finger/posttls-finger.c
> Adding the stdio header resolves the issue, easy fix.

Feel free to post a patch.  I don't see this on FreeBSD or MacOS.  What
platform are you building on?

-- 
Viktor.


Re: dict_mongodb

2022-06-16 Thread Viktor Dukhovni
On Thu, Jun 16, 2022 at 08:11:20PM +, Hamid Maadani wrote:

> Please let me know if any other adjustments are necessary.

You need to read the mongodb documentation with care and make sure
that you honour their API contract.  For example:

mongoc_init():
Initialize the MongoDB C Driver by calling mongoc_init() exactly
once at the beginning of your program. It is responsible for
initializing global state such as process counters, SSL, and
threading primitives.

And yet you call this function once per table in the "open" routine.
This may make the code unsafe for opening multiple tables.

Please review the correctness/safety of all the touch points between
MongoDB and the table driver.  Does anything special need to be done to
avoid issues with non-utf8 binary inputs?  Perhaps all query keys that
are not valid UTF8 should short-circuit to "not found"?

-- 
Viktor.


Re: dict_mongodb

2022-06-16 Thread Viktor Dukhovni
On Thu, Jun 16, 2022 at 08:11:20PM +, Hamid Maadani wrote:

> I have uploaded the latest code, which simplifies filters, projections
> and accommodates multiple results. In this version:
>
> - Users can search by specifying search keys, OR writing more advanced 
> filters. If search_keys
> are specified, value of 'filter' will be ignored if specified.

Perhaps best to fail if both are specified.  The conflict may be missed
by some users.

> - Users can specify a return_key for a simplified projection. It would
> default to "result" if not specified.

Good.  Don't know whether "result" is a probable common attribute;
if not, perhaps simply require that exactly one of "return_key" and
"projection" is set.  Note that "return_key" could be a list of
attributes in which case all are used, comma-separated if they occur
in the same result object, and of course also if present in multiple
objects.

> - Users can set more complex projections. If a projection is set,
> value of return_key will become irrelevant (the code handles that
> automatically).

Perhaps again fail on conflicting configuration information?

> - If there are multiple results, a comma-separated list of strings will be 
> returned.

Yes.

> > You should also support the "domain" attribute to limit the keys sent to
> the database to just email addresses with a domain part matching one of
> the elements of that list.

See db_common_parse_domain(), and its call sites in the LDAP, MySQL and
PostgreSQL drivers.

-- 
Viktor.


Re: dict_mongodb

2022-06-15 Thread Viktor Dukhovni
On Wed, Jun 15, 2022 at 10:12:57PM +, Hamid Maadani wrote:

> Let's say I have a database called 'mail' in my MongoDB cluster, and there
> is a collection (table) named 'mailbox' inside of it.
> Each object in this collection holds a mailbox, and also includes its
> aliases (real world example from my mail server):
> 
> {
> "_id" : ObjectId(REDACTED),
> "username" : "ha...@dexo.tech",
> "password" : "REDACTED",
> "name" : "Hamid Maadani",
> "maildir" : "ha...@dexo.tech/",
> "quota" : REDACTED,
> "local_part" : "hamid",
> "domain" : "dexo.tech",
> "created" : ISODate("2016-11-07T21:07:21.000Z"),
> "modified" : ISODate("2017-05-02T22:10:00.000Z"),
> "active" : 1,
> "alias" : [ 
> {
> "address" : "ab...@dexo.tech",
> "created" : ISODate("2016-11-07T21:04:16.000Z"),
> "modified" : ISODate("2016-11-07T21:04:16.000Z"),
> "active" : 1
> }, 
> {
> "address" : "hostmas...@dexo.tech",
> "created" : ISODate("2016-11-07T21:04:16.000Z"),
> "modified" : ISODate("2016-11-07T21:04:16.000Z"),
> "active" : 1
> }
> ]
> },
>
> {
> "_id" : ObjectId(REDACTED),
> "username" : "ad...@dexo.tech",
> "password" : "REDACTED",
> "name" : "Site Admin",
> "maildir" : "ad...@dexo.tech/",
> "quota" : REDACTED,
> "local_part" : "admin",
> "domain" : "dexo.tech",
> "created" : ISODate("2017-04-12T22:24:17.000Z"),
> "modified" : ISODate("2021-06-03T02:54:35.000Z"),
> "active" : 1
> }

See LDAP_README which talks about structurally similar use-cases.

> Now, if I want to query for 'ha...@dexo.tech', or any of its aliases, I 
> would use
> this filter:
> filter = {"$or": [{"username":"%s"}, {"alias.address": "%s"}], "active": 1}
> 
> of course, this will return an entire JSON document.

JSON documents will be fairly useless in Postfix, so there should always
be a projection, and its syntax should be more user-friendly by default,
so similar to "result_attribute" in LDAP, i.e. just a key to extract
from the JSON document, which should be either a string-valued scalar,
or a list of string values (which you would internally combine with
commas).

> So I would use Mongo projections to
> limit it to just one key-value pair, in which case, the driver will only 
> return the value:
> options = {"projection": {"_id": 0, "username": 1}}

Using explicit "projections" should be an advanced feature, far better
to ask users to write:

result_attribute = username

(As a side question, why is "_id" part of the "projection"???)


> If I change the query to:
> filter = {"active": 1}
> 
> and keep the same projection, it will return:
> {"username":"ha...@dexo.tech"}
> {"username":"ad...@dexo.tech"} 
> 
> from what you just described, the result should be returned as:
> ha...@dexo.tech,ad...@dexo.tech

Yes.

> That's the easy part. Question is, should I allow users to use the 
> filter/options combination
> to search however they need? The reason I ask is, if someone uses this 
> projection:
> {"projection": {"_id": 0, "username": 1, "name" : 1}}
> 
> The results would be:
> {"username":"ha...@dexo.tech", "name":"hamid"}
> {"username":"ad...@dexo.tech", "name":"admin"}
> 
> and I am unsure how to handle that. Limit it by requiring a key?

Returning compound JSON objects should be an error.  Only string and
list of string values should be supported.  The result is obtained
by flattening all lists to a comma-separated string, and then combining
these (again comma-separated) across the returned JSON "documents".
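
That flattening rule can be sketched with a hypothetical helper: string
values pass through, lists of strings are comma-joined, compound values
are rejected, and per-document results are joined again with commas:

```python
def flatten_results(documents, attribute):
    # Reduce one attribute across returned JSON documents to the single
    # comma-separated string a Postfix lookup must yield; dict-valued
    # (compound) attributes are an error, absent ones are skipped.
    parts = []
    for doc in documents:
        value = doc.get(attribute)
        if value is None:
            continue
        if isinstance(value, str):
            parts.append(value)
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            parts.extend(value)
        else:
            raise TypeError("unsupported value for %r" % attribute)
    return ",".join(parts) if parts else None
```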

> Or just return the string representation of the JSON documents for the
> user to parse by 'jq' or similar utilities?

No, because Postfix dictionaries are not for use in MongoDB query CLIs,
they are for the MTA to resolve email lists, transports, ... where
only scalar results and sometimes comma-separated lists are supported.


> Would there be a use case for that (postmap -q ... | jq)?

No.

> I assume I should limit it by key, but want to run it by you guys first..

Read LDAP_README, virtual(5), aliases(5), transport(5), ...

Note also that with e.g. LDAP there's also a way to specify an expansion
limit (e.g. 1), so that queries returning more than the expected number
of rows fail instead of returning garbage.

So, also see ldap_table(5) for any additional table attributes (like
the previously mentioned "domain") that may apply.

-- 
Viktor.


Re: dict_mongodb

2022-06-15 Thread Viktor Dukhovni
On Wed, Jun 15, 2022 at 09:22:37PM +, Hamid Maadani wrote:

> This is good, was unaware of the multi-row result standard.
> How does this work with other DBs? for example, if you have two result sets:
> { "name": "hamid", "value": "test" }
> { "name": "viktor", "value": "test2" }

Well, Postfix dictionaries cannot meaningfully consume structured data:
you can return a single value, or a comma-separated list.  If it is a
list of email addresses, they should be stored in "external form"
(RFC 5322 quoted), allowing robust parsing as a comma-separated list of
such quoted forms.

Thus "name" might be the key, and "value" the desired value column,
so that a query for "hamid" would return "test", and a query for
"viktor" would return "test2".

But if the database also held:

{ "name": "devs", "value": "wietse" }
{ "name": "devs", "value": "viktor" }
{ "name": "devs", "value": "hamid" }

then a query for "devs" would return:

wietse,viktor,hamid



> should it return as below?
> hamid,test,viktor,test2

So definitely not this, it makes no sense.

> Regarding the 'db_common_expand' line, I used the same function used for the 
> SQL dict,
> to support expansions like %s , %u and such for mongodb search filter, if 
> that makes sense.

Yes, "%s", "%u" and "%d" are reasonably expected.

You should also support the "domain" attribute to limit the keys sent to
the database to just email addresses with a domain part matching one of
the elements of that list.
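
The "domain" attribute behaviour can be sketched with a hypothetical
helper: keys that are not addresses in a listed domain short-circuit to
"not found" without a database round trip:

```python
def domain_allows(key, domains):
    # With no configured domain list, every key goes to the database.
    # Otherwise only email addresses whose domain part matches one of
    # the listed domains (case-insensitively) are looked up.
    if not domains:
        return True
    _local, sep, domain = key.rpartition("@")
    return bool(sep) and domain.lower() in {d.lower() for d in domains}
```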

-- 
Viktor.


Re: dict_mongodb

2022-06-15 Thread Viktor Dukhovni
On Wed, Jun 15, 2022 at 04:22:11PM +, Hamid Maadani wrote:

> I have developed a MongoDB module for postfix. Given that
> mongo-c-driver has gone mainstream on most linux distributions, I
> personally think this would be a good addition to postfix, allowing
> users to use MongoDB as a backend database. I am currently using it on
> the same server sending this email from.  You can review the code here
> : https://github.com/21stcaveman/postfix/tree/mongodb

How does it handle multi-row result sets?  The expectation with other
Postfix databases is that the results will be combined with
comma-separators.  I don't see where that logic is in the code...

Can you explain:


https://github.com/vdukhovni/postfix/compare/master...21stcaveman:mongodb#diff-4b022a4dceb7c17dc27bd501f3e5563bf43bc485b631c060075444e986ca673dR166

And returning no rows is definitely normal and expected, and you should
then (as you do) return a NULL result.

On Wed, Jun 15, 2022 at 03:17:00PM -0400, Wietse Venema wrote:

> Hopefully your code supports the "common" dictionary features (common
> for LDAP, *SQL*). I recall working on a contributed driver that did
> not do that; it would be difficult for me to add that code and test
> it, because I do not have every possible database.

It does appear to use the db_common parser infrastructure:


https://github.com/vdukhovni/postfix/compare/master...21stcaveman:mongodb#diff-4b022a4dceb7c17dc27bd501f3e5563bf43bc485b631c060075444e986ca673dR112-R124

https://github.com/vdukhovni/postfix/compare/master...21stcaveman:mongodb#diff-4b022a4dceb7c17dc27bd501f3e5563bf43bc485b631c060075444e986ca673dR166


-- 
Viktor.


Re: Quarantine message using milter

2022-06-15 Thread Viktor Dukhovni
On Wed, Jun 15, 2022 at 10:58:35AM -0400, Wietse Venema wrote:

> Viktor Dukhovni:
> > Release all quarantined mail from "harml...@example.net" to
> > "artl...@example.org" (and any other recipients of the same message
> > envelope):
> > 
> > # jq -r '
> > first(select(.queue_name == "hold" and
> >  (.queue_id | test("^[0-9A-F]+$")) and
> >  .sender == "harml...@example.net" and
> >  (.recipients[].address == "artl...@example.org")))
> > | .queue_id
> > ' | postsuper -H hold
> 
> Did you mean:
> 
>  postqueue -j ... | jq ... | postsuper -H - hold
> 

Yes, carelessly left out the leading "postqueue -j" and the "-" from
"postsuper -H".  So the correct form is:

 # postqueue -j |
   jq -r '
 first(select(.queue_name == "hold" and
  (.queue_id | test("^[0-9A-F]+$")) and
  .sender == "harml...@example.net" and
  (.recipients[].address == "artl...@example.org")))
 | .queue_id
 ' |
   postsuper -H -

[ The explicit "hold" queue name is presumably redundant in "postsuper -H" ]

-- 
Viktor.


Re: Quarantine message using milter

2022-06-15 Thread Viktor Dukhovni
On Wed, Jun 15, 2022 at 08:54:34AM -0400, Wietse Venema wrote:

> That queue ID should also show up in the 'hold' queue when you use
> the "mailq" command. Example:
> 
> $ mailq | grep '!'
> 4LN9p23LK0zJrP1!   983 Tue Jun 14 23:31:34 u...@example.com

Alternatively, use the machine-readable JSON output from "postqueue -j":

$ postqueue -j | jq -r '
first(select(.queue_name == "deferred" and
 (.queue_id | test("^[0-9A-F]+$")) and
 (.recipients[].address | test("^.ostmaster"))))
| .queue_id
'
CF6B4106ABD7

I have a "deferred" message addressed to "postmaster" and "hostmaster"
at a poorly operated domain, so the "queue_name" was "deferred".  In
your case it should be "hold".

To avoid trusting the low-privilege Postfix user to supply arguments
for commands run as root, the queue id is filtered to be of the
expected upper-case hexadecimal form.
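
The same defensive filter in standalone form (a sketch; it matches the
short hexadecimal queue-id form used here, so long-format IDs such as
the mixed-case one in the earlier example would need a different
pattern):

```python
import re

_QUEUE_ID = re.compile(r"[0-9A-F]+")

def safe_queue_ids(candidates):
    # Accept only upper-case hexadecimal strings, so untrusted queue
    # listing output cannot smuggle option-like or arbitrary arguments
    # into a command run as root (e.g. postsuper).
    return [c for c in candidates if _QUEUE_ID.fullmatch(c)]
```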

Because the recipient test (deliberately, to make the point) matches
either recipient, the select(...) is pruned to the first match via
first(select(...)), to avoid reporting the same queue file multiple
times, once for each matching recipient.


> Use the "postsuper -H" command to release the message from quarantine.
> 
> # postsuper -H 4LN9p23LK0zJrP1
> $ postfix flush
> $ grep 4LN9p23LK0zJrP1 /var/log/maillog
>   postfix/postsuper[]: 4LN9p23LK0zJrP1:
> released from hold

Release all quarantined mail from "harml...@example.net" to
"artl...@example.org" (and any other recipients of the same message
envelope):

# jq -r '
first(select(.queue_name == "hold" and
 (.queue_id | test("^[0-9A-F]+$")) and
 .sender == "harml...@example.net" and
 (.recipients[].address == "artl...@example.org")))
| .queue_id
' | postsuper -H hold

-- 
Viktor.


Re: Possible remote DOS triggering qmgr 'unix-domain name too long' crash?

2021-10-29 Thread Viktor Dukhovni
On Fri, Oct 29, 2021 at 09:00:20AM +0200, Benoît Panizzon wrote:

> It turned out, one file in the 'active' queue, was causing qmgr to
> crash:
> 
> postfix/qmgr[86256]: fatal: unix-domain name too long:
> private/fwZ+GX2pP7y/mKTz0/vD8xX7o/.../BqIQ4kqVv3lAEe6idjSSxkUp5oAj3U/FGKWgCN

It sure looks like you have a problem transport table or content filter
setting.  Connections from the queue manager are mostly to the
unix-domain sockets of named transports, "private/smtp",
"private/local", ...

So, somehow this particular message resolves to a
"" transport.

So the question to you is:

* What sort of transport table(s) do you have in place
* What sort of content filters do you have in place
* Where is the problem queue file.

Your "postconf -nf" and "postconf -Mf" settings are also needed,
as well as any logging related to the queue file from "smtpd",
"cleanup" and "qmgr"  (presumably it repeatedly logged adding
the message to the active queue).

-- 
Viktor.


Re: XCLIENT enhancement needed

2021-10-11 Thread Viktor Dukhovni
On Mon, Oct 11, 2021 at 07:17:12PM -0400, Wietse Venema wrote:

> > If the goal is to leave a forensic trace, then it may be simpler to add
> > an optional list of trace key/value pairs to XCLIENT, which the
> > receiving MTA can choose to add to the message Received header.
> > 
> > https://www.rfc-editor.org/rfc/rfc8314.html#section-7.4
> > 
> > While I'd have preferred a slightly different definition of
> > these elements, they're probably sufficient for the needs above.
> 
> That would certainly be possible.
> 
> How about: 
> 
> XCLIENT ... ATTR=key=value ...
> 
> where key and value are xtext encoded as per RFC 1891, meaning that
> they cannot contact '=' (or '+').

Yes, that's basically it.  Perhaps rather than "ATTR" it could be
"CLAUSE" (to match the RFC terminology) or some other bikeshed colour.
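
For reference, the xtext encoding (RFC 3461, which obsoleted RFC 1891)
can be sketched as follows: printable US-ASCII passes through except
'+' and '=', and everything else becomes '+' plus two upper-case hex
digits:

```python
def xtext_encode(s):
    # Encode bytes outside printable ASCII 33..126, plus '+' and '=',
    # as "+HH" (two upper-case hex digits), per the xtext rules.
    out = []
    for b in s.encode("utf-8"):
        if 33 <= b <= 126 and b not in (0x2B, 0x3D):  # '+' and '='
            out.append(chr(b))
        else:
            out.append("+%02X" % b)
    return "".join(out)
```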

> Does it make sense to specify a list of key names that Postfix will
> accept in this manner?

That may be a reasonable setting if the XCLIENT software is not
easily configurable and generates more tags than one wants to import.

> If the client can send arbitrary keys, how many distinct keys will
> Postfix accept before it rejects further input? XCLIENT is forbidden
> by default, so we can have a generous default limit.

Well, we're probably not going to extend the SMTP command length limit
for XCLIENT, so only a modest number per line, but IIRC XCLIENT is
cumulative, so I guess we should limit the XCLIENT command count to at
most a dozen or so.  The client can pack multiple clauses per command
(which won't cost us enough to care about), but should not be sending a
large number of commands.

Since XCLIENT is for trusted clients only, I don't see a pressing need
for fancy controls or very large attribute counts.
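To make the budget idea concrete, here is a toy accumulator for the proposed `ATTR=key=value` clause. The clause name and both limits are assumptions taken from the discussion above, not shipped Postfix behaviour.

```python
class XclientSession:
    """Toy accumulator for a hypothetical XCLIENT ATTR=key=value clause."""
    MAX_COMMANDS = 12   # assumed per-session command budget ("a dozen or so")
    MAX_ATTRS = 32      # assumed total attribute cap; made-up value

    def __init__(self):
        self.commands = 0
        self.attrs = {}

    def handle(self, line):
        """Accumulate ATTR clauses; return True if accepted, False if over budget."""
        self.commands += 1
        if self.commands > self.MAX_COMMANDS:
            return False
        for word in line.split()[1:]:
            name, _, rest = word.partition("=")
            if name.upper() != "ATTR":
                continue  # other XCLIENT clauses (NAME, ADDR, ...) not modelled here
            if len(self.attrs) >= self.MAX_ATTRS:
                return False
            # Keys and values are xtext-encoded, so neither contains a raw '='.
            key, _, val = rest.partition("=")
            self.attrs[key] = val
        return True
```

Multiple clauses can be packed per command, so the per-command cost stays modest while the session-wide command count is what gets capped.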

The main cost for us would be new code to fold the Received header given
a variable set of clauses.  Right now we have a fairly static layout IIRC.

-- 
Viktor.


Re: XCLIENT enhancement needed

2021-10-11 Thread Viktor Dukhovni
On Mon, Oct 11, 2021 at 08:10:05AM +, Kai KRETSCHMANN wrote:

> The monitoring rspamd now has no chance to see in the latest Received
> header in the connection was received TLS encrpyted or plain text.

If the goal is to leave a forensic trace, then it may be simpler to add
an optional list of trace key/value pairs to XCLIENT, which the
receiving MTA can choose to add to the message Received header.

https://www.rfc-editor.org/rfc/rfc8314.html#section-7.4

While I'd have preferred a slightly different definition of
these elements, they're probably sufficient for the needs above.

-- 
Viktor.


Re: unix socket group and world read write permissions?

2021-09-28 Thread Viktor Dukhovni
On Tue, Sep 28, 2021 at 08:42:11PM -0400, Jason Pyeron wrote:

> Right - which is why I am asking about using 0666 vs 0600? This is not 
> restrictive.
> 
> In v3.6.2:
> postfix/src/util/unix_listen.c:96:if (fchmod(sock, 0666) < 0)
> postfix/src/util/unix_listen.c:99:if (chmod(addr, 0666) < 0)
> 
> Which OS does postfix not work on if it is restricted to 0600 or 0660 ?

It's best to not go OCD over the socket permissions, they are correct as
they stand.  Some of the setgid commands like postqueue(1) and
postdrop(1) rely on group "x" access to the "public" directory to then
have access to the relevant sockets:

drwx--x---  2 postfix  postdrop  8 Sep 27 13:25 /var/spool/postfix/public

# ls -l /var/spool/postfix/public
total 6
srw-rw-rw-  1 postfix  maildrop  0 Sep 27 13:25 cleanup
srw-rw-rw-  1 postfix  maildrop  0 Sep 27 13:25 flush
srw-rw-rw-  1 postfix  maildrop  0 Sep 27 13:25 pickup
srw-rw-rw-  1 postfix  maildrop  0 Sep 27 13:25 postlog
srw-rw-rw-  1 postfix  maildrop  0 Sep 27 13:25 qmgr
srw-rw-rw-  1 postfix  maildrop  0 Sep 27 13:25 showq

With 0600, users other than "root" or "postfix" can't run "mailq",
or notify the pickup(8) service that there's a new message in the
"maildrop" directory.

Postfix has been running correctly since ~1997 with the socket
permissions as you see them, best to spend time chasing something more
useful.

-- 
Viktor.


Re: [PATCH] A Problem in compat_level_from_string()

2021-06-22 Thread Viktor Dukhovni
On Tue, Jun 22, 2021 at 10:49:49AM -0700, David Bohman wrote:

> You cannot assume that the value returned to 'endptr' is greater than
> 'str' on a valid result. It could be a different string entirely, with
> a lesser pointer value. That is up to the implementation.

Postfix does not pass arguments to strtol() that violate pointer
aliasing.  We can and do correctly assume that strtol() behaves
correctly, and returns a value of 'endptr' that is >= the source
string.
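A quick sanity check of that behaviour through ctypes, calling the platform strtol() directly (assumes a Unix libc is loadable; the input string is an arbitrary compat-level-style example):

```python
import ctypes
import ctypes.util

# Load the C library (works on typical Linux/macOS; illustration only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strtol.restype = ctypes.c_long
libc.strtol.argtypes = [ctypes.c_char_p,
                        ctypes.POINTER(ctypes.c_char_p),
                        ctypes.c_int]

buf = ctypes.create_string_buffer(b"3.6rest")  # e.g. a compat-level string
endp = ctypes.c_char_p()
val = libc.strtol(buf, ctypes.byref(endp), 10)

# endptr points back into the source buffer, at the first unparsed
# character, so it is always >= the start of the string.
offset = ctypes.cast(endp, ctypes.c_void_p).value - ctypes.addressof(buf)
```

Here strtol() parses the leading "3" and leaves endptr at the ".", one byte past the start of the same buffer.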

-- 
Viktor.


Re: DB_README: How to specify library path for `libdb-5.3.so`?

2021-04-20 Thread Viktor Dukhovni
On Tue, Apr 20, 2021 at 09:20:40AM -0400, Wietse Venema wrote:

> Paul Menzel:
> > Would you accept a patch to fix the instructions in `DB_README`?
> 
> I think your problem is that /etc/ld.so.conf needs updating when you 
> install libdb in a nonstandard place.

But one can also augment the default search path with:

AUXLIBS="-Wl,-R,/some/path ..."

When I build Postfix for my FreeBSD server, I use:

make -f Makefile.init dynamicmaps=yes shared=yes pie=yes \
'CCARGS=-DDEF_MAIL_OWNER=\"postfix\" -DDEF_SGID_GROUP=\"maildrop\" 
-DHAS_EAI -DUSE_SASL_AUTH -I/usr/local/include -DHAS_PCRE -DUSE_CYRUS_SASL 
-I/usr/local/include/sasl -DUSE_TLS -I/usr/local/include/db5 -DHAS_CDB 
-DHAS_LMDB' \
'AUXLIBS=-L/usr/local/lib -lsasl2 -lpam -lcrypt 
-Wl,-rpath,/usr/local/lib -fstack-protector-strong  -lssl -lcrypto 
-L/usr/local/lib/db5 -ldb-5.3' \
'AUXLIBS_CDB=-L/usr/local/lib -Wl,-R,/usr/local/lib -lcdb' \
'AUXLIBS_PCRE=-L/usr/local/lib -Wl,-R/usr/local/lib -lpcre' \
'AUXLIBS_LMDB=-L/usr/local/lib -Wl,-R/usr/local/lib -llmdb' \
command_directory=/usr/local/sbin \
config_directory=/usr/local/etc/postfix \
daemon_directory=/usr/local/libexec/postfix \
data_directory=/var/db/postfix \
mailq_path=/usr/local/bin/mailq \   
newaliases_path=/usr/local/bin/newaliases \ 
queue_directory=/var/spool/postfix \
sendmail_path=/usr/local/sbin/sendmail \
shlib_directory=/usr/local/lib/postfix \
html_directory=/usr/local/share/doc/postfix \
manpage_directory=/usr/local/man \
readme_directory=/usr/local/share/doc/postfix \
makefiles

I don't need ".../db5" in the runpath for db-5.3, because the
version-explicit SONAME library is also installed into /usr/local/lib.
Only the bare .so and .a files require /usr/local/lib/db5.  So I use
"/usr/local/lib" instead.

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 05:38:42PM -0400, Wietse Venema wrote:

> thorsten.hab...@findichgut.net:
> > Any chance to backport the patch to 3.4/3.5?
> 
> This is more change than is allowed in a stable release. Postfix
> 3.6 drops support for OpenSSL < 1.1.1, deletes o(thousand) lines
> of DANE support from the Postfix TLS library, and replaces it with
> o(hundred) lines to use instead the DANE support in OpenSSL.

The backport request was just for the one-liner fix in posttls-finger,
where "-X" no longer falsely conflicts with "-r" (when no "-r" is
in fact specified).  This should/will likely be backported.

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
> On Aug 21, 2020, at 5:21 PM, thorsten.hab...@findichgut.net wrote:
> 
> By the way I already applied your last patch on the testing environment.
> No problems found so far. tafile and CApath based mandatory TLS delivery
> work just fine.

Thanks for the confirmation.  Fortunately, the good news is not surprising:
the cause of the intermittent (more failure than success) problem you were
having, and only in tlsproxy(8), is clear from the patch.  The wrong
SSL_CTX was selected for "tafile" connections; it was shared with
normal WebPKI connections, which raced the "tafile" connections to set
the correct verification callback.

With the symptoms fitting the bug so well, the confirmation is more of
a formality, but still good to have.   Sorry it took a while to get here,
but the early messages in the thread had me focused on resumption, rather
than the initial verification failure, which was the real problem.

-- 
Viktor.



Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 03:11:50PM -0400, Wietse Venema wrote:

> Viktor Dukhovni:
> > On Fri, Aug 21, 2020 at 10:59:11AM -0400, Wietse Venema wrote:
> > 
> > > > Viktor Dukhovni:
> > > > > - && TLS_DANE_BASED(state->client_start_props->tls_level))
> > > > > + && TLS_DANE_HASTA(state->client_start_props->dane))
> > > > >   msg_warn("%s: DANE requested, but not available",
> > > > >state->client_start_props->namaddr);
> > > 
> > > Should there be a warning when tls_dane_avail() fails AND the
> > > TLS_DANE_BASED is true?
> > 
> > Not needed if TLS_DANE_HASTA is not true, because:
> 
> In that case, can you can suggest a more appropriate warning message?
> The text no longer matches the error condition.

Fair point.  The warning message could/should read:

msg_warn("%s: DANE or local trust anchor based chain"
 " verification requested, but not available",
 state->client_start_props->namaddr);

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 10:59:11AM -0400, Wietse Venema wrote:

> > Viktor Dukhovni:
> > > - && TLS_DANE_BASED(state->client_start_props->tls_level))
> > > + && TLS_DANE_HASTA(state->client_start_props->dane))
> > >   msg_warn("%s: DANE requested, but not available",
> > >state->client_start_props->namaddr);
> 
> Should there be a warning when tls_dane_avail() fails AND the
> TLS_DANE_BASED is true?

Not needed if TLS_DANE_HASTA is not true, because:

- For a DANE-based policy without DANE-TA TLSA RRs to have
  made it to tlsproxy(8), all the DNS preconditions have
  already been satisfied, and DANE-EE TLSA records have
  been provided to tlsproxy(8).

- In Postfix 2.11–3.5, DANE-EE checks are performed post-handshake.
  In particular, the DANE-style X.509 verification
  callback is not needed and is not enabled (it is set back to
  the WebPKI default) on the shared SSL_CTX.

> Would the following be more correct:
> 
>int missing_infrastructure = 0;
> if (!tls_dane_avail()) {  /* mandatory side effects!! */
>   /* True DANE request. */
>   if (TLS_DANE_BASED(state->client_start_props->tls_level)) {
>   msg_warn("%s: DANE requested, but not available",
>state->client_start_props->namaddr);
>   missing_infrastructure = 1;
>   }
> /* Not DANE, but TA support implicitly depends on the DANE stack. */
>   else if (TLS_DANE_HASTA(state->client_start_props->dane)) {
>   msg_warn("%s: TA support requested, but DANE is not available",
>state->client_start_props->namaddr);
>   missing_infrastructure = 1;
>   }
> }
> if (missing_infrastructure == 0)
> state->tls_context = tls_client_start(state->client_start_props);

No, because DANE-EE works without tls_dane_avail.  We just check the
certificate fingerprint post-handshake.
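For "3 1 1" (DANE-EE, SPKI, SHA2-256) records, the post-handshake check amounts to comparing a digest of the peer's SubjectPublicKeyInfo against the TLSA record data. A minimal sketch; the real code also handles other selector and matching-type combinations, and the byte strings here are placeholders:

```python
import hashlib

def dane_ee_matches(spki_der, tlsa_records):
    """Toy post-handshake check: does the peer's SubjectPublicKeyInfo (DER)
    match any DANE-EE(3) SPKI(1) SHA2-256(1) TLSA record?"""
    digest = hashlib.sha256(spki_der).digest()
    return any(usage == 3 and selector == 1 and mtype == 1 and data == digest
               for (usage, selector, mtype, data) in tlsa_records)
```

Because only a digest comparison is needed, no chain building or verification callback is involved, which is why DANE-EE works even without the DANE chain-verification machinery.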

> But wait, there is more...
> 
> > >   state->appl_state = tlsp_client_init(state->tls_params,
> > >state->client_init_props,
> > > -   TLS_DANE_BASED(state->client_start_props->tls_level));
> > > +   TLS_DANE_HASTA(state->client_start_props->dane));
> 
> Will this also use the right verify callback function pointer when
> real DANE is requested? Or does real DANE not use those same
> callbacks?

In Postfix 2.11–3.5, the real DANE-EE does not use the custom
DANE-specific X.509 verification callbacks, they're only needed
for DANE-TA verification.

The dichotomy was between WebPKI-style chain verification and DANE-TA
(or "tafile") chain verification.  When all the TLSA records are
DANE-EE, no chain verification is performed, we just do a post-handshake
fingerprint check (supporting both DANE-EE and the "fingerprint"
security level).

All this simplified in Postfix 3.6.  Speaking of which, upgrades from
3.5 to 3.6 need to perform a "reload", to flush the TLS session cache.
Otherwise, some cached sessions returned by the "old" tlsmgr(8) to
new smtp(8) may have the wrong properties.  One way to avoid this is
to include "mail_version" in the session lookup key hash.  We should
probably do that...
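The mail_version idea can be sketched as folding the version string into the session cache lookup key, so that entries written by one release are simply never found by another. The key layout here is illustrative, not the actual Postfix cache-key format:

```python
import hashlib

def session_cache_key(service, dest, policy, mail_version):
    # Folding the software version into the lookup key ensures that
    # sessions cached by an older release are never returned to a newer
    # one, making an explicit post-upgrade cache flush unnecessary.
    material = "\0".join([service, dest, policy, mail_version])
    return hashlib.sha256(material.encode()).hexdigest()
```

With this scheme a 3.5-to-3.6 upgrade would miss on every lookup and repopulate the cache with fresh sessions that have the new properties.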

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 10:32:10AM +0300, Thorsten Habich wrote:

> > This is relevant, but probably not 100% accurate, likely some domains
> > also intermittently failed routine CAfile-based validation.
> 
> Thanks for the patch.  There was no higher number of certificate
> verification failures since I updated to Postfix 3.5.4.
> 
> Checking the logs of the last 8 days, there is only one "Server
> certificate not trusted" error for a CApath-based configuration and 348
> errors for TAfile based configurations.
> The number of *CApath* (if that's relevant) based mandatory TLS
> configurations per destination is currently FAR higher than the tafile
> based configurations.

This is expected, given that most destinations are not using "tafile",
the probability of failing a CApath validation during a concurrent
successful "tafile" delivery is low, but not zero.  If that one
validation failure was transient (the destination otherwise verified
before and after) then that could be the one low probability case.

> Hope this patch is suitable:
> 
> --- a/src/posttls-finger/posttls-finger.c   2019-02-12
> 14:17:45.0 +0100
> +++ b/src/posttls-finger/posttls-finger.c.new   2020-08-21
> 09:15:04.256945675 +0200
> @@ -1988,7 +1988,7 @@
>     msg_fatal("bad '-a' option value: %s", state->options.addr_pref);
> 
>  #ifdef USE_TLS
> -    if (state->tlsproxy_mode && state->reconnect)
> +    if (state->tlsproxy_mode && state->reconnect > 0)
>     msg_fatal("The -X and -r options are mutually exclusive");
>  #endif

This has already been fixed in the 3.6 snapshots.  Don't recall whether
that's been backported to 3.4/3.5.

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-20 Thread Viktor Dukhovni
On Thu, Aug 20, 2020 at 01:20:00PM -0400, Wietse Venema wrote:

> Viktor Dukhovni:
>
> > -   && TLS_DANE_BASED(state->client_start_props->tls_level))
> > +   && TLS_DANE_HASTA(state->client_start_props->dane))
> > @@ -1427,7 +1427,7 @@ static void tlsp_get_request_event(int event, void 
> > *context)
> > - TLS_DANE_BASED(state->client_start_props->tls_level));
> > + TLS_DANE_HASTA(state->client_start_props->dane));
> 
> This looks weird. I thought that the problem was with trust anchors, not DANE?

Yes, the problem is with trust anchors, but DANE is the general case of:

* Policy-based end-entity cert matching:

- DANE "_25._tcp.example.net. IN TLSA 3 ? ? ..." 

- The Postfix "fingerprint" security level

* Policy-based issuer CA cert matching:

- DANE "_25._tcp.example.net. IN TLSA 2 ? ? ..." 

- The Postfix verify/secure levels with a custom per-site
  "tafile".

Actual DANE TLSA RRsets can have either or both DANE-EE or DANE-TA
records, with verification ultimately matching either or both.  The
"fingerprint" level is mapped to DANE-EE, while "tafile" support is
mapped to DANE-TA.

Thus actual DANE, fingerprint and secure/verify with a "tafile" are all
handled via the "general case" of "some sort of DANE-like policy".

In Postfix 3.6, the job of validating "some sort of DANE-like policy" is
entirely delegated to OpenSSL.  You'll be pleased to know, that in
Postfix 3.6 the TLS_DANE_HASTA() and TLS_DANE_HASEE() macros are gone.
We no longer need to treat the various DANE-like matching differently.

-- 
Viktor.


Re: PATCH #2: connection_reuse

2020-08-20 Thread Viktor Dukhovni
On Thu, Aug 20, 2020 at 04:59:49PM +0300, Thorsten Habich wrote:

> > - Do FAILURES happen ONLY after a session is RESUMED.
> 
> Sorry, no. The first connection decides if the problem occurs or not.
> If the session is resumed the error only occurs *if the first
> connection failed*.

Thanks for the answer.  This means that there are no issues recording
the proper validation status in the session cache, and the issue is
entirely validation failure on initial handshake.

I don't recall seeing any logging posted showing those initial
validation failures.  This might be as good a time as any to address
that (the failure logs for the initial connection should have been part
of the post that started this thread).

> If the first connection was successful the error will not appear. The
> status then seems to change in case of a restart (which, as Viktor
> clarified, clears the session cache) or after, I assume,
> tlsproxy_tls_session_cache_timeout (default: 3600).
> 
> In the examples I found in our logs, after a failed connection, the
> first successful delivery without a restart was made after 1h + x minutes.

This is of course expected.  With a 1h session cache lifetime, new full
handshakes happen only after the previous saved session has expired.  I
would recommend a shorter session lifetime for now.  It will help to get
a better handle on the problem, by doing the initial handshake more
frequently.

> For sessions which do not get resumed at all the error occurs
> frequently, too.

Yes, that's why you're seeing problems on resumption.

> If I remember correctly the certificate verification with connection
> reuse (so the tlsproxy gets involved) was fixed with:

You keep talking about connection reuse, as though it were somehow
relevant, even though I haven't seen anything in this thread that
suggests that connection reuse is in any way involved.  Why do you
believe that connection reuse is a factor in this issue?

I hope you're not still conflating session resumption with connection
reuse.

-- 
Viktor.


Re: PATCH #2: connection_reuse

2020-08-19 Thread Viktor Dukhovni
On Wed, Aug 19, 2020 at 10:52:20AM +0300, Thorsten Habich wrote:

> > > the certificate verification with TA file option still occasionally fails:
> > How is the use of a TA file relevant here?
>
> It only happens with the domains configured with TA file option.

Do *resumed* sessions always fail to validate?  Or is that intermittent?
When resumption fails, was the preceding non-resumed session successful?

Have you considered, as a differential diagnostic procedure, setting up a
separate transport for the problem domain, and using the trust anchors in
question as the CAfile for the transport instead of a per-destination policy "tafile"?

Are the trust-anchors self-signed CA certs, or are they "intermediate" certs
signed by some other CA?  If intermediate, it takes a bit more effort to
turn them into a usable CAfile, because they'd need to be encapsulated
as "TRUSTED CERTIFICATE" PEM objects, with a trust EKU of "serverAuth".
I can post an example of how to do that if necessary.

Also, can you test the Postfix 3.6-20200725 snapshot?  In Postfix 3.6
the "tafile" code is based on the DANE support in OpenSSL 1.1.1, rather
than the older DANE certificate validation code in Postfix itself.

-- 
Viktor.


Re: PATCH #2: connection_reuse

2020-08-14 Thread Viktor Dukhovni
On Fri, Aug 14, 2020 at 02:30:03PM +0300, Thorsten Habich wrote:

> the certificate verification with TA file option still occasionally fails:

How is the use of a TA file relevant here?

> 2020-08-13T07:39:39.007186+02:00 server postfix/tlsproxy[47119]:
>   certificate verification failed for remote.domain.tld[10.11.12.13]:25:
>   untrusted issuer /C=PL/O=Unizeto Sp. z o.o./CN=Certum CA

Are you saying that the code doing the validation is unreliable, or is
the remote server merely presenting an unexpected certificate chain?

> 2020-08-13T07:39:39.007423+02:00 server postfix/tlsproxy[47119]:
>   Untrusted TLS connection established to
>   remote.domain.tld[10.11.12.13]:25: TLSv1.2 with
>   cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
>
> 2020-08-13T07:39:39.007537+02:00 server postfix/smtp[26187]: Untrusted
>   TLS connection established to remote.domain.tld[10.11.12.13]:25: TLSv1.2
>   with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)

Everything working otherwise correctly, so far.

> on the next delivery attempt the connection re-use seem to lead to the
> fact that the verification isn't processed again, although the last
> delivery attempt failed due to a mandatory TLS configuration (secure):

This is NOT "connection reuse".  It is TLS session (server-side state or
client-side ticket) resumption:

> 2020-08-13T07:47:55.233536+02:00 server postfix/tlsproxy[58527]:
>   remote.domain.tld[10.11.12.13]:25: re-using session with untrusted
>   certificate, look for details earlier in the log

As plainly noted in the above log entry.

> 2020-08-13T07:47:55.233633+02:00 server postfix/tlsproxy[58527]:
>   Untrusted TLS connection established to
>   remote.domain.tld[10.11.12.13]:25: TLSv1.2 with
>   cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)

And evident from the *new* connection.

> 2020-08-13T07:47:55.233705+02:00 server postfix/smtp[44608]:
>   Untrusted TLS connection established to remote.domain.tld[10.11.12.13]:25: 
> TLSv1.2
>   with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)

Since resumed sessions don't involve presentation of a certificate
chain, the peer authentication status remains unchanged.

> 2020-08-13T07:47:55.235003+02:00 server postfix/smtp[44608]: 123QUEUEID:
>   to=, relay=remote.domain.tld[10.11.12.13]:25,
>   delay=497, delays=496/0/0.12/0, dsn=4.7.5, status=deferred (Server
>   certificate not trusted)

And the client rejects the unauthenticated server.  This condition will
clear once the session in question expires from the TLS session cache.

> In this example the remote side is using a Postfix 3.1. The problem is
> solved after a Postfix restart on our side.

This clears the TLS session cache.

> Another remote MTA which is configured with TA file option, doesn't seem
> to support connection re-use. ("re-using session with untrusted
> certificate, look for details earlier in the log" is not mentioned in
> our logs).

Perhaps you've not configured a TLS session cache.



On Fri, Aug 14, 2020 at 11:22:47AM -0400, Wietse Venema wrote:

> I'll leave it to Viktor and you to figure out why this is
> non-deterministic.

It looks deterministic enough to me.  The only question is whether
Postfix should or should not take steps to avoid caching sessions that fail
verification (at security levels above "encrypt"), or perhaps cache them
for a shorter time than the default?  Presently, these are cached in the
same way as regular sessions, and this has been the case for at least a
decade now.

> Unfortunately this does not show whether the SMTP client proceeds
> with the email delivery.

The client does not proceed as seen in one of the above log entries.

-- 
Viktor.


Re: Fairness for single-recipient bulk mail

2020-06-24 Thread Viktor Dukhovni
On Wed, Jun 24, 2020 at 04:43:55PM -0400, Wietse Venema wrote:

> For a given transport and destination, if all three classes have
> work then round-robin interleaving gives each class 1/3 of the
> delivery slots, 1/2 if there are only two classes with work, all
> slots if there is only one class with work, and zero if there is
> no work.
> 
> If implemented in oqmgr, it's brain-dead round-robin with FIFO.
> What could possibly go wrong? :-) Well it does not interleave
> deliveries within the same class. But, with single-recipient messages,
> that wasn't happening anyway, so we are not making things worse.

My instinct is that just "oqmgr" + coarse classes is not quite enough.
There can be multiple large messages in one of the larger classes that
can take a while to drain to a popular destination.  There should
still be some sort of preemption after a particular message group
has consumed enough resources.

Hence the idea of trying to group related single-recipient messages to
synthetically identify a batch of messages that should intermittently
yield to allow other deliveries.

One way to do that is to trim the sender localpart to drop the
recipient extension, but more generally to have a regexp map
that can be configured for that, which might keep only the
domain, or recognize specific expected bulk flows.

All messages with the same sender-based key (plus perhaps a quantised
size or some other coarse metric) would dynamically constitute
a qmgr "job", but I don't know how hard that is, or whether it would
be a problem to keep updating the job as new messages
arrive, whereas before some of the job info was determined once up
front.

-- 
Viktor.


Re: MTA-STS and Server Name Indication (SNI) on mail servers

2020-06-17 Thread Viktor Dukhovni
On Wed, Jun 17, 2020 at 04:20:00PM -0400, John Levine wrote:

> In article <49nfx174fgzj...@spike.porcupine.org> you write:
> >Postfix will send SNI when it is told (by policy) what servername
> >to use. It can be statically configured as smtp_tls_servername,
> >or dynamically in an smtp_tls_policy_map lookup result with the
> >servername attribute.
> 
> I meant in the other direction -- you can't tell if someone else's
> mail server has more than one name, so the SMTP client should send
> SNI.

There's certainly no point in bothering with unauthenticated TLS
(security level "may" or "encrypt"), since we ignore the certificate in
any case.  With DANE, SNI is sent unconditionally, and with "verify" (or
its conjoined twin "secure") you get to decide by setting
"smtp_tls_servername".

> Looking at the man page it appears that client SNI is tied to DANE
> which is not a great idea since the point of MTA-STS is to do server
> name verification without needing DNSSEC.

You're not reading all the relevant docs.  They may not always be in one
place, sorry about that.

http://www.postfix.org/postconf.5.html#smtp_tls_servername

What that says is that with the *default* empty setting of the parameter
SNI is not sent except with DANE.  When the parameter is NOT empty, and
the security level is not DANE-based, the specified (non-empty) SNI name
is sent.

Therefore, sufficient support for SNI to enable MTA-STS plugins is available:

tls_policy:
example.com secure
servername=hostname
match=mx1.example.com
match=mx2.example.com
match=mx3.example.com

There is no built-in MTA-STS support.

-- 
Viktor.


Re: MTA-STS and Server Name Indication (SNI) on mail servers

2020-06-17 Thread Viktor Dukhovni
On Wed, Jun 17, 2020 at 03:30:09PM -0400, Wietse Venema wrote:

> > Looking at the mail logs for my servers, it's pretty clear that
> > Postfix doesn't send SNI. I would also guess that if a Postfix MTA has
> > multiple names, it doesn't have any way to select a certificate using
> > SNI. This is not hard to fix; I added SNI support to the mailfront
> > SMTP daemon in a couple of hours. It took longer to get all the
> > certificates signed.
> 
> Postfix will send SNI when it is told (by policy) what servername
> to use. It can be statically configured as smtp_tls_servername,
> or dynamically in an smtp_tls_policy_map lookup result with the
> servername attribute.
> 
> There are several MTA-STS plugins for Postfix that provide that
> dynamic policy. It is not built into Postfix at this time, just
> like DKIM and a lot of other protocols.

See also the recent thread on SNI:


http://postfix.1071664.n5.nabble.com/Re-SNI-problem-the-client-side-td106457.html

The Postfix server needs to be:

* 3.4.x >= 3.4.13, or
* 3.5.x >= 3.5.3, or
* 3.6-MMDD >= 3.6-20200610

What Wietse said about the client settings, but see also:


http://postfix.1071664.n5.nabble.com/Re-SNI-problem-the-client-side-tp106457p106468.html

if you're a user of:

https://github.com/Snawoot/postfix-mta-sts-resolver

-- 
Viktor.


Re: connection_reuse

2020-06-17 Thread Viktor Dukhovni
On Wed, Jun 17, 2020 at 06:05:44PM +0300, Thorsten Habich wrote:

> unfortunately I ran into a bug when trying to use the connection_reuse
> parameter in a TLS policy map file.
> Attached you can find a patch, to get this option running.

Thanks for the patch, indeed the "continue" is needed.

> --- src/smtp/smtp_tls_policy.c  2018-12-26 20:21:49.0 +0100
> +++ src/smtp/smtp_tls_policy.c.new  2020-06-12 14:44:28.740591359 +0200
> @@ -389,6 +389,7 @@
>  WHERE, name, val);
> INVALID_RETURN(tls->why, site_level);
> }
> +continue;
> }
> msg_warn("%s: invalid attribute name: \"%s\"", WHERE, name);
> INVALID_RETURN(tls->why, site_level);


> P.S.: I think smtp_tls_connection_reuse=yes in combination with tafile
> is broken.

I think you're saying that per-connection trust-anchors are not
supported by the tlsproxy.  That sounds plausible.  The "tafile" is
internally converted to a set of synthetic "DANE-TA(2)" records,
that are used for validation with "secure/verify" instead of the
global CAfile/CApath.  I don't think these are carried along with
the tlsproxy protocol.
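The conversion can be pictured as mapping each trust-anchor public key onto a synthetic TLSA tuple that the DANE machinery then matches like any real "DANE-TA(2)" record. The exact record parameters Postfix synthesizes may differ, so treat this as a toy model:

```python
import hashlib

def synthetic_dane_ta(trust_anchor_spkis):
    """Map trust-anchor public keys (SPKI DER) to toy TLSA tuples of the
    form (usage=2 DANE-TA, selector=1 SPKI, mtype=1 SHA2-256, digest)."""
    return [(2, 1, 1, hashlib.sha256(spki).digest())
            for spki in trust_anchor_spkis]
```

The point of the representation is uniformity: once the "tafile" content looks like TLSA records, secure/verify-with-tafile can share the same matching code path as actual DANE.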

There's some internal refactoring I should do in any case that
would make it easier to support these with connection reuse.

Ideally, we could drop support for OpenSSL 1.0.x (e.g. in Postfix 3.6) and
require 1.1.0 or later.  Then it would make sense to refactor to use the
built-in DANE support in OpenSSL (a tidier reworked version of the code
originally in Postfix), and with that make sure that "tafile" works as
expected with TLS connection reuse.

-- 
Viktor.


Re: Lua target

2020-01-14 Thread Viktor Dukhovni
On Tue, Jan 14, 2020 at 02:34:42PM +0100, Thierry Fournier wrote:

> What do you think about delivery target executing natively Lua code ?

I don't see a need for this.

> It does the same thing as "pipe", but much more quickly, because there
> is no fork/exec, and the Lua code is compiled only at start (or
> when required).

For lower amortised delivery cost you can deliver to a Lua-based
LMTP server.

-- 
Viktor.


Re: patch for replacing the text of postfix built-in reject messages

2020-01-12 Thread Viktor Dukhovni
On Sun, Jan 12, 2020 at 11:09:12AM -0500, Wietse Venema wrote:

> > Glad that you propose to implement this way. However things will
> > be more complicated : should new smtpd_reply_filter_maps and
> > smtpd_reject_footer_maps be executed in sequence ? or be exclusive ?
> 
> I think that for outgoing data, the filter should go last.

My instinct was to suggest that the filter goes first, and then the
footer gets added, but usability rather depends on how the pending
multi-line response is presented to the filter.  If it is easy for the
filter to match and selectively retain the footer, then you're probably
right and the filter should have the final say.  If, on the other hand,
it becomes hard to selectively retain the footer, then I'd be tempted to
add it last.

The "\c" trick no longer works once the footer is combined with the
original text, because (at least in principle) by that point the
original text could have a verbatim "\c" in it, that is not intended to
become a line-break.

If the footer can be assumed to be a single line, then the line breaks
could take the form of LF characters in the input string to the filter,
and the footer could matched via:

/.*\n(.*)\z/
or
/().*(^.*)\z/m  REPLACE ...

Alternatively, if the response that precedes the footer is never
multi-line (perhaps no longer valid once we have milters and/or proxy
filters), then one might match all but the first line.

Otherwise, matching the footer in a filter can become rather tricky.
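The two pattern sketches above can be tried in Python's re dialect (which writes PCRE's \z as \Z); the response text here is hypothetical:

```python
import re

# A hypothetical multi-line reply: original text, then a single-line footer.
response = "550 5.7.1 Rejected\nContact postmaster@example.com for help"

# /.*\n(.*)\z/ -- with DOTALL the greedy .* eats up to the last newline,
# leaving the footer in group 1:
m = re.search(r"(?s)\A.*\n(.*)\Z", response)
footer = m.group(1)

# /().*(^.*)\z/m -- with MULTILINE, anchor the last line directly; since
# the plain .* cannot cross a newline, only the final line can reach \Z:
m2 = re.search(r"(?m)(^.*)\Z", response)
```

Both approaches rely on the footer being a single line; once it can span lines, or the preceding text can be multi-line, the matching becomes as tricky as noted above.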

-- 
Viktor.


Re: New functionality proposal

2020-01-08 Thread Viktor Dukhovni
On Wed, Jan 08, 2020 at 09:36:48AM +0100, Thierry Fournier wrote:

> > - An "smtp_nexthop_override_maps" feature that replaces the domain
> >  in the delivery request with one or more domain names. You decide
> >  the order of names in the result, and if the original domain
> >  should be part of the result. Specify [name], [name] to get
> >  comparable control as with your synthetic MX records.
> 
> I like this idea, but it add a little bit of complexity for
> understanding configuration.

Which can also easily be done in your DNS resolver, including specifying equal
weight or strictly ordered preferences.

> > - Or, add support for multiple next-hop destinations in "relayhost",
> >  "transport_maps", and "default_transport". This changes the syntax
> >  and semantics of existing Postfix features. Again, specify [name],
> >  [name] to get comparable control as with your synthetic MX records.
> 
> I like this idea.

I am not convinced this is warranted.

-- 
Viktor.


Re: [PATCH] dns_lookup: Fix compilation with uClibc-ng

2019-05-03 Thread Viktor Dukhovni
The patch is incorrect/incomplete.  You can't just comment out the
call that does the work.

> On May 3, 2019, at 1:30 AM, Rosen Penev  wrote:
> 
> uClibc-ng does not have res_send or res_nsend.
> ---
> src/dns/dns_lookup.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/src/dns/dns_lookup.c b/src/dns/dns_lookup.c
> index 1ea98b3..59657f1 100644
> --- a/src/dns/dns_lookup.c
> +++ b/src/dns/dns_lookup.c
> @@ -344,11 +344,13 @@ static int dns_res_query(const char *name, int class, 
> int type,
>   if (msg_verbose)
>   msg_info("res_mkquery() failed");
>   return (len);
> +#ifndef __UCLIBC__
> } else if ((len = res_send(msg_buf, len, answer, anslen)) < 0) {
>   SET_H_ERRNO(TRY_AGAIN);
>   if (msg_verbose)
>   msg_info("res_send() failed");
>   return (len);
> +#endif
> } else {
>   switch (reply_header->rcode) {
>   case NXDOMAIN:
> -- 
> 2.17.1
> 

-- 
Viktor.



Re: RFE: DANE functions + log

2018-11-19 Thread Viktor Dukhovni
On Mon, Nov 19, 2018 at 11:45:18PM +0100, J. Thomsen wrote:

> >> 1) Postfix
> >>   Later I have found the posttls-finger program in the Postfix 
> >> distribution, but
> >>   the logging in this program should be present in the Postfix smtp itself 
> >> when using the
> >>   smtp_tls_loglevel parameter (and still improvements in the documentation 
> >> are needed)
> >
> >Do you have a specific suggestion of what you'd like to see logged,
> >and a specific log message format?  Would this be additional log
> >entries per connection, or more information in the summary TLS
> >connection log entry?
> 
> I think it would be a large improvement, if just the basic logging of
> posttls-finger -c could be added. Then increasing the smtp_tls_loglevel
> would make things clearer.

That's not terribly specific, what specifically in those logs do
you find compelling and why?

posttls-finger: using DANE RR: _25._tcp.smtp.dukhovni.org 
IN TLSA 3 1 1 
5E:07:8B:31:60:56:9F:16:5A:69:EB:86:03:95:BB:BD:C7:57:6C:36:03:C3:45:2B:07:13:9C:27:6B:26:D0:1C
posttls-finger: smtp.dukhovni.org[100.2.39.101]:25:
depth=0 matched end entity public-key sha256 

digest=5E:07:8B:31:60:56:9F:16:5A:69:EB:86:03:95:BB:BD:C7:57:6C:36:03:C3:45:2B:07:13:9C:27:6B:26:D0:1C
posttls-finger: smtp.dukhovni.org[100.2.39.101]:25
CommonName mournblade.imrryr.org
posttls-finger: smtp.dukhovni.org[100.2.39.101]:25:
subject_CN=mournblade.imrryr.org,
issuer_CN=mournblade.imrryr.org,

fingerprint=D0:29:E8:0C:9D:20:08:F5:47:D8:A8:3A:62:D9:52:A4:E4:8F:A1:64:3E:BD:1E:5E:C6:A3:4C:1E:EB:DB:BB:43,

pkey_fingerprint=5E:07:8B:31:60:56:9F:16:5A:69:EB:86:03:95:BB:BD:C7:57:6C:36:03:C3:45:2B:07:13:9C:27:6B:26:D0:1C
posttls-finger: Verified TLS connection established to 
smtp.dukhovni.org[100.2.39.101]:25:
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)

Five log messages where one previously sufficed seem like too much
to me.

> When implementing DANE it is helpful to increase the value of 
> smtp_tls_loglevel to at least X.

I've always found level 1 to be sufficient for routine logging.

> Also using posttls-finger included with the source code of Postfix can be 
> recommended.

Various Linux distributions, and FreeBSD do include posttls-finger.

> Postfix is logging various messages at the end of the TLS negotiation:
> 
> Anonymous TLS connection ..
>   This is logged, when ...
> 
> Untrusted TLS connection ...
>   This is logged, when 
> 
> Trusted TLS connection 
>   This is logged, when 
> 
> Verified TLS connection ...
>   This is logged, when 

This is covered in FORWARD_SECRECY_README, as mentioned previously.

> It also makes it possible to handle the key rollover as you suggest, but here 
> it should also be
> noted, that the "3 1 1" format allows a certificate to be renewed without 
> changing the TLSA record,
> as long as the private key is not changed (RFC 6698 A.1.2.2)

https://tools.ietf.org/html/rfc7671#section-5.1
https://tools.ietf.org/html/rfc7671#section-8.1
https://imrryr.org/~viktor/ICANN61-viktor.pdf

-- 
Viktor.


Re: RFE: DANE functions + log

2018-11-19 Thread Viktor Dukhovni
> On Nov 19, 2018, at 7:12 AM, J. Thomsen  wrote:
> 
> 1) Postfix
>   Later I have found the posttls-finger program in the Postfix distribution, 
> but
>   the logging in this program should be present in the Postfix smtp itself 
> when using the
>   smtp_tls_loglevel parameter (and still improvements in the documentation 
> are needed)

Do you have a specific suggestion of what you'd like to see logged,
and a specific log message format?  Would this be additional log
entries per connection, or more information in the summary TLS
connection log entry?

Is there something specific you'd like to see in the documentation?
Are you in a position to contribute documentation?

> 2) BIND
>   It has only a partial implementation of the RFC 4035 AD bit handling. 
>   This, however, must be handled in another forum than this.
> 
>   We have been using BIND since 2001 (DNSSEC since 2013) as an authoritative 
> name server
>   for many domain names as well as a recursive resolver for all PCs on the 
> LAN. 
>   It has been working flawlessly.
>   It is not an option to have to mess around with additional resolvers and 
> name servers just
>   to implement DANE.

Does BIND return an accurate (AA, authoritative answer) bit?  Would
you like to see Postfix consider authoritative answers as implicitly
validated?  If not, what is it that you're asking for?

If BIND can be coaxed into setting the (AD) bit in authoritative replies
to queries that set the (RD) and (AD) bits, that would obviate any need
to implicitly infer (AD) from (AA) in Postfix.

> 3) TLS-SNI
>   Single point of failures occur, when there is only one certificate for all 
> virtual domains
>   on a server and not one for every virtual domain. If the single certificate 
> has expired or is
>   damaged, mail for all customers is locked out from using TLS connections.

As already mentioned, SNI support is under development.  Note
that experience shows that using multiple separate MX hostnames
for the same server makes DANE TLSA more fragile, because now
you have more TLSA records to manage, with increased odds of
neglecting to keep some of them up to date.

My suggestion is to avoid *wildcard* certificates where a single
certificate is used across the entire pool of MX hosts serving
your domains, and is rotated at approximately the same time across
all the hosts.  These *have* been observed to cause outages when
all the MX hosts end up with non-matching TLSA records at the same
time.

Otherwise, by staggering certificate rotation across a small
handful (2 to 4) of MX hosts (each can have multiple IPs or
be in turn a server farm behind a load-balancer), you get
resilience against single points of failure without the complexity
of separate certificates and TLSA records for each domain.

Whatever you do, it won't be reliable if not monitored.  MONITOR
your systems for correctness of TLSA records and near-expiration
of certificates.

Finally, for each logical MX host always publish at least two TLSA
records, bound to *separate* public keys, only one of which becomes
outdated as you rotate your certificates.

The most recommended model is: "3 1 1 + 3 1 1", where one of the
published TLSA records matches the *current* public key of the SMTP
server, and the other matches the *next* public key of the SMTP
server, pre-generated at time of deployment of the current key,
to become the next key when it is time to renew the certificate.

Each renewal cycle:

  * Check that the pending "next" key already has a matching TLSA
record published in the DNS (ideally that this has been the
case for at least 2 DNS TTLs).  If not, delay the key rollover.

  * Obtain a certificate chain for this next key, and generate a
new next key.  Deploy the obtained certificate chain and private
key and either "reload" your servers, or wait a while for all
servers to automatically cut over to the new key.

  * Update your DNS TLSA records to match the new current (previously
next) key as well as the new next key:

    Before:   3 1 1 <digest of current key>      ; current (live)
              3 1 1 <digest of next key>         ; future  (dormant)

    After:    3 1 1 <digest of new current key>  ; current (live)
              3 1 1 <digest of new next key>     ; future  (dormant)

   * If updating DNS TLSA records requires manual intervention or
 is difficult to integrate as an automated part of the key rollover,
 generate a notification email or other trigger to activate the
 TLSA record update above.  You have plenty of time to do it
 during the lifetime of the current key, but don't procrastinate;
 do it early...

-- 
Viktor.



Re: Writing a SMTP Extension

2017-12-26 Thread Viktor Dukhovni
[ Oops, postfix-users was the wrong list apparently, reposting to postfix-devel 
]

> On Dec 26, 2017, at 8:29 PM, Tom Maier  wrote:
> 
> Within my uni project I have to implement additional SMTP commands in
> order to upload or download data (e.g., base64 encoded data). This is
> why my initial idea was to add functionality to the smtpd server of
> Postfix by defining a SMTP service extension. Thus it would be possible
> to provide the "upload" command only for previously authenticated users.
> As opposed to this the "download" command is available for everyone.
> What do you think about this concept?

Why SMTP and not HTTP or IMAP?  What about this application makes it
a natural fit for an asynchronous store and forward protocol?  Remember
that SMTP servers are message routing engines not mail store access
providers.  Postfix does not read users' mailboxes, and often does
not even know where they are (e.g. with LMTP or pipe(8) handoff).

Postfix is NOT Microsoft Exchange, which combines message routing with
mailstore access; for that, your closest approximation is Zimbra and
the like.  Many mailstores include HTTP interfaces, and these can be
used for various collaborative applications.  Message routing is, in
steady state, stateless: the MTA keeps track of a working queue in
which messages might persist for some days, but nothing stays in the
queue long-term, and once a message is delivered it is no longer
the MTA's problem.

Bottom line, it sounds rather unlikely that your problem demands an
ESMTP extension.  Those just tweak message handling en-route to a
mailstore, but should not be confused with groupware support.

-- 
Viktor.



Re: RFE: postqueue top sender

2015-10-19 Thread Viktor Dukhovni
On Mon, Oct 19, 2015 at 08:26:21PM -0400, Wietse Venema wrote:

> This internal communication uses Postfix-style records.  For the
> external interface to other programs, a different format would be
> more suitable. I don't know if JSON is still cool these days, but
> that would be a possibility. Transforming the Postfix-style record
> stream into something else should not be difficult.

I think JSON is still cool enough, and enjoys broad support across
multiple languages and sports various good "query" and post-processing
tools.  So I'd go with JSON.

-- 
Viktor.


Re: smtpd_sender_login_maps and multiple lookup tables

2015-10-07 Thread Viktor Dukhovni
On Thu, Oct 08, 2015 at 01:12:15AM +0200, Axel Luttgens wrote:

> I mean, this could be a hint to tweak the algorithm so as to implicitely make 
> use of a "DUNNO" condition;

There is no such thing as a "DUNNO" condition.  That's an access
keyword in access(5) maps that short-circuits searches for
less-specific keys.  There is no generally applicable DUNNO.

> something like:
> 
>   for each table
>   sender address not found
>   continue
>   sender address found
>   owner matches
>   return OK
>   else
>   return DUNNO
>   return REJECT

The table in question is not an access table, it returns a list of
logins.  There could be a login named "DUNNO".  Furthermore, you're
forgetting the outer loop with the full address, and then various
partial address forms like the address without its extension, ...

> Otherwise, the only unambiguous case for having
> 
>   smtpd_sender_login_maps <map1>, <map2>, […]
> 
> is when all of the involved maps are disjoint ones (no common sender
> addresses), since ordering of <map1>, <map2>, […] would then have no
> meaning.

There's no "ambiguity": table order matters, first match wins.  The
first table overrides the second table, ... Except that partial
keys can make that complicated, if the second table can match
partial keys for which only the full key appears in the first table.
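
A toy sketch of these first-match semantics (hypothetical table data and
login names for illustration only; this is not Postfix's dict(3) API, and
partial-key lookups are omitted):

```c
#include <stddef.h>
#include <string.h>

/* A toy lookup table: NULL-terminated key array with parallel values. */
typedef struct {
    const char **keys;
    const char **values;
} TABLE;

/*
 * Search the tables in the order listed: the first table that contains
 * the key wins, and later tables are never consulted for that key.
 * With disjoint tables the listing order is irrelevant; order matters
 * only when two tables share a key.
 */
static const char *lookup(const TABLE *tables, size_t ntables, const char *key)
{
    size_t  t, i;

    for (t = 0; t < ntables; t++)
        for (i = 0; tables[t].keys[i] != NULL; i++)
            if (strcmp(tables[t].keys[i], key) == 0)
                return tables[t].values[i];
    return NULL;                                /* no table matched */
}
```

Swapping the two tables changes the result only for a key that appears
in both, which is the conflicting-key case described above.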

-- 
Viktor.


Re: smtpd_sender_login_maps and multiple lookup tables

2015-10-05 Thread Viktor Dukhovni
On Mon, Oct 05, 2015 at 10:38:04PM +0200, Axel Luttgens wrote:

> >>smtpd_sender_login_maps =
> >>hash:$config_directory/maps/sender_login_exceptions
> >>sqlite:db_sender_login_map
> >> 
> >> user emurphy may send with either sender address "ed.mur...@example.com" 
> >> or sender address "i...@example.com".
> > 
> > No, in this example, the order does not matter.
> 
> Yes, that’s what I thought too.
> But in fact, I *have* to make use of the second ordering (hash then db query) 
> for having the hash table to be taken into account.
> I swear it. ;-)

That's because you're not showing the full contents of the SQL
table.  If the order matters, the SQL table contains a conflicting
key.

-- 
Viktor.


Re: smtpd_sender_login_maps and multiple lookup tables

2015-10-05 Thread Viktor Dukhovni
On Mon, Oct 05, 2015 at 06:39:12PM +0200, Axel Luttgens wrote:

> As usual, I’ll probably appear quite dumb, but I’ll ask anyway. ;-)
> 
> Let’s say I have following data in the database (db_sender_login_map):
> 
>   from_addresslogin
>   =
>   jonh@example.comjdoe
>   ed.mur...@example.com   emurphy
> 
> and this one in the hash (sender_login_exceptions):
> 
>   # from_address  login
>   i...@example.comemurphy
> 
> The idea being to allow authenticated user emurphy to send emails with
> enveloppe sender addresses "ed.mur...@example.com" (the usual case) or
> "i...@example.com" (an exception).

You're confusing keys with values.  Postfix maps a sender to a list
of allowed logins (LHS => RHS).  The lookups are by *sender address*.

> With:
> 
>   smtpd_sender_login_maps =
>   sqlite:db_sender_login_map
>   hash:$config_directory/maps/sender_login_exceptions
> 
> user emurphy may only send with address "ed.mur...@example.com".

No.  Both addresses are valid for emurphy (only).

> With:
> 
>   smtpd_sender_login_maps =
>   hash:$config_directory/maps/sender_login_exceptions
>   sqlite:db_sender_login_map
> 
> user emurphy may send with either sender address "ed.mur...@example.com" or 
> sender address "i...@example.com".

No, in this example, the order does not matter.

-- 
Viktor.


Re: RFE: Additional postqueue output format

2015-09-08 Thread Viktor Dukhovni
On Tue, Sep 08, 2015 at 10:13:30AM +0200, Patrick Ben Koetter wrote:

> At the moment I need to know the top senders in a mail queue with more than 2
> million messages. I'd rather not dig in the logs, but use Postfix internal
> knowledge about messages currently in queue.

A log analyzer that checkpoints its state, and incrementally updates
it from just the new log entries would be far more efficient than
a queue file reader.  Especially if you just need to track "top
senders".

> > But i think you're looking for more detailed stats.
> 
> Nope. I'm looking for a (more) machine readable format.

Wietse's comment about JSON is what I would suggest for a
machine-readable queue content report, but running that report
frequently on a busy queue is not a good idea.  This takes too many
I/O cycles away from useful work.

Efficient queue scans also require keeping state from previous
scans (and use of "long" queue-ids), so that you don't need to
re-read queue-files you've already read (if all you really care
about is the sender address and not the number of outstanding
recipients to deliver) or log entries you've already read.

I still recommend using the logs to incrementally update a model
of the queue content (with some heuristics to drop implausibly old
messages whose log entries might have been lost).

-- 
Viktor.


Re: RFE: Additional postqueue output format

2015-09-07 Thread Viktor Dukhovni
On Tue, Sep 08, 2015 at 03:48:25AM +0200, Patrick Ben Koetter wrote:

> $ postqueue -p
> -Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
> 3n97rq4vbmz1gT2660 Tue Sep  8 03:18:03 dane-users-boun...@sys4.de
>   (connect to some.server.tld[12.34.56.78]:25: Connection 
> refused)
>lists...@server.tld
> (conversation with some.otherserver.tld[78.90.12.34] timed out while 
> receiving the initial server greeting)
>r...@remote-destination.tld
> 
> I - at least I - find it hard to write scripts that parse this output and
> generate e.g. statistics from it. This is mostly due to the multiline nature 
> of
> the output.

The multi-line format is better IMHO, a message can have a great
many recipients, and addresses can contain internal whitespace.

Have you considered modifying "qshape" to fit your needs?  Or better
yet, surely statistics should be based on logs, not "mailq" output.
What real problem are you trying to solve?

-- 
Viktor.


Re: Postfix 3.0.1 dynamicmaps.c

2015-04-21 Thread Viktor Dukhovni
On Tue, Apr 21, 2015 at 11:02:11AM -0400, Wietse Venema wrote:

> Postfix is the messenger, don't blame the messenger for bad news.
> 
> ---8---
> Apr 21 16:41:47 foo7 postfix/qmgr[3538]: scan_dir_push: open
> /etc/postfix/dynamicmaps.cf.d
> Apr 21 16:41:47 foo7 postfix/qmgr[3538]: scan_dir_next: skip .
> Apr 21 16:41:47 foo7 postfix/qmgr[3538]: scan_dir_next: skip ..
> Apr 21 16:41:47 foo7 postfix/qmgr[3538]: warning:
> /etc/postfix/dynamicmaps.cf.d: directory read error: No such file or
> directory
> 
> ENOENT after reading '.' and '..' is bogus.

It is not bogus if the directory contains symlinks that point to
non-existent files.  The OP has not yet reported what is actually in
/etc/postfix/dynamicmaps.cf.d.  An ls -l for that directory would
be rather useful.

-- 
Viktor.


Re: PATCH: Postfix 3.0.1 dynamicmaps.c

2015-04-21 Thread Viktor Dukhovni
On Wed, Apr 22, 2015 at 01:50:31AM +0200, Matthias Andree wrote:

> I would like to chime in here.  I believe there is a misunderstanding of
> the API.  IEEE Std 1003.1, 2013 Edition aka. The Open Group Base
> Specifications Issue 7 for readdir() explicitly states that on
> end-of-directory, errno is not changed, and applications that need to
> check for errors should zero it beforehand.

I also did not expect readdir() to preemptively clear errno.

The real issue is that NULL returns from readdir(), and many other
functions don't distinguish between normal end of data or no matching
data, and error conditions that lead to the same.  These are API
limitations, and the work-around (where the function actually sets
errno on error) is to clear errno, and test it after.

A similar strategy is needed with strtol() for ERANGE, ...
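
The clear-then-test idiom can be sketched with strtol(3); this is a
generic C illustration, not Postfix code, and the parse_long() helper
name is made up:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * strtol() returns extreme values both for genuinely extreme inputs and
 * on overflow; only errno == ERANGE distinguishes the two cases.  As
 * with readdir(), the caller must clear errno before the call, because
 * the function only ever sets errno, it never clears it.
 */
static int parse_long(const char *s, long *result)
{
    char   *end;

    errno = 0;                          /* clear stale errno before the call */
    *result = strtol(s, &end, 10);
    if (errno == ERANGE)
        return -1;                      /* overflow or underflow */
    if (end == s || *end != '\0')
        return -1;                      /* no digits, or trailing junk */
    return 0;
}
```

The same pattern applies to a readdir() loop: clear errno before the
loop, and a final NULL return with errno still zero means a normal
end-of-directory rather than an error.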

> There is also normative prose on errno, or errno.h, that no function in
> the respective volume of said standard shall reset errno to 0.

As expected.

-- 
Viktor.


Re: Postfix 3.0.1 dynamicmaps.c

2015-04-21 Thread Viktor Dukhovni
On Tue, Apr 21, 2015 at 06:58:40PM +0300, Mika Ilmaranta wrote:

> diff -up postfix-3.0.1/src/global/dynamicmaps.c.reset-errno
> postfix-3.0.1/src/global/dynamicmaps.c
> --- postfix-3.0.1/src/global/dynamicmaps.c.reset-errno  2015-04-21
> 18:37:29.641532865 +0300
> +++ postfix-3.0.1/src/global/dynamicmaps.c  2015-04-21 18:40:35.594131643
> +0300
> @@ -329,6 +329,12 @@ void    dymap_init(const char *conf_path
>      if (access(conf_path_d, R_OK | X_OK) == 0
>          && (dir = scan_dir_open(conf_path_d)) != 0) {
>          sub_conf_path = vstring_alloc(100);
> +
> +        if (errno != 0)
> +            msg_warn("%s: errno set before scan_dir_next while loop, resetting
> now: %m", myname);
> +
> +        errno = 0;
> +
>          while ((conf_name = scan_dir_next(dir)) != 0) {
>              vstring_sprintf(sub_conf_path, "%s/%s", conf_path_d, conf_name);
>              dymap_read_conf(vstring_str(sub_conf_path), plugin_dir);
> 
> 
> And I get log:
> 
> Apr 21 18:51:11 foo7 postfix/postmap[5099]: warning: dymap_init: errno
> set before scan_dir_next while loop, resetting now: No such file or
> directory

Indeed errno should be reset, if it is to be tested below, but I
don't think that the test is appropriate.  So my patch would be
(perhaps this should also clear errno above the loop subject to
a similar #if 0 ... #endif):

diff --git a/src/global/dynamicmaps.c b/src/global/dynamicmaps.c
index f978460..6391aee 100644
--- a/src/global/dynamicmaps.c
+++ b/src/global/dynamicmaps.c
@@ -333,9 +333,19 @@ void    dymap_init(const char *conf_path, const char
*plugin_dir)
 	vstring_sprintf(sub_conf_path, "%s/%s", conf_path_d, conf_name);
dymap_read_conf(vstring_str(sub_conf_path), plugin_dir);
}
-   if (errno != 0)
-   /* Don't crash all programs - degrade gracefully. */
+#if 0
+   /*
+* The dynamicmaps.cf.d directory is managed by people, not Postfix
+* software, so we can encounter symlinks to non-existent files there,
+* or race against users deleting files.  Therefore, we ignore ENOENT.
+*
+* Since errors other than ENOENT are fatal in dymap_read_conf(),
+* there's nothing left to report here other than errors in readdir()
+* via scan_dir_next().  That seems too unlikely to bother reporting.
+*/
+   if (errno != 0 && errno != ENOENT)
 	msg_warn("%s: directory read error: %m", conf_path_d);
+#endif
scan_dir_close(dir);
vstring_free(sub_conf_path);
 } else if (errno != ENOENT) {

-- 
Viktor.


Re: Postfix 3.0.1 dynamicmaps.c

2015-04-21 Thread Viktor Dukhovni
On Tue, Apr 21, 2015 at 07:14:37PM +0300, Mika Ilmaranta wrote:

> It's empty and SeLinux context is correct.
> 
> [root@foo7 ~]# ls -la /etc/postfix/dynamicmaps.cf.d/
> total 4
> drwxr-xr-x. 2 root root    6 Apr 21 18:46 .
> drwxr-xr-x. 4 root root 4096 Apr 21 18:51 ..

Thanks, so in your case, errno was simply left over from another
system call.  It is however possible to encounter ENOENT due to
broken symlinks or races against file deletion.  For these it makes
more sense to report the problem in dymap_read_conf() if it is to
be reported at all.

-- 
Viktor.


Re: missing include in allascii.c

2015-02-17 Thread Viktor Dukhovni
On Tue, Feb 17, 2015 at 10:06:34AM +, Eray Aslan wrote:

> --- src/util/allascii.c 2015-02-17 00:43:56.0 +
> +++ src/util//allascii.c  2015-02-17 10:01:47.775727110 +
> @@ -35,6 +35,7 @@
>  
>  #include <sys_defs.h>
>  #include <ctype.h>
> +#include <string.h>
>  
>  /* Utility library. */
> 

Might as well also nit-pick the embedded documentation! :-)

diff --git a/src/util/allascii.c b/src/util/allascii.c
index e2be6b9..6c8ff5f 100644
--- a/src/util/allascii.c
+++ b/src/util/allascii.c
@@ -9,7 +9,7 @@
 /* int allascii(buffer)
 /* const char *buffer;
 /*
-/* int allascii_len(buffer len)
+/* int allascii_len(buffer, len)
 /* const char *buffer;
 /* ssize_t len;
 /* DESCRIPTION
@@ -35,6 +35,7 @@
 
 #include <sys_defs.h>
 #include <ctype.h>
+#include <string.h>
 
 /* Utility library. */
 

-- 
Viktor.


Re: Possible problem with dead code in src/postlog/postlog.c (proposed patch)

2015-02-11 Thread Viktor Dukhovni
On Wed, Feb 11, 2015 at 06:17:13PM -0800, Corey Ashford wrote:

> From our reading of the code, tag can never be 0 there, so that makes the
> "then" part of the if statement dead code.
> 
> After that, there's another if statement (line 254) that will always
> evaluate as true:
> 
>     if (tag != 0) {
>     ...
> 
> 
> In summary, I believe that removing the "tag = 0;" line was not the right
> fix for the seg fault bug, but it's not clear to me what the right fix is.

I would:

diff --git a/src/postlog/postlog.c b/src/postlog/postlog.c
index 6384396..3c22180 100644
--- a/src/postlog/postlog.c
+++ b/src/postlog/postlog.c
@@ -170,7 +170,7 @@ MAIL_VERSION_STAMP_DECLARE;
 int main(int argc, char **argv)
 {
 struct stat st;
-char   *slash;
+char   *progname;
 int fd;
 int ch;
 const char *tag;
@@ -200,10 +200,10 @@ int main(int argc, char **argv)
 /*
  * Set up diagnostics.
  */
-if ((slash = strrchr(argv[0], '/')) != 0 && slash[1])
-   tag = mail_task(slash + 1);
+if ((progname = strrchr(argv[0], '/')) != 0 && progname[1])
+   tag = mail_task(++progname);
 else
-   tag = mail_task(argv[0]);
+   tag = mail_task(progname = argv[0]);
 if (isatty(STDERR_FILENO))
msg_vstream_init(tag, VSTREAM_ERR);
 msg_syslog_init(tag, LOG_PID, LOG_FACILITY);
@@ -216,10 +216,12 @@ int main(int argc, char **argv)
 /*
  * Parse switches.
  */
+tag = 0;
 while ((ch = GETOPT(argc, argv, "c:ip:t:v")) > 0) {
switch (ch) {
default:
-   msg_fatal("usage: %s [-c config_dir] [-i] [-p priority] [-t tag] [-v] [text]", tag);
+   msg_fatal("usage: %s [-c config_dir] [-i] [-p priority] [-t tag]"
+  " [-v] [text]", progname);
break;
case 'c':
 if (setenv(CONF_ENV_PATH, optarg, 1) < 0)
@@ -245,12 +247,8 @@ int main(int argc, char **argv)
  * specified with msg_syslog_init();
  */
 mail_conf_read();
-if (tag == 0 && strcmp(var_syslog_name, DEF_SYSLOG_NAME) != 0) {
-   if ((slash = strrchr(argv[0], '/')) != 0 && slash[1])
-   tag = mail_task(slash + 1);
-   else
-   tag = mail_task(argv[0]);
-}
+if (tag == 0 && strcmp(var_syslog_name, DEF_SYSLOG_NAME) != 0)
+   tag = mail_task(progname);
 
 /*
  * Re-initialize the logging, this time with the tag specified in main.cf

-- 
Viktor.


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Viktor Dukhovni
On Thu, Jun 05, 2014 at 02:24:38PM +0200, Arnt Gulbrandsen wrote:

> But ı is nasty. I have even found two domains that differ only in ı/i, so
> Postfix cannot treat them as equal.

Domains passed to lookup tables and match lists need to be in
a-label form.  The remaining surprises with domains and case-insensitive
comparisons vs. unicode will be with header/body checks, likely OK.

-- 
Viktor.


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Viktor Dukhovni
On Thu, Jun 05, 2014 at 05:18:48PM +0200, Arnt Gulbrandsen wrote:

> On Thursday, June 5, 2014 4:32:52 PM CEST, Viktor Dukhovni wrote:
> > Domains passed to lookup tables and match lists need to be in
> > a-label form.
> 
> That would make pcre almost impossible and mysql and pgsql lookups rather
> inconvenient.

What's the problem with the canonical representation of the domain exactly
as it appears on the wire in DNS, in certificate DNS altnames, ...

> The a-label form of blåbærsyltetøy in a-label form is
> xn--blbrsyltety-y8ao3x. Matching the PCRE /.*syltetøy.*/ in a-label form
> would be inconvenient, perhaps impossible.

Regular expressions on partial DNS labels are not that useful anyway.
Generally one just wants all the sub-domains of a particular domain.
Sometimes one wants to filter cable-modem/DSL PTR records; otherwise
I'm not losing sleep over partial DNS label regexps.

> Postgres and Mysql have builtin support for UTF8 strings so mysql/pgsql
> tables can use e.g. the ilike operator, but they do not support strings
> composed from a-labels. Here's a pgqsl concoction to match usernames,
> optionally with subaddresses:

Nothing is lost when the domain name is in a-label form.  The localpart
remains unicode, and one still needs some sort of UTF-8 -> UTF-8
lower-case operator that operates correctly on ASCII.  Frankly,
applying lowercase() to just the ASCII octets works fine in this
situation, provided the domain is in a-label form already.  Unicode
email address localparts would be case-sensitive in their non-ASCII
octets, not the end of the world.
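
The ASCII-only lowercase pass is safe on UTF-8 precisely because every
byte of a multi-byte UTF-8 sequence has the high bit set, so it can never
be mistaken for an ASCII letter.  A minimal sketch (not Postfix code; the
function name is made up):

```c
#include <ctype.h>

/*
 * Lowercase only the ASCII letters in a UTF-8 string, in place.
 * Bytes >= 0x80 belong to multi-byte UTF-8 sequences and are left
 * untouched, so non-ASCII characters keep their original case.
 */
static void ascii_lowercase(char *s)
{
    unsigned char *p;

    for (p = (unsigned char *) s; *p != 0; p++)
        if (*p < 0x80 && isupper(*p))
            *p = (unsigned char) tolower(*p);
}
```

A domain already converted to a-label form is pure ASCII, so this pass
canonicalizes it fully, while any non-ASCII localpart octets pass through
unchanged.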

-- 
Viktor.


Re: RFC: Verify concurrency limit

2014-04-22 Thread Viktor Dukhovni
On Tue, Apr 22, 2014 at 01:50:49PM -0400, Wietse Venema wrote:

> A global limit on the number of pending probes affects only unknown
> email addresses.  Postfix proactively refreshes known email addresses
> well before they expire. I am not an idiot.

Whether this is sufficient depends on the cache hit rate, and
proportion of addresses that receive infrequent mail.  Postfix does
not send refresh probes unless the recipient is actually sent a
message, right?  The OP may benefit from a longer positive cache
lifetime, and a separate transport for probes.  Customers that
tarpit probes are not doing anyone a favour; perhaps a cluestick can
be applied.

-- 
Viktor.


Re: Ambiguous description on reject_unknown_recipient_domain

2014-02-13 Thread Viktor Dukhovni
On Fri, Feb 14, 2014 at 01:17:14PM +0800, King Cao wrote:

> *reject_unknown_recipient_domain* Reject the request when Postfix is not
> final destination for the recipient domain, and the RCPT TO domain has 1) *no
> DNS A or MX record* or 2) ...

English is not symbolic logic, but the intent is clear:

1. no (MX or A record)

rather than:

2. no MX or no A record.

By De Morgan's laws (https://en.wikipedia.org/wiki/De_Morgan%27s_laws)
the first is also:

3. no MX and no A record.

Interpretation 2 seems too implausible to warrant correcting the
document, but if others feel it is ambiguous and someone sends a
patch for proto/postconf.proto that improves the clarity of the
text, it should be cheap enough to adopt it.
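
For the doubtful, the equivalence of interpretations 1 and 3 can be
checked mechanically over all four truth assignments (a throwaway C
sketch, not anything from the Postfix sources):

```c
/*
 * De Morgan: "no (MX or A record)" is the same predicate as
 * "no MX record and no A record".  Check every combination of
 * MX-exists / A-exists.
 */
static int de_morgan_holds(void)
{
    int     mx, a;

    for (mx = 0; mx <= 1; mx++)
        for (a = 0; a <= 1; a++)
            if (!(mx || a) != (!mx && !a))
                return 0;               /* counterexample found */
    return 1;                           /* equivalent for all inputs */
}
```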

-- 
Viktor.


Re: TLS support

2014-01-10 Thread Viktor Dukhovni
On Fri, Jan 10, 2014 at 11:44:04AM +0100, Patrick Ben Koetter wrote:

> Viktor,
> 
> we're lucky to have Carsten Strotmann on our team (here at sys4). You may know
> him for his expertise on DNS. Carsten offered to assist in writing the
> DANE_README.

Thanks.  Very much appreciated.

> I'd like you/others to go over the following TOC to make sure we cover all
> necessary aspects:
> 
> - What is DANE
> 
> - Benefits of using DANE

Since you're documenting DANE for Postfix, I think it is important
in the above two sections to keep the focus on SMTP Opportunistic
DANE TLS (even though you also will describe the mandatory
dane-only later in the document).

You need to briefly dispel the notion that public CA TLS (aka PKIX,
though PKIX spec also applies to DANE when a trust anchor is used)
is usable without DNSSEC on a large scale for SMTP to MX.  It is
not, because with MX indirection the peer name is insecure sans
secure DNS, and since hop-by-hop security is not implied by the
email address, STARTTLS is trivial to downgrade.

DANE for SMTP specifically deals with these problems, and along
the way solves Goedel's problem for CA bundles (any list of
CAs is either inconsistent or incomplete, where inconsistent
means too inclusive to be trustworthy).

Also highlight the need for authentication to be reliable, since
MTAs don't have interactive users to click OK (unlike browsers).

Then add a quick-and-dirty link for client setup and server cert
chain / TLSA RR setup (that will be at the bottom of the
document).

> - Prerequisites
> 
>     - DNSSEC
>         - What is DNSSEC?
>         - Why does DANE require DNSSEC?
>         - a bit of a DNSSEC tutorial...
> 
>     - Local Resolver
>         - DANE will offer only an illusion of security unless the *one and
>           only* nameserver in /etc/resolv.conf is 127.0.0.1
>         - trust-anchor key rotation
> 
> - Certificate Considerations
> 
>     - Pros/Cons of 2 0 1, ...

Right, I only recommend a choice between 2 0 1 and 3 1 1,
everything else is silly (at least for SMTP), but 2 1 1 could be
an option if the TA contains no useful data beyond its key, and is
in fact an intermediate or even root public CA, which might be
re-issued with new expiration dates, ... but an unchanged key.

 
> - TLSA Key rotation
> 
> - DANE setup with Postfix
> 
>     - Building a TLS certificate
> 
>     - Basic TLS/SSL configuration
>         - Basics + refer to TLS_README for tuning, other policies etc.
>         - Testing basic TLS functionality
> 
>     - Configuring DNS-Based Authentication of Named Entities (DANE)
>         - Creating a TLSA DNS resource record
>             - tlsagen
>             - Refer to Pros/Cons of 2 0 1...
>         - Deploying the TLSA DNS RR
>         - Testing the TLSA DNS resource record
>             - posttls-finger
> 
>     - Enabling DANE support in Postfix SMTP client
>         - Params, Options, Policies
>             smtp_dns_support_level = dnssec
>             smtp_tls_security_level = dane, dane-only

There are also some DANE related parameters for the
TLS library:

tls_dane_digest_agility = on
tls_dane_digests = sha512 sha256
tls_dane_trust_anchor_digest_enable = yes

> - Verifying DANE
>     - LOG
>     - Monitoring
> 
> - Troubleshooting

- Quick and Dirty configuration

- Client in brief.

DNS and SMTP agent settings.
tls policy table for exceptions:
- non-dane for emergencies (assuming not an MITM attack).
- dane-only

- Server in brief.

Cert chain.
TLSA RR content.
TLSA RR content during key rotation (of EE or TA cert).
 
> - References
> 
> 
> Thank you

Thanks again.

-- 
Viktor.


Re: TLS support

2014-01-10 Thread Viktor Dukhovni
On Fri, Jan 10, 2014 at 01:52:17PM +, Viktor Dukhovni wrote:

>     There are also some DANE related parameters for the
>     TLS library:
> 
>     tls_dane_digest_agility = on
>     tls_dane_digests = sha512 sha256
>     tls_dane_trust_anchor_digest_enable = yes

Another triplet of new in 2.11 parameters related to DANE:

- smtp_tls_force_insecure_host_tlsa_lookup

This one is not a good one to change from the default of "no"
until all broken name servers are fixed.  Set to "yes" on
today's Internet, mail to microsoft.com, nist.gov, ...
would never make it.

- smtp_tls_trust_anchor_file

This one is a bit like fingerprint security but with CAs.  You
can specify a per-site set of trust-anchors (essentially
CAs, but the file can also hold raw public keys).

- tls_wildcard_matches_multiple_labels

This is not really DANE-specific, but allows sites like postini.com
to potentially use DANE with their wild-card certificate which
needs to match multiple labels:

postini.com.  IN  MX  100 postini.com.s8a1.psmtp.com.

posttls-finger: postini.com.s8a2.psmtp.com[64.18.7.11]:25:
Matched subjectAltName: *.psmtp.com
posttls-finger: postini.com.s8a2.psmtp.com[64.18.7.11]:25
CommonName *.psmtp.com

It is less strict than is common for wildcard cert semantics
in applications, and wildcard certs should play little role
in DANE, with sites being able to mint as many or few certs
as they please.  So one might want to try setting this to
no and see whether there is any negative fallout.
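Putting this message's client-side settings together, a minimal main.cf sketch for opportunistic DANE might look like the fragment below (the tls_policy entries are illustrative examples, not recommendations):

```
# main.cf -- DANE in the Postfix >= 2.11 SMTP client
smtp_dns_support_level = dnssec
smtp_tls_security_level = dane
smtp_tls_policy_maps = hash:/etc/postfix/tls_policy

# /etc/postfix/tls_policy -- per-destination exceptions (illustrative)
# example.com      dane-only   # mandatory DANE for this destination
# broken.example   may         # emergency opt-out (assuming no MITM)
```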

-- 
Viktor.


Re: What causes 550 Action not taken ?

2014-01-05 Thread Viktor Dukhovni
On Mon, Jan 06, 2014 at 03:04:20AM -, John Levine wrote:

 Looking at the logs, I'm seeing a lot of 550 Action not taken at end
 of data from recipient systems which I believe are running Postfix.
 Can someone tell me what that means, so I can tell the recipients to
 undo whatever they did to cause it?

This is not a Postfix response.  Postfix >= 2.3 advertises:

250-ENHANCEDSTATUSCODES

and would generally respond with "550 5.X.Y ..." enhanced status
codes.  Not surprisingly, a search of the Postfix source code finds
no such response:

$ git grep "Action not taken"
$

However, this could be the response from a pre-queue virus scanner
(milter or pre-queue filter proxy) or a remote response to a
recipient verification probe.  Similar thread:

http://postfix.1071664.n5.nabble.com/550-Action-not-taken-td58892.html

-- 
Viktor.


Re: Patch: Support NOTIFY ESMTP parameter in SMFIR_ADDRCPT_PAR

2013-11-23 Thread Viktor Dukhovni
On Sat, Nov 23, 2013 at 10:20:19AM -0800, Andrew Ayer wrote:

 The patch is simple and only touches two functions because most of the
 required pieces were already there.  All I needed to do was split the
 argument list, parse the NOTIFY parameter (using the existing
 dsn_notify_mask() function), and pass the result as the last argument to
 cleanup_addr_bcc_dsn(), instead of always passing DEF_DSN_NOTIFY.  I've
 tried to mimic the existing code style as much as possible.

Simple context-free splitting is in principle not sufficient:

RCPT TO:<perverse NOTIFY=bad address@example.com>  NOTIFY=good

Though the smtpd(8) parser for RCPT TO may not cover 100% of the
torture-test that is the RFC 5321 RCPT TO or MAIL FROM grammar,
it comes much closer...  Look at extract_addr() in src/smtpd/smtpd.c.

-- 
Viktor.


Re: Patch: Support NOTIFY ESMTP parameter in SMFIR_ADDRCPT_PAR

2013-11-23 Thread Viktor Dukhovni
On Sat, Nov 23, 2013 at 12:28:44PM -0800, Andrew Ayer wrote:

  Simple context-free splitting is in principle not sufficient:
  
  RCPT TO:<perverse NOTIFY=bad address@example.com>  NOTIFY=good
  
  Though the smtpd(8) parser for RCPT TO may not cover 100% of the
  torture-test that is the RFC 5321 RCPT TO or MAIL FROM grammar,
  it comes much closer...  Look at extract_addr() in src/smtpd/smtpd.c.
 
 Thanks for taking a look at this.
 
 I'm actually only parsing RCPT TO ESMTP parameters here, not an entire
 RCPT TO command, and ESMTP parameter values are not allowed to contain
 space characters[1].  If a parameter value contains an address (e.g.
 ORCPT), then it's encoded using xtext.[2]  So I believe it should be
 quite sufficient to split on space character here.

You're right, I did not look at the patch sufficiently closely.  The
splitting into rcpt, dsn_args happens before your code is reached.
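In Python, the safe parsing step the patch relies on looks roughly like this (a sketch, not Postfix source; the mask values are illustrative stand-ins for the constants used by dsn_notify_mask()):

```python
# Illustrative re-implementation; Postfix does this in C via
# dsn_notify_mask().  Mask values here are made up for the sketch.
DSN_NOTIFY = {"NEVER": 1, "SUCCESS": 2, "FAILURE": 4, "DELAY": 8}

def parse_notify(dsn_args: str) -> int:
    """Extract the NOTIFY=... ESMTP parameter from a space-separated
    parameter list.  The recipient address has already been split off
    upstream, so splitting on spaces here is safe: per RFC 3461,
    parameter values contain no spaces, and addresses inside values
    (e.g. ORCPT) are xtext-encoded."""
    mask = 0
    for param in dsn_args.split():
        key, _, value = param.partition("=")
        if key.upper() != "NOTIFY":
            continue
        for word in value.split(","):
            mask |= DSN_NOTIFY.get(word.upper(), 0)
    return mask

print(parse_notify("NOTIFY=SUCCESS,FAILURE ORCPT=rfc822;a+40b.example"))
```

This is why the perverse-address example above is harmless here: the address with embedded "NOTIFY=bad" never reaches this splitter.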

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-16 Thread Viktor Dukhovni
On Thu, May 16, 2013 at 10:40:38AM +0200, Patrik Rak wrote:

 On 15.5.2013 20:30, Wietse Venema wrote:
 
 Patrik appears to have a source of mail that will never be delivered.
 He does not want to run a huge number of daemons; that is just
 wasteful. Knowing that some mail will never clear the queue, he just
 doesn't want such mail to bog down other deliveries.
 
  From that perspective, the natural solution is to reserve some fraction
 X of resources to the delivery of mail that is likely to be deliverable
 (as a proxy: mail that is new).
 
 Very well said. Describes my thoughts exactly.

What Patrik may not yet appreciate is that I was advocating
tackling a *related* problem.  I was not claiming that concurrency
ballooning (let's give my approach that name) prevents starvation of
new mail under all conditions.

Rather, concurrency ballooning can:

- More quickly dispose of bursts of slow messages that can congest
  the queue when they first arrive as new mail.  A separate transport
  for deferred mail does not address this.

- More quickly dispose of bursts of slow messages in the deferred
  path when bad mail is mixed with greylisting, ...

The downside is that new mail is not protected from bursts of
bad mail that fill the balloon.

The "Postfix sitting there doing nothing" problem is not new; that's
what got me on the list posting comments and patches in June of 2001.

My view then and now is that when an idle system with plenty of
CPU, RAM and networking resources is just sitting there waiting
for timers to fire, what's wasteful (even if much of the mail is
ultimately discarded) is not using more of the system's resources
to have more timers expiring concurrently.

It is fine if you don't think the related problem is worth addressing,
but at least understand that my perspective is different, I strive
for higher throughput, and then congestion mostly takes care of itself.

I am not completely sold on the 80/20 reservation, since it too
will get blocked with slow mail when a concurrent bunch of slow
mail is new, or when the deferred queue is a mixture of likely
never deliverable, and recently deferred mail.  So the approach is
not perfect either.  Tweaking it to exclude messages that are not
sufficiently old (one maximal backoff time) perhaps addresses most
of my concern about mixed deferred mail, since a sufficiently
delayed message can reasonably tolerate a bit more delay.

 So, if you don't mind, I would like to go ahead and try to implement
 this limit, for both the delivery agent slots as well as the active
 queue slots. I think that enough has been said about this to provide
 evidence that adding such knob doesn't put us in any worse position
 than we are at now, nor does it preclude us from using other
 solutions.

Go ahead.

 The only remaining objection seems to be the amount of back pressure
 postfix applies to incoming mail, depending on the growth of the
 queue. I believe this problem exists regardless of if this new knob
 is in place or not, so it may as well be good idea to discuss this
 independently if you feel like doing so now...

Back-pressure is about the behaviour of the 80%-of-queue (rather than
80%-of-agents) ceiling, and its likely impact is to largely eliminate
inflow_delay (which is already fairly weak).  So the issue is whether
and how to slow down input (smtpd + cleanup) when the queue is large
(as evidenced by a lot of deferred mail in the active queue).

So your work will have an impact on back-pressure (it will further
reduce it), but perhaps since the existing back-pressure is fairly
weak, we can live with it becoming a bit weaker still for now.

The current back-pressure mostly addresses the stupid performance
test case rather than the persistent MTA overload case.  So if we
want to address persistent overload (perhaps as a result of output
collapse, as with a broken second network card) we can design that
separately.  It would perhaps be useful to shed load onto healthier
MX hosts in a cluster in which one MX host is struggling.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-16 Thread Viktor Dukhovni
On Thu, May 16, 2013 at 03:47:22PM +, Viktor Dukhovni wrote:

 The Postfix sitting there doing nothing problem is not new, that's
 what got me on the list posting comments and patches in June of 2001.

For the record, it was July.

http://archives.neohapsis.com/archives/postfix/2001-07/0871.html

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-15 Thread Viktor Dukhovni
On Wed, May 15, 2013 at 06:01:42PM +0200, Patrik Rak wrote:

 Still waiting to hear some reason why what I propose is bad.

The various proposals are largely complementary.  If we restrict
the slow path to 80% of the process limit, that's not too dramatic
a reduction (though slow mail should get more processes if possible,
especially if this can be done *without* starving fast mail).

I am more concerned with the idea to limit deferred queue scans
when 80% of the active queue is previously deferred, while continuing
to take in new mail exclusively.  This is not a good idea.

Postfix already exerts too little back-pressure when the queue
fills, ignoring the deferred queue while taking more new mail
quickly will eliminate most of that (when the incoming queue is
not growing, there is no inflow_delay).

So I would definitely NOT cap the deferred queue fraction of the
active queue and favour new mail, unless stiffer back-pressure is
applied to new input (cleanup, ...) upstream.  Yes we should quickly
process what we accepted, but we really should reduce our appetite
for new mail when the queue is already very large.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-15 Thread Viktor Dukhovni
On Wed, May 15, 2013 at 06:52:52PM +0200, Patrik Rak wrote:

 I would also like to point out that in my case, the slow mail is
 not a slow mail as in mail which goes to sites behind slow links.
 It is slow as in it takes long time before the delivery agent times
 out.

Clear from the outset.

 Therefore, the 60:1 example is not unrealistic at all - in fact, as
 normal mail delivery gets faster, this ratio easily gets even
 worse, because (and as long as) the timeout remains the same.

My issue with 60:1 is not with the latency ratio, but with the
assumption that there is an unlimited supply of such mail to soak
up as many delivery agents as one may wish to add.  In practice
the input rate of such mail is finite; if the output rate (via high
concurrency) exceeds the input rate, there is no accumulation and
no process exhaustion.

 And that's also why throwing in more delivery agents in this case is
 such a waste - no matter how much I throw in, this mail doesn't get
 delivered, period. That's why I am reluctant to spend any extra
 resources on that.

It is not a waste, each message *will* eventually be allocated a
process and will be tried.  All I want to do is widen the pipe and
deal with congestion quickly!  If you keep the pipe narrow you risk
overflowing the queue capacity.  A wider pipe is useful.  You want
to not starve new mail, we can do both.

In order to drain slow mail quickly (allocate a bunch of sleeping
processes via a bit of memory capacity, without thrashing) without
starving new mail, we need separate process pools for the slow and
fast path.  Each of these can use the blocked delivery agent process
limit balloon.  Then there is never any contention between the two
flows.

Be careful to not starve the deferred queue without back-pressure
on new mail.  Let new mail find a less-congested MX host.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-15 Thread Viktor Dukhovni
On Wed, May 15, 2013 at 12:54:20PM -0400, Wietse Venema wrote:

 Viktor Dukhovni:
  Postfix already exerts too little back-pressure when the queue
  fills,
 
 Agreed.
 
  ignoring the deferred queue while taking more new mail
  quickly will eliminate most of that (when the incoming queue is
 
 You are mis-representing.
 
 There is no intent to IGNORE the deferred queue. After all it is
 allowed to occupy 80% of all the delivery agents! The intent is
 to give it only 80% or whatever. As soon as a deferred message
 clears the queue it is replaced with another one.

Yes, but the effect is the same, the input queue continues to drain
quickly with a substantial reduction in the already light back-pressure,
and the deferred queue grows.

This growth without back-pressure is arguably a feature for a backup
MX host with piles of disk that is willing to queue 5 days of mail
for a dead primary, but in most other cases back-pressure is useful,
to avoid making a bad situation worse.  (I would add a fallback
queue for a large-capacity backup MX, and disable inflow controls
on the fallback input).

We could also handle deferred processing of dead (or temporarily
down) destinations more efficiently, by using a lower *initial*
destination concurency for deferred mail on a per-destination basis.

Turn the initial concurrency of 5 with a cohort count of 1, sideways
to an initial concurrency of 1 and a cohort limit of 5.

On a per-destination basis, the same five messages fail and throttle
the site, but they do so in series, leaving more room for other
concurrent deliveries.  If they don't fail, the concurrency rises.
A destination with some previously deferred mail that is currently
active is treated as though all the mail is high risk, and gets
the adjusted concurrency limits.
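The "sideways" swap can be illustrated with a toy model (a deliberate simplification of qmgr's real per-destination feedback scheduler; the function and names are made up for illustration): for a destination that always tempfails, both settings burn five attempts before throttling, but the narrow-initial variant never occupies more than one delivery slot at a time.

```python
def slots_busy_before_throttle(init_concurrency, cohort_limit):
    """Toy model of per-destination feedback for a destination that
    always tempfails: the site is marked dead after `cohort_limit`
    failed cohorts, where each cohort is `concurrency` concurrent
    attempts.  Concurrency never rises here because nothing succeeds.
    Returns (peak parallel attempts, total attempts before throttle)."""
    concurrency = init_concurrency
    attempts = 0
    for _ in range(cohort_limit):
        attempts += concurrency    # one cohort of failures
    return concurrency, attempts

# Default-style scheme: start at concurrency 5, throttle after 1 cohort.
print(slots_busy_before_throttle(5, 1))  # (5, 5): 5 slots tied up at once
# Proposed for deferred mail: start at 1, allow 5 failed cohorts.
print(slots_busy_before_throttle(1, 5))  # (1, 5): same 5 failures, serial
```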

-- 
Viktor.


Re: dynamic process limits (Separate transport for retried recipients)

2013-05-14 Thread Viktor Dukhovni
On Tue, May 14, 2013 at 08:24:16AM -0400, Wietse Venema wrote:

 Viktor Dukhovni:
  Nothing I'm proposing creates less opportunity for delivery of new
  mail, rather I'm proposing dynamic (up to a limit) higher concurrency
  that soaks up a bounded amount of high latency traffic (ideally
  all of it most of the time).
 
 This is no better than having a static process limit at that larger
 maximum.  Your on-demand additional process slots cannot prevent
 slow mail from using up all delivery agents.

The difference is that the static larger maximum does not prevent
a thundering herd of fast deliveries using the high limit to thrash
the network link and process scheduler.

 To prevent slow mail from using up all delivery agents, one needs
 limit the amount of slow mail in the active queue.  Once a message
 is in the active queue the queue manager has no choice. It has to
 be delivered ASAP.

My goal was not preventing congestion under all conditions, this
is simply not possible.  Once some heuristically identified mail
is substantially delayed, we've lost already, since the proposed
heuristics are rather crude.

I am proposing a means of having sustainably higher process limits,
without thrashing.  The higher process limits substantially reduce
steady-state congestion frequency.  As you said, we don't need
perfection.  Simply higher limits are a bit problematic when the
slow path is in fact full of fast mail.

 How do we limit the amount of slow mail in the active queue?

I would prefer to process it at higher concurrency, to the extent
possible, maintaining reasonable throughput even for the plausibly
slow mail, unless our predictors become much more precise.

 That
 requires prediction. We seem to agree that once mail has been
 deferred a few times, it is likely to be deferred again. We have one
 other predictor: the built-in dead-site list. That's it as far as
 I know.

Provided the reason is an unreachable destination, and not a deferred
transport or a certificate expiration (any fast repeated deferral
via local policy, ...).

 As for after-the-fact detection, it does not help if a process
 informs the master dynamically that it is blocked.  That is too
 late to prevent slow mail from using up all delivery agents,
 regardless of whether the process limit is dynamically increased
 up to some maximum, or whether it is frozen at that same inflated
 maximum.

The above is a misreading of intent.  It does help, it enables safe
support for higher concurrency levels, which modern hardware and
O/S combinations can easily handle.

 [detailed analysis]
 
 Thanks. This underscores that longer maximal_backoff_time can be
 beneficial, by reducing the number of times that a delayed message
 visits the active queue. This reflects a simple heuristic: once
 mail has been deferred a few times, it is likely to be deferred
 again.

That, plus for many sites a not too aggressively reduced queue
lifetime.  Often an email delayed for more than 1 or 2 days is
effectively too late; with a bounce the sender can resend to a
better address or try another means to reach the recipient.  I
found 2 days rather than 5 to be largely beneficial, with no
complaints of lost mail because some site was down for ~3-4 days.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-13 Thread Viktor Dukhovni
On Mon, May 13, 2013 at 06:55:12AM -0400, Wietse Venema wrote:

 Viktor Dukhovni:
  The reasonable response to latency spikes is creating concurrency
  spikes.
 
 By design, Postfix MUST be able to run in a fixed resource budget.
 Your on-demand concurrency spikes break this principle and will
 result in unexpected resource exhaustion.  

No, there are two different process limits: one for non-slow
deliveries, which protects against excessive network concurrency,
and another for slow deliveries, which protects against memory
exhaustion.  We can set the blocked process limit to zero for
backwards-compatible process ceilings if you like.  That restores
legacy behaviour.

 If you want to run more processes increase the process limit. Then,
 Postfix's resource budget can be validated with smtp-sink/source
 like tools.

I want to run more processes (up to a limit) when deliveries are
slow, than when they are not.  Just increasing the process limit
while effective for the slow case, risks too much contention for
the fast case.

 Recall that the automatic response to smtpd overload was NOT running
 more smtpd processes. Instead, the solution was allowing them to
 run with shorter timeouts. That approach respects the fixed budget
 requirement.
 
 Please consider a stress like equivalent for delivery agents.

Is a shorter timeout wise here?  With remote connections we don't
have any control over the connection rate, so we set lower idle
timeouts under stress.

With delivery agents, we have full control of the concurrency, so
there is no need to drop timeouts and we're never idle deliberately,
only blocked on DNS lookups and connection attempts, ...

So raising process concurrency to a higher ceiling is quite ok.
There is little risk of memory issues, each additional smtp(8)
consumes a trivial amount of RAM.  The text pages are shared,
and the data footprint of smtp(8) is low.

Memory pressure based on smtp(8) delivery agent process count is
not an issue anymore.  Each process uses O(100KB) of RAM.  We can
run 1,000 delivery agents in 100MB of RAM, which is rather tiny
these days, and the default process limit is still 100, mostly
because 100 parallel outbound streams is reasonable from a network
viewpoint on bandwidth constrained links, which are still common.

If you feel that under stress smtp(8) connection timeouts should
be low, perhaps that's reasonable, but we have no control over DNS
timeouts and short HELO timeouts may be unwise; poorly connected sites
may never get their mail from a loaded MTA.

In any case, we are at least talking about solving the right problem,
managing concurrency and latency of actual deliveries as they happen.

-- 
Viktor.


Re: Bug in Postfix regarding the 'smtpd_helo_required' option

2013-05-13 Thread Viktor Dukhovni
On Mon, May 13, 2013 at 03:28:09PM +1000, Nikolas Kallis wrote:

 Also, my e-mail address was recently removed from the postfix-devel
 when I did not request it, nor was I consulted over the decision. If
 this happens again then I will cease contributing to Postfix.

You've been removed again, for posting user questions to the devel
list.  Feel free to join the postfix-users list, where it would be
prudent to ask questions before claiming you've found a bug.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-13 Thread Viktor Dukhovni
On Mon, May 13, 2013 at 09:10:13AM -0400, Wietse Venema wrote:

  No, there are two different process limits one for non-slow deliveries,
 
 No. It is a mistake to have an overload resource budget that is
 different for different kinds of overload. This is fundamental to
 the design of Postfix. Resources are not just memory but also file
 handles, protocol blocks, process slots, and so on.

This is right in principle, not so much in practice.  Were Postfix
delivery agent concurrency tuned to the limit of local system
resources, indeed one should be careful about overload, but this
too is easy to test, just raise the process limit to the combined
ceiling before testing.

In practice the smtp(8) process limit is far below the system
resource limit, the reason I don't configure 10,000 delivery agents
is not lack of RAM or kernel resources.  My $300 Asus laptop has
4GB of RAM.

Typically it is unwise to run even 1,000 parallel deliveries,
because the network delays would be unfortunate.  However, 1,000
parallel blocked delivery agents are not unreasonable, and I can
test at that load level if I am worried about resource limits.
 
 The overload resource budget must be easy to validate with tools
 like smtp-source/sink and the like: just max out the number of
 connections and verify that things don't come crashing down (I have
 to admit that postscreen(8) complicates this a little; one may have
 to disable its cache temporarily to perform full validation).

Or just by knowing that 1,000 processes is an easy fit.

 Instead of introducing a context-dependent overload resource budget,
 I have a proposal that addresses the real problem (slow or
 non-responding DNS and SMTP servers) and that requires no changes
 to qmgr(8) or master(8), and minor changes to smtp(8).
 
 If we want to address the real problem: slow or non-responding DNS
 and SMTP servers, then we should not waste an entire SMTP client
 process blocking on DNS lookup and TCP connection handshake in the
 first place.  Instead it is more efficient to interpose a prescreen(8)
 process between the qmgr(8) and smtp(8) processes.  This process
 can look up DNS, create the initial TCP connection, peek() at the
 remote server greeting, and keep the bogons away without wasting
 any smtp(8) processes.  Just like postscreen(8) can keep bogus SMTP
 clients away without wasting smtpd(8) processes.

Sadly the smtp(8) delivery agent makes multiple connections, supports
fallback destinations, has SASL and TLS dependent connection cache
re-use barriers, ...  The high latency can happen on a second
connection after a fast 4XX with the first MX host, ...  A prescreen
would be very difficult to implement.

The kernel resources of prescreen would still need to be commensurate
(socket control blocks, ...) with the various smtp(8) processes I
proposed.

Stress-dependent timers could be more realistic if we can get DNS
under control; that may need a new client library (ldns or similar).  I
am wary of aggressively low client timeouts; we could end up treading
water by timing out over and over when waiting a bit longer would
get the mail through.

Finally, the original proposal of parallel transports doubles or more
the process concurrency (Patrik would probably tune the slow path
with a high process limit).  The same objections apply even more strongly
there, since we may send fast mail down the slow path and stress the system
even more.

All I'm doing is allocating slow path processes on the fly, by
doing it when delivery is actually slow. Think of this as 2 master.cf
entries in one.  You don't object to users adding master.cf entries,
so there's little reason to object to implicit ones.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-13 Thread Viktor Dukhovni
On Mon, May 13, 2013 at 05:18:05PM -0400, Wietse Venema wrote:

 The qmgr(8) concurrency scheduler limits the concurrency per nexthop.
 That does not change when prescreen is inserted between qmgr(8) and
 smtp(8) processes.
 
 For each nexthop:
 
 number of qmgr-prescreen connections + number of qmgr-smtp connections 
 = concurrency limit

Patrik does raise a valid new concern about the prescreen design.
Suppose that all ~100 smtp(8) delivery agents are busy, and that
prescreen is willing to accept ~500 simultaneous qmgr(8) delivery
requests in the expectation that for many of these DNS lookups and
or initial connections will incur high latency.

Suppose that instead all 500 DNS lookups and initial connections
complete quickly, giving us 500 parallel connections to some set
of remote servers.  In the mean time the 100 currently busy SMTP
deliveries are taking their time.

We now have a problem, since we have lots of connections we can't
immediately start using.  And by the time we have capacity to use
them, the remote servers may well drop the idle connections.

So while doing pre-emptive DNS lookups is quite safe, doing
pre-emptive connections is more risky.  A similar issue exists
in principle with postscreen, in that more connections might be
accepted for "pass old" treatment than there are backend smtpd(8)
processes to serve them, leaving clients stranded for some time.

The impedance mismatch is less severe with postscreen since so
much mail is spam, and because clients are generally more willing
to wait out delays than servers.

So the prescreen design may not pan out.  And my contention is that
in any case it is a bit pricey in terms of implementation cost
relative to benefit.

If we limit prescreen to initial DNS lookups, the cost to implement
gets much lower, and much of the initial latency is avoided for
dead sites with broken DNS, so that could be of some use, and we
don't tie up remote resources by prefetching DNS results.  So a
DNS-only pre-screen could be added, still not sure it is worth it.
We'd need lots of data on how much of the latency of dead destinations
is DNS latency vs. connection timeout latency.

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-13 Thread Viktor Dukhovni
On Mon, May 13, 2013 at 05:25:44PM +0200, Patrik Rak wrote:

 On 13.5.2013 12:55, Wietse Venema wrote:
 Viktor Dukhovni:
 The reasonable response to latency spikes is creating concurrency
 spikes.
 
 By design, Postfix MUST be able to run in a fixed resource budget.
 Your on-demand concurrency spikes break this principle and will
 result in unexpected resource exhaustion.
 
 I'd second Wietse on this one.

And yet you're missing the point.

 If you throw in more resources for everyone, the bad guys are gonna
 claim it sooner or later.  You have to make sure you give it only to
 the good guys, which is the same as giving less to the bad guys in
 the first place. No need to throw in yet more additional resources
 on demand.

We don't know who the good guys are and who the bad guys are.

  - A deferred message may simply be greylisted and may deserve
timely delivery on its 2nd or 3rd (if the second was a bit too
early) delivery attempt.

  - A small burst of fresh messages may be a pile of poop destined
to dead domains, and may immediately clog the queue for 30-300
seconds.

 And that's also why it is important to classify ahead of time, as
 once you give something away, it's hard to take it back.

There is no giving away: to maintain throughput, high-latency
tasks warrant higher concurrency, and such concurrency is cheap since
the delivery agents spend most of their time just sitting there
waiting.

By *moving* the process count from the fast column to the slow
column in real-time (based on actual delivery latency not some
heuristic prediction), we free-up precious slots for fast deliveries,
which are fewer in number.  Nothing I'm proposing creates less
opportunity for delivery of new mail, rather I'm proposing dynamic
(up to a limit) higher concurrency that soaks up a bounded amount
of high latency traffic (ideally all of it most of the time).

To better understand the factors that impact the design we need
to distinguish between burst pressure and steady-state pressure.

When a burst of bad new mail arrives, your proposal takes it through
the fast path which gets congested once (by each message anyway,
but if the burst is large enough, the effect can last quite some
time).  If the mail is simply slow to deliver, but actually leaves
the queue, that's all.  Otherwise the burst gets deferred, and now
gets the slow path, which does not further congest delivery of new
mail, but presumably makes multiple trips through the deferred queue,
causing congestion there each time, amplified if you allocate fewer
processes to the slow than the fast path (I would strongly discourage
that idea).

In any case the fast/slow path fails to completely deal with bursts.

So let's consider steady-state.  Suppose bad mail trickles in as a
fraction 0 < b < 1 of the total new mail stream, at a rate that
does not by itself lead to enough congested fast-path processes just
from new mail.  What happens after that?

Well in steady-state, each initially deferred message (which we
for worst-case assume continues to tempfail until it expires) gets
retried N times, where N grows with the maximum queue lifetime and
shrinks with the maximal backoff time (details later).  Therefore,
the rate at which bad messages enter the active queue from the
deferred queue is approximately N * b * new_mail_input_rate.

When is that a problem?  When N * b > 1, because now a small
trickle of bad new mail becomes a steady stream of retried bad
mail whose volume is N * b times higher.  So what can we do to
reduce the impact?
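To put rough numbers on N * b: with default-style retry timing (the retry interval doubling from the minimal backoff up to the maximal backoff, and the message expiring at the maximal queue lifetime), a worst-case message is retried on the order of a hundred times, so even 1-2% junk pushes N * b past 1. A sketch, deliberately simplified relative to qmgr's actual scheduling:

```python
def retries(queue_lifetime_h, min_backoff_m, max_backoff_m):
    """Count delivery attempts for a message that tempfails until it
    expires: the retry interval doubles from the minimal backoff up
    to the maximal backoff, then stays there until the queue lifetime
    is exhausted.  A simplification of qmgr's actual scheduling."""
    t, backoff, n = 0.0, float(min_backoff_m), 0
    lifetime_m = queue_lifetime_h * 60.0
    while t < lifetime_m:
        n += 1
        t += backoff
        backoff = min(backoff * 2, max_backoff_m)
    return n

# Default-style numbers: 5-day lifetime, 300s..4000s backoff window.
N = retries(5 * 24, 5, 4000 / 60.0)
b = 0.02                 # suppose 2% of new mail is undeliverable junk
print(N, N * b > 1)      # over a hundred attempts; N * b exceeds 1
```

Both a longer maximal backoff and a shorter queue lifetime shrink N, which is the "reduce N" angle.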

I am proposing raising concurrency for just the bad mail, without
subtracting concurrency for the good mail, thereby avoiding collateral
damage to innocent bystanders (greylisted mail for example).  This
also deals with the initial burst (provided the higher concurrency
for slow mail is high enough to absorb the most common bursts and
low enough to not run out of RAM or kernel resources).  This does
no harm!  It can only help.

You're proposing a separate transport for previously deferred mail,
this can help but also hurt if the concurrency for the slow path
is lower than for the fast path, otherwise it is just a variant of
my proposal, in which we guess who's good and who's bad in advance,
and avoid spillover from the bad processes into the good when the
bad limit is reached.  In both cases total output concurrency should
rise.  Each performs better in some cases and worse in others.

The two are composable, we could have a dedicated transport for
previously deferred mail with a separate process limit for slow
vs. fast mail if we really wanted to get fancy.  We could even
throw in Wietse's prescreen for DNS prefetching, making a further
dent in latency.  All three would be a lot of work of course.

So what have we not looked at yet?  We've not considered trying to
reduce N * b, which amounts to reducing N since b is outside
our control to some degree (though if you can accept less junk,
that's by far the best place to solve the problem, e.g

Re: Separate transport for retried recipients

2013-05-12 Thread Viktor Dukhovni
On Sun, May 12, 2013 at 11:22:22AM +0200, Patrik Rak wrote:

 The fact that qmgr doesn't know how many delivery agents for each
 transport are there doesn't help either. It only knows the
 var_proc_limit, which is not good enough for this. I recall we have
 had a discussion with Wietse a long time ago about this, and IIRC we
 decided that it is better if qmgr doesn't depend on that value at
 that time...

Yes, of course, I covered this in my earlier post, it would need to
be told an upper bound on the number of processes for deferred entries,
leaving the rest for new entries.

smtp_deferred_concurrency_limit = 0 | limit

 I sympathise with the concern about the internal cost, but if the
 solution adds substantial user-visible complexity I contend that
 it is pointless, and the users who need this (the sites that accept
 subscriptions via HTTP, ...) can just create a multi-instance
 config, it is simple enough to do.
 
 Hmm, if the visible configuration is what bothers you, it would be
 equally trivial to implement it so qmgr splits the transport only
 internally, and to the outside world it would look as if there were
 only one transport. But I considered this a worse solution, as it
 would do something behind the scenes without allowing it to be
 configured properly...

The "configuring it properly" part raises the complexity cost to a
level where I would suggest that the tiny fraction of sites taking
a high volume of new recipients via HTTP subscription forms can
implement a fallback instance.  The explicit parallel transports
are not much simpler.  A bulk mail MTA probably needs a fallback
instance anyway.

 On Sat, May 11, 2013 at 06:33:22PM -0400, Wietse Venema wrote:
 
 Even simpler: stop reading the deferred queue when more than N% of
 the maximal number of recipient slots is from deferred mail.
 
 This does not address Patrik's stated goal of avoiding process
 saturation in the smtp transport by slow mail to bogus destinations.
 (Similar to my 2001 analysis that motivated relay for inbound mail
 and taught me the importance of recipient validation).
 
 Rather, it addresses active limit exhaustion.  The idea is perhaps
 a good one anyway.  Reserve some fraction of the active queue limits
 for new mail, so that when enough deferred mail is in core, only new
 mail is processed along with the already deferred mail.
 
 I too agree that this one would be really nice to have.

We need to be a bit careful: starving the deferred queue can lead
to an ever-growing deferred queue, with more messages coming in,
getting deferred, and never being retried.  If we are to impose a
separate deferred queue ceiling while continuing to take in new
mail, we'll need a much stiffer coupling between the output rate
and the input rate, to avoid congested MTAs becoming bottomless
pits for ever more mail.

The current inflow_delay mechanism does not push back hard enough.
When the inflow_delay timer is exhausted, cleanup goes ahead and
accepts the message.  We could consider having cleanup tempfail
when deferred mail hits the ceiling in the active queue.

- Suspend deferred queue scans when we hit a high water mark
  on deferred mail in the active queue.

- Resume deferred queue scans when we hit a low water mark on
  deferred mail in the active queue.

- On queue manager startup generate a set of default process
  limit tokens.

- Generate one token per message moved from incoming into the
  active queue, provided deferred queue scans are not suspended.

- Generate one token per message delivered or bounced (removed
  rather than deferred) when deferred queue scans are suspended.

- Generate another set of default process limit tokens each
  time the queue manager completes a full scan of the incoming
  queue, provided deferred queue scans are not suspended.

- Cleanup (based on request flag from local, bounce, pickup vs.
  smtpd/qmqpd) either ignores inflow_delay (not much point in
  enforcing this with local sources the mail is already in the
  queue) or tempfails after the inflow_delay timer expires.
  With remote sources a full queue, as evidenced by lots of
  deferred mail in the active queue, exerts stiff back-pressure
  on the sending systems.

- We probably need a longer token wait delay if the coupling
  is stiffer.  This would be a new parameter that turns on
  the new behaviour if set non-zero.
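The bullet points above can be sketched as a toy state machine.  A
purely illustrative Python sketch follows; the class name, constants,
and watermark values are made up for this sketch and are not Postfix
parameters or Postfix code:

```python
DEFAULT_TOKENS = 100        # stand-in for the default process limit
HIGH_WATER = 80             # suspend deferred scans above this
LOW_WATER = 40              # resume deferred scans below this


class TokenPool:
    def __init__(self):
        self.tokens = DEFAULT_TOKENS
        self.deferred_in_active = 0
        self.deferred_scan_suspended = False

    def _update_scan_state(self):
        # High/low watermarks on deferred mail in the active queue.
        if self.deferred_in_active >= HIGH_WATER:
            self.deferred_scan_suspended = True
        elif self.deferred_in_active <= LOW_WATER:
            self.deferred_scan_suspended = False

    def move_incoming_to_active(self):
        # One token per message moved from incoming into active,
        # but only while deferred scans are still running.
        if not self.deferred_scan_suspended:
            self.tokens += 1

    def delivery_completed(self, removed_from_queue):
        # While deferred scans are suspended, tokens are generated
        # only by mail actually leaving the queue (delivered or
        # bounced), coupling the input rate to the output rate.
        if removed_from_queue:
            self.deferred_in_active = max(0, self.deferred_in_active - 1)
            self._update_scan_state()
            if self.deferred_scan_suspended:
                self.tokens += 1

    def accept_new_message(self):
        # cleanup(8) would tempfail rather than wait forever once no
        # token becomes available within the inflow_delay window.
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

The sketch leaves out the periodic token refresh on each completed
incoming-queue scan, which acts as the buffer mentioned below.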

This is not yet a complete design, and requires more thought.  We
need to better understand how this behaves when the queue is not
congested and a burst of mail arrives from some source.  We also
need to understand how it behaves when the deferred queue is large
and the input rate is stiffly coupled to the output rate.

Unless the active queue is completely full, we're not coupled to
the output rate; rather, we're coupled to the queue manager's
ability to move mail from incoming into active, with excess tokens
acting as a buffer that is refreshed on

Re: Separate transport for retried recipients

2013-05-12 Thread Viktor Dukhovni
On Sun, May 12, 2013 at 02:52:05PM -0400, Wietse Venema wrote:

 Please consider not hard-coding your two-class solution to new/deferred
 mail only, but allowing one level of indirection so that we can
 insert a many-to-2 mapping from message property (now: from queue
 to delivery class; later: sender, client or size to delivery class).
 
 The idea is that some part of Postfix may clog up due to mail with
 properties other than having failed the first delivery attempt.

Since we're addressing congestion caused by slow mail, perhaps
we're going about it the wrong way.  The heuristic that deferred
(or selected via some other a-priori characteristic) mail is likely
slow is a very crude approximation, and may be entirely wrong.

Instead, I think we can apply a specific remedy for the actual
congestion as it happens.

- Enhance the master status protocol to add a new state in addition
  to busy and idle:

* Blocked.  The delivery agent is busy, but blocked waiting to
  complete connection setup (the c time in delays=...).

  The SMTP delivery agent would enter this state at the beginning
  of the delivery attempt, and exit it before calling smtp_xfer().
  When another session is attempted to deliver tempfailed
  recipients, the state is re-entered at the completion of
  smtp_xfer().

- Add two companion parameters:

# Typically higher than the process limit by some multiple.
#
default_blocked_process_limit = 500

# Processes blocked for more than this time yield their slot
# to new processes, dynamically inflating the process limit.
#
blocked_process_time = 5s

  When a process stays blocked for more than blocked_process_time,
  master(8) decrements the service busy count and increments
  the service blocked count, provided the maximum blocked count
  has not been reached.  This allows master(8) to create more
  processes to handle mail that is not slow.

  When a delivery agent that has been blocked for more than
  blocked_process_time completes a delivery, it does not
  go back to the accept loop.  Rather, it exits.  The process
  start-up cost is amortized by the long delay.

- The master.cf maxproc column is optionally extended to allow
  setting both the process limit and the blocked process limit.

# service type  private unpriv  chroot  wakeup  maxproc command + args
smtp  unix  -   -   n   -   200/900 smtp

  The syntax is m[/n] where m is the process limit or - for default,
  and n is the blocked process limit or is not specified for default.
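A sketch of how the proposed m[/n] field might be parsed; this is
hypothetical, not Postfix code, and the defaults of 100 and 500
simply echo the example values above:

```python
def parse_maxproc(field, default_limit=100, default_blocked=500):
    """Parse a master.cf maxproc field of the proposed form m[/n].

    "-" or a missing component selects the respective default.
    """
    m, _, n = field.partition("/")
    limit = default_limit if m in ("", "-") else int(m)
    blocked = default_blocked if n in ("", "-") else int(n)
    return limit, blocked
```

For the example master.cf line above, parse_maxproc("200/900")
yields a process limit of 200 and a blocked process limit of 900.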

This directly addresses process starvation via slow processes, and
does not require any queue manager changes (the queue manager is
the most expensive place to support complex features).

-- 
Viktor.


Re: Postfix and 'smtpd_helo_required'

2013-05-12 Thread Viktor Dukhovni
On Mon, May 13, 2013 at 01:56:05PM +1000, Nikolas Kallis wrote:

 I am someone who won't use a spam prevention method that could
 block a legitimate e-mail, and so my way of fighting spam is
 by protocol-compliance means only.

This won't get you very far at all.  Spam bots are largely protocol
compliant; the only common violations are early talking and fast
timeouts.  So you can use postscreen with a greet pause to catch
those.

If you're not willing to use an RBL (zen.spamhaus.org puts you in
very good company with the rest of the planet), be prepared for
a lot of spam.

 I wasn't sure if 'smtpd_helo_required' was suitable as I wasn't sure
 if the SMTP 'HELO' message was a protocol requirement. After reading
 '4.1.1.1  Extended HELLO (EHLO) or HELLO (HELO)' of RFC 2821, I
 learnt that it is a protocol requirement.

Completely safe to use.  Won't block any spam.  You can feel good
about upholding RFC compliance though. :-)

-- 
Viktor.


Re: Separate transport for retried recipients

2013-05-11 Thread Viktor Dukhovni
On Sat, May 11, 2013 at 04:20:51PM +0200, Patrik Rak wrote:

 - What common use case has different per-recipient (not: per-sender,
 etc.) soft reject rates for a mail stream between two sites? Does
 it matter whether some portion of a mail stream between two sites
 is deferred because of the recipient, sender or other cause?
 
 The use case which I am interested in is basically some service
 sending registration confirmation messages to its users, where some
 users decide to fill in bogus addresses which result in temporary
 errors until the message expires and bounces. Such messages tend to
 stockpile in the deferred queue and can quite dominate the active
 queue and adversely affect the deliveries to proper recipients.
 Especially when these bogus recipients are not deferred immediately,
 but only after a considerably long timeout.

The only way to deal with high latency is with high concurrency,
thus maintaining a reasonable throughput (concurrency/latency).
Most cases of high latency due to bogus domains, non-responding MX
hosts, ... are cases in which the concurrency at the receiving
system is zero, since no SMTP connection is ever made.  So in this
case you want at least a high process limit for the transport.  If
the bogus destinations are many, then this is enough.
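To make the concurrency/latency arithmetic concrete, here is a
back-of-the-envelope illustration; the timeout and process counts
are invented for the example:

```python
def throughput_per_hour(concurrency, latency_seconds):
    # Throughput is concurrency divided by per-delivery latency.
    return concurrency * 3600 / latency_seconds

# 100 processes each stuck in a 30-second connect timeout to a
# dead MX host manage only:
slow = throughput_per_hour(100, 30.0)   # 12000 attempts/hour

# The same 100 processes delivering real mail in ~2 seconds each:
fast = throughput_per_hour(100, 2.0)    # 180000 deliveries/hour
```

Hence, with fixed (high) latency to unresponsive destinations, the
only throughput knob left is concurrency, i.e. the process limit.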

One would need to size the active queue limits for some multiple
of the expected 5 days' worth of bad addresses, so that such mail
rarely fills the active queue.  Since Postfix 1.0 was released in 2001,
the price of RAM has fallen considerably.  It is now quite
cost-effective to build servers with 1-4 GB of RAM or more.  So an
MTA with this problem should have a large active queue size to avoid
running out of queue slots.

I think such tuning is a pain in a single instance of Postfix,
and monitoring such a queue is needlessly complex with a single
instance.  I find all the fear and loathing of multiple instances
perplexing.  Multiple instances are *simpler* than intricately
tuned single instances.

 - the concurrency window limit of that alternate transport can be
 explicitly configured to be small, which should minimize the
 difference of the load caused on the target site.

That would be a mistake.  You want a high concurrency, which is
problematic for retries to some legitimate destinations (say Yahoo
after greylisting).  Therefore, what one really wants to know is:

- Did the message fail via a 4XX reply or connection failure?

- Is this the first failure, or has delivery failed multiple times?
  (though with greylisting, one's own retry time may be sooner than
   the receiver's minimum delay).

Thus one may want to keep messages that fail for the first time or
with a 4XX reply rather than a timeout or connection failure in
the same queue as regular mail, while sending messages that time
out after being deferred into a fallback queue (remote or second
instance).
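The routing decision just described might look like the following
toy sketch; the function and argument names are hypothetical, and
this is not the Postfix implementation:

```python
def use_fallback_queue(failure_kind, times_deferred):
    """Decide whether a deferred message goes to the slow/fallback queue.

    failure_kind: "4xx" for an SMTP 4XX reply, "timeout" for a
    connection timeout or connection failure.
    times_deferred: how many delivery attempts have failed so far.
    """
    if failure_kind == "4xx":
        # Greylisting and similar: keep with regular mail.
        return False
    # Only mail that has timed out more than once is treated as
    # slow sludge and diverted to the fallback queue/instance.
    return times_deferred > 1
```

A first-time timeout thus stays in the regular queue, matching the
caveat above about one's own retry time vs. a greylister's delay.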

For this one would need to change the SMTP delivery agent to use
a conditional fallback relay.  This would be added to the delivery
request by the queue manager when processing messages from the
deferred queue, and used by the SMTP delivery agent only when the
last regular MX host site-fails (not 4XX reply).

The effect is to separate slow mail that times out multiple times,
whose delivery could clog the queue, from other mail that is in
the queue briefly, or whose delivery failures are in any case fast
enough to not be a big problem.

 I am all eager to hear what Viktor has to say about this one,
 though... He has a lot of experience with problematic sites using
 small concurrency windows, from what I remember...

I don't think that additional transports in the same instance are
a good idea here.  Too much complexity, and still a high risk of
a full active queue.  With a second downstream instance that holds
only the slow mail, one can tune concurrency up, and tune any queue
monitoring more appropriately to the content in hand.  One can also
adjust queue lifetimes more sensibly,  ...

So I propose:

- No changes in trivial-rewrite.

- No additional transport personalities.

- One additional parameter to define a queue-manager signalled
  fallback relay, included with delivery requests for messages
  that come from the deferred queue.

- This fallback relay is ignored by default by all delivery agents, and
  is optionally available in the smtp(8) delivery agent, which needs
  a second non-default parameter to enable its use.

- The second parameter would be set by administrators of affected
  sites in the smtp transport, and likely not set in the relay
  transport.

- Sludge (connection timeout or failure possibly combined with a minimum
  message age) goes to a remote or second instance queue.

The main difficulty is that this meshes somewhat poorly with
defer_transports, since some deferred mail may be innocent and
could be sent to the slow queue when the transport is no longer
deferred if the first delivery 

Re: Separate transport for retried recipients

2013-05-11 Thread Viktor Dukhovni
On Sat, May 11, 2013 at 06:33:22PM -0400, Wietse Venema wrote:

 Patrik Rak:
   This largely solves the problem, and is much simpler to configure:
   
  # Out of a total of $default_process_limit (100), leaving 20
  # for fresh mail.  Adjust appropriately when master.cf or
  # default_process_limit are changed.
  #
  smtp_deferred_concurrency_limit = 80
 
 Even simpler: stop reading the deferred queue when more than N% of
 the maximal number of recipient slots is from deferred mail.

This does not address Patrik's stated goal of avoiding process
saturation in the smtp transport by slow mail to bogus destinations.
(Similar to my 2001 analysis that motivated relay for inbound mail
and taught me the importance of recipient validation).

Rather, it addresses active limit exhaustion.  The idea is perhaps
a good one anyway.  Reserve some fraction of the active queue limits
for new mail, so that when enough deferred mail is in core, only new
mail is processed along with the already deferred mail.

A separate mechanism is still needed to avoid using all ~100 smtp
transport delivery processes for deferred mail.  This means that
Patrik would need to think about whether the existing algorithm
can be extended to take limits on process allocation to deferred
mail into account.

I sympathise with the concern about the internal cost, but if the
solution adds substantial user-visible complexity, I contend that
it is pointless; the users who need this (the sites that accept
subscriptions via HTTP, ...) can just create a multi-instance
config, which is simple enough to do.

-- 
Viktor.


Re: DANE, DNSSEC, GnuTLS, Postfix, Exim

2013-03-31 Thread Viktor Dukhovni
On Sun, Jan 13, 2013 at 07:34:24AM +, Bry8 Star wrote:

 When can we expect a Postfix release that will support the DANE
 protocol, so that it (postfix) can verify (using the DANE and DNSSEC
 protocols) the signed (and free) SSL/TLS certificates (or
 fingerprints) which we can pre-add in TLSA (CERT, HASTLS, etc.) DNS
 (DNSSEC) records, and then it (postfix) will use those (certs) for
 secure (smtp) communication, and to verify SMTP servers?

If all goes well, 2.11, with snapshots incrementally adding DANE
support in the meantime.  DANE support is code-complete, but requires
further code review and testing.  To that end I would like to see
more mail receiving sites sign their DNS zones and publish TLSA
records for their MX hosts.

Thus far, I'm aware of just six MX hosts in four DNSSEC-enabled
domains that have TLSA records for SMTP.  Of these:

- 5 publish IN TLSA 3 1 1 ... SHA256 public-key digest records,
  which is a best practice; that's exactly what most people should
  publish.

- 1 publishes IN TLSA 3 1 2 which is also fine, since the only
  difference is that the digest uses SHA512.

All six verify.  I'd like to also test:

- 3 0 1 certificate digest RRs.
- 3 1 0 full public key RRs.
- 2 0 0 full certificate trust anchor RRs.
- 2 1 1 public key digest trust anchor RRs.
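For reference, the data in a "3 1 1" record is simply the SHA-256
digest of the certificate's DER-encoded SubjectPublicKeyInfo, and a
"3 1 2" record the SHA-512 digest of the same bytes.  A minimal
sketch, assuming the SPKI DER bytes have already been extracted from
the certificate with an X.509 tool (that step is not shown here):

```python
import hashlib


def tlsa_3_1_1(spki_der: bytes) -> str:
    """Hex digest for an 'IN TLSA 3 1 1' record (SHA-256 of the SPKI)."""
    return hashlib.sha256(spki_der).hexdigest()


def tlsa_3_1_2(spki_der: bytes) -> str:
    """Hex digest for an 'IN TLSA 3 1 2' record (SHA-512 of the SPKI)."""
    return hashlib.sha512(spki_der).hexdigest()
```

The "3 0 1" and "2 0 0" variants hash or embed the whole certificate
rather than just the public key, per the matrix above.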

For IN TLSA 2 1 1 the trust-anchor certificate must appear in
the server's SSL handshake trust chain.  With just the public-key
digest, it is impossible to actually verify the chain.

I don't expect that administrators will publish their trust anchor
in full via DNS, rather they should publish the digest in DNS, and
provide the certificate in the SSL handshake.  (In practice they
should shun all certificate usages other than 3).

If you have a DNSSEC-signed zone and operate MX hosts for that
domain that support inbound STARTTLS, please publish TLSA records,
and let me know which domain has said MX hosts.

 Currently (Jan 12, 2013), the latest stable GnuTLS now supports DANE
 (and as of right now, OpenSSL (or any openssl modules) does not yet
 support DANE). Can postfix utilize DANE libraries from gnutls for DANE ?

We don't need a new OpenSSL, its verification callback provides
sufficient rope.

The DANE code in the verification callback is a miniscule portion
of the new code.  Most of the hard work went into the SMTP policy
engine that finds, evaluates and caches TLSA RRs, making them
available to the SSL verification callback.  A bunch more effort
went into making sure that MX record and host-to-address resolution
tracks the DNSSEC validation status of the results.

There are also ~1000 lines of code in the command-line smtptls-finger
utility, so one can test before deploying and debug TLS problems if
something goes wrong.

 The DnsSec-Tools.Org site shares PATCH (developed by Sparta) for
 (older) Postfix (and other software) to support DNSSEC, can someone
 expert apply it(patch) on the last+stable postfix ?
 http://www.dnssec-tools.org/howtos/postfix-2.3.x-dnssec-howto.txt

Their patch is largely misguided; there is no need for explicit
DNSSEC validation in Postfix.  You get that for free by deploying a
validating cache (say unbound) on your machine, and configuring
Postfix to query that cache.  Far better to have DNSSEC validation
code in software focused squarely on DNSSEC support.

Once you do have access to Postfix with DANE support, you'll find
that initially it makes no difference at all, since there are as
yet essentially no SMTP servers that publish DANE records.

Since this is a chicken and egg problem, let's hope I'm laying an
egg. :-)

Perhaps, motivated by an MTA that supports DANE, receiving sites
will start to deploy DNSSEC signed zones and publish TLSA records.
This is going to take a few years, with the early adopters feeling
a bit lonely for a while...

If you're willing to be on the bleeding edge and want to help test
code Wietse has not reviewed yet, you can try on a suitable
non-critical system:

$ git clone http://github.com/vdukhovni/postfix
$ cd postfix/postfix
$ make -f Makefile.init 'CCARGS=-DUSE_TLS' 'AUXLIBS=-lssl -lcrypto' 'OPT=' \
makefiles # Add whatever else you need
$ make
$ make upgrade

The default branch is currently DANE10; it may change in the
future, in which case, if you're trying to stay current, you'll
need to check out DANE11, ... as they become available.

The documentation for DANE is at:

html/TLS_README.html#client_tls_dane
html/TLS_README.html#client_tls_policy

and various parameters linked from these.  The main immediately
usable feature is support for per-destination trust-anchors
(root CA replacements):

html/postconf.5.html#smtp_tls_trust_anchor_file

which you can configure for a set of destinations without waiting
for them to do DNSSEC and TLSA.  Otherwise, until more of you deploy
DNSSEC and publish TLSA records, DANE will have negligible impact
on your outbound mail stream.

-- 
Viktor.

