Re: feature request: caching message arrival time

2019-06-04 Thread Daniel Kahn Gillmor
On Mon 2019-06-03 18:02:53 +0200, Örjan Ekeberg wrote:
> As far as I understand the autocrypt protocol (i.e. not much;-) ), the
> vulnerability is that an incoming message with a later time-stamp than
> the locally saved autocrypt status can update the stored state
> (e.g. turn off encryption).  Manipulating the time-stamp to make the
> message appear to be *older* than it really is should only mean that it is
> less likely to update the saved state?
>
> If this is correct, using the oldest of all the time-stamps seen in the
> Date-header and any of the Received-headers should be the most
> defensive.

It's the most defensive against one form of attack: forging e-mails
intended to update the user's Autocrypt state about a given peer.

But another form of attack is also possible: convincing the user to
*not* update their Autocrypt state about a given peer, while leaving the
original message otherwise plausible and intact, thereby raising no
suspicions about delivery problems.

I'd like notmuch's Autocrypt implementation to try to defend against
either attack where possible.

   --dkg


signature.asc
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: feature request: caching message arrival time

2019-06-04 Thread Daniel Kahn Gillmor
On Mon 2019-06-03 16:02:48 +0200, Ralph Seichter wrote:
> Not meaning to complicate things, but Notmuch does not receive messages
> at all. ;-) One needs to rely on some software to populate the Maildir
> tree (Dovecot LMTP in my case, Postfix or some other MTA for local
> delivery in other cases). Any software transporting the raw messages
> can, and sometimes must, manipulate the header data, and the order in
> which files within the Maildir tree are created is also not determined
> by Notmuch.
>
> As an example: My nightly backup script disables local delivery for the
> duration of the backup process. Once reactivated, delivery of queued
> messages resumes, but it is not guaranteed to happen in the order of
> arrival. So even the local MTA, although trusted, might induce issues in
> terms of delivery time.

I agree with you!  the e-mail system, like any other store-and-forward
ecosystem, offers no guarantees of message delivery.

fwiw, i'm not claiming that the time notmuch receives the message is
guaranteed to be close to the time that the message was sent.

but i can guarantee two things:

 * notmuch cannot receive the message *before* it was sent :)

 * if the local system clock is correct, notmuch can place a plausible
   upper bound on the Date: header that is included in the message.

This alone is useful data.

 --dkg




Re: feature request: caching message arrival time

2019-06-03 Thread Örjan Ekeberg
Daniel Kahn Gillmor writes:

> Sure, assuming that you trust the closest MTA in the chain of MTAs that
> handed the message off to you, since an adversarial proximal MTA could
> manipulate all the existing Received: headers as well.
>
> But I'm a bit uncomfortable with it: this sort of protection actually
> opens up a new attack vector that didn't exist before -- any MTA in the
> chain can now make the message seem like it was actually from the
> *past*, just by setting its own Received: header.

As far as I understand the autocrypt protocol (i.e. not much;-) ), the
vulnerability is that an incoming message with a later time-stamp than
the locally saved autocrypt status can update the stored state
(e.g. turn off encryption).  Manipulating the time-stamp to make the
message appear to be *older* than it really is should only mean that it is
less likely to update the saved state?

If this is correct, using the oldest of all the time-stamps seen in the
Date-header and any of the Received-headers should be the most
defensive.
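
A minimal Python sketch of the "oldest of all the time-stamps" idea,
using only the standard library.  The function name
earliest_header_time is made up for illustration and is not part of
notmuch.  It assumes the time-stamp of a Received: header is whatever
follows the last ";", and it treats a date without a usable UTC offset
as UTC.  As noted elsewhere in this thread, any MTA in the chain can
rewrite these headers, so the result is only as trustworthy as the
relays are.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def earliest_header_time(raw_message):
    """Return the oldest time found in Date: and Received:, or None."""
    msg = message_from_string(raw_message)
    candidates = []
    if msg.get("Date"):
        candidates.append(msg.get("Date"))
    # Assumption: the time-stamp of a Received: header follows the last ';'.
    for received in msg.get_all("Received", []):
        if ";" in received:
            candidates.append(received.rsplit(";", 1)[1].strip())

    parsed = []
    for value in candidates:
        try:
            stamp = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # skip time-stamps that cannot be parsed
        if stamp.tzinfo is None:
            # Assumption: treat a date without a usable offset as UTC.
            stamp = stamp.replace(tzinfo=timezone.utc)
        parsed.append(stamp)
    return min(parsed) if parsed else None
```

This picks the minimum over every parseable time-stamp, so a single
back-dated Received: header is enough to move the result into the past.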

/Örjan


Re: feature request: caching message arrival time

2019-06-03 Thread Ralph Seichter
* Daniel Kahn Gillmor:

> Since notmuch actually knows when it received the message [...]

Not meaning to complicate things, but Notmuch does not receive messages
at all. ;-) One needs to rely on some software to populate the Maildir
tree (Dovecot LMTP in my case, Postfix or some other MTA for local
delivery in other cases). Any software transporting the raw messages
can, and sometimes must, manipulate the header data, and the order in
which files within the Maildir tree are created is also not determined
by Notmuch.

As an example: My nightly backup script disables local delivery for the
duration of the backup process. Once reactivated, delivery of queued
messages resumes, but it is not guaranteed to happen in the order of
arrival. So even the local MTA, although trusted, might induce issues in
terms of delivery time.

-Ralph


Re: feature request: caching message arrival time

2019-06-03 Thread Daniel Kahn Gillmor
On Mon 2019-06-03 10:57:15 +0200, Örjan Ekeberg wrote:
> Daniel Kahn Gillmor writes:
>
>> So Autocrypt defines the "effective date" of a message as the *earliest*
>> of two dates: the date that the message is first seen, and the Date:
>> header itself.  So we want our augmented Autocrypt header ingestion
>> routine to search for all other messages we know about from the sender
>> that have both a later firstseen= property *and* a later Date: header.
>
> Would it be possible to use the earliest date seen in any of the
> Received: headers as a safeguard against future-dated messages?

Sure, assuming that you trust the closest MTA in the chain of MTAs that
handed the message off to you, since an adversarial proximal MTA could
manipulate all the existing Received: headers as well.

But I'm a bit uncomfortable with it: this sort of protection actually
opens up a new attack vector that didn't exist before -- any MTA in the
chain can now make the message seem like it was actually from the
*past*, just by setting its own Received: header.

Technically, of course, any MTA could munge the actual Date: header as
well to perform this kind of attack, but that munging would at least
have the potential to be detected by anyone who cares to verify DKIM
headers; but Received: headers are impossible to cover with DKIM.

If there was no expense to the indexing and storage, i'd say it would be
good to just go ahead and index the earliest Received: header as well,
to have that data trivially available as a data point in evaluating
incoming messages.  But since it sounds like there's a cost (in
performance and storage) that would need to be profiled, i don't know
that i can say it's worth the tradeoff.

Since notmuch actually knows when it received the message, it seems like
it would be simplest (and less vulnerable to manipulation) to just
record that timestamp directly.

 --dkg




Re: feature request: caching message arrival time

2019-06-03 Thread Örjan Ekeberg
Daniel Kahn Gillmor writes:

> So Autocrypt defines the "effective date" of a message as the *earliest*
> of two dates: the date that the message is first seen, and the Date:
> header itself.  So we want our augmented Autocrypt header ingestion
> routine to search for all other messages we know about from the sender
> that have both a later firstseen= property *and* a later Date: header.

Would it be possible to use the earliest date seen in any of the
Received: headers as a safeguard against future-dated messages?

/Örjan


Re: feature request: caching message arrival time

2019-06-01 Thread Daniel Kahn Gillmor
On Sat 2019-06-01 16:19:19 +0200, Ralph Seichter wrote:
> I'm interested. Right now I frankly don't know what knowing when a
> message was first seen by Notmuch might be useful for. That makes it
> a bit difficult for me to contemplate your questions.

Sure, thanks for asking!

As i went to write this down, it became a lot longer than i'd expected.
sorry about that!  On the positive side, i may have convinced myself in
the process that the threat this mechanism would defend against is small
enough that it may not be worth the additional implementation (though if
the implementation were there, we'd certainly want to use it).

So, this is a story about Autocrypt state, out-of-order delivery, and
e-mails with suspicious date stamps ("from the future"). (if you're
reading this message and haven't been following Autocrypt closely, you
can read up at https://www.autocrypt.org/)

--

When receiving an e-mail sent From: the peer f...@example.org, an
Autocrypt-capable client needs to update the Autocrypt state for that
peer's e-mail address ("f...@example.org").  This is the case for
messages that have an Autocrypt: header *and* for messages that *don't*
have one.

Both kinds of messages update the Autocrypt peer state, because if you
start receiving Autocrypt-free messages from someone who used Autocrypt
in the past, your client needs to make a note of that and consider it
when it makes its recommendation for new outbound messages to that peer.

Additionally, sometimes we receive e-mail messages out of order.
sometimes this is because we're suddenly running across a cache of old
messages, sometimes it's because we've just popped online after a day
off, and sometimes it's because SMTP had a hiccup (there are probably
many other reasons).

We also probably don't want to store state about everyone who has ever
sent us mail *without* using Autocrypt.  At the moment, at least, that's
probably most senders, and it's both a waste of space and a potential
privacy concern to record a lot of empty state that just indicates that
you got mail from someone at some point in the past.  So if we've never
seen an Autocrypt header from a given peer, there's no state to update.

So now consider the following set of e-mail messages all from the same
sender; mails with a * have an Autocrypt header, and the time
following each message indicates its Date: header in an abstract way
(higher numbers are later than lower numbers).

 A: (time 1)
 B*: (time 2)
 C: (time 3)

Let's assume that i update Autocrypt state about the peer upon receipt
of each message, regardless of what order the messages were sent.  We
want the resulting Autocrypt state to be the same, independent of the
order of delivery.

If i receive them at times 4, 5, and 6 in order (A, B, C) then i'll
think that the Autocrypt state for the peer is "we had an Autocrypt
header earlier (from B), but a more recent delivery (C) suggests that
they might not be using Autocrypt reliably" (depending on the actual
difference in time between the Date:s of B and C, the peer might end up
with an Autocrypt recommendation called "discourage").  This is the
correct state for us to end up in.

But now imagine that at times 4, 5, and 6 i receive the messages in the
order A, C, B.  I don't store Autocrypt state for the peer at times 4
and 5, because i've never seen an Autocrypt header for the peer before,
and there is none in messages A and C.  Then my end result is that i'll
think that the Autocrypt state for the peer is just the Autocrypt header
from B.

But that's different from what we ended up with when we received
the messages in order.

Now, we can improve on this with the following extra technique: when a
peer goes from no Autocrypt state to having an Autocrypt state, we can
search the existing index for messages from that peer with a later Date:
header.  If we find such a message, then we should include it in our
calculations.  If we do that, then we end up with the correct state,
regardless of the order of delivery.  good!
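
A rough Python sketch of that rescan, using the abstract times from the
A/B/C example above.  Everything here is a simplification for
illustration and not notmuch code: the Message and PeerState classes
and the ingest and final_state functions are invented names, the
"index" is a plain list, and the peer state is reduced to two fields
named after the Autocrypt peer state (autocrypt_timestamp, last_seen).
Keys, prefer-encrypt and the firstseen= property are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    date: int            # abstract Date: value, higher is later
    has_autocrypt: bool


@dataclass
class PeerState:
    autocrypt_timestamp: int  # Date: of newest message with an Autocrypt header
    last_seen: int            # Date: of newest message of any kind


def ingest(index, state, msg):
    """Index msg and return the peer state (None means nothing is stored)."""
    if state is None and msg.has_autocrypt:
        # First Autocrypt header from this peer: create state, then rescan
        # the existing index for messages with a later Date: header.
        state = PeerState(autocrypt_timestamp=msg.date, last_seen=msg.date)
        for old in index:
            if old.date > msg.date:
                state.last_seen = max(state.last_seen, old.date)
    elif state is not None:
        state.last_seen = max(state.last_seen, msg.date)
        if msg.has_autocrypt:
            state.autocrypt_timestamp = max(state.autocrypt_timestamp, msg.date)
    # With no state and no Autocrypt header, nothing is recorded for the peer.
    index.append(msg)
    return state


def final_state(order):
    """Deliver the messages in the given order and return the end state."""
    index, state = [], None
    for msg in order:
        state = ingest(index, state, msg)
    return state
```

With A (time 1), B* (time 2) and C (time 3), both the A, B, C and the
A, C, B delivery orders end with the Autocrypt header dated at time 2
and a newer Autocrypt-free message seen at time 3.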

So far, we haven't needed the firstseen= property yet.  There's one
final wrinkle that introduces the need for it: message Date: headers can
be wrong.  They can even be grossly wrong -- they can be from the
future.  This can happen when the sender's clock is bad, mainly, but it
can also happen through malice (someone wanting to forge a message to
mess with the recipient's state about a given peer, for example).

So Autocrypt defines the "effective date" of a message as the *earliest*
of two dates: the date that the message is first seen, and the Date:
header itself.  So we want our augmented Autocrypt header ingestion
routine to search for all other messages we know about from the sender
that have both a later firstseen= property *and* a later Date: header.
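
In Python terms, the effective date and the "both later" search might
look roughly like this.  It is only a sketch: the Indexed class and
the two function names are invented for illustration, and first_seen
stands in for the proposed firstseen= property, which notmuch does not
have today.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Indexed:
    date: datetime        # parsed Date: header
    first_seen: datetime  # hypothetical firstseen= property


def effective_date(msg):
    """The earlier of the Date: header and the time the message was first seen."""
    return min(msg.date, msg.first_seen)


def later_messages(reference, others):
    """Messages later than `reference` in both Date: and first-seen time."""
    return [m for m in others
            if m.date > reference.date and m.first_seen > reference.first_seen]
```

A message whose Date: claims the year 3000 but which was first seen on
2019-06-01 gets an effective date of 2019-06-01, and it is not returned
by later_messages for a reference message first seen on 2019-06-02.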

Otherwise, one poorly formed e-mail without an Autocrypt header with the
Date: set to the year 3000 (the "bogus future message") would make it so
that the peer's recommendation would be set to "discourage" when 

Re: feature request: caching message arrival time

2019-06-01 Thread Ralph Seichter
* Daniel Kahn Gillmor:

> I'm working on Autocrypt integration for notmuch right now [...]

Woot! :-)

> I'm happy to explain more about my use case if people are interested
> too.

I'm interested. Right now I frankly don't know what knowing when a
message was first seen by Notmuch might be useful for. That makes it
a bit difficult for me to contemplate your questions.

-Ralph


Re: feature request: caching message arrival time

2019-06-01 Thread David Bremner
Daniel Kahn Gillmor writes:

>  * i don't think we have a way to search properties by range (e.g. the
>    way that we can search date ranges).  i don't need that feature for
>    my use case, but maybe someone will come up with a use case that
>    wants it?  is there a way to store the datestamp in a way that it can
>    be scanned the way that "date" can?  or do we already have this and
>    i'm just unaware?

you'd need to use a value slot to get (native Xapian) range
searches. To quote the xapian docs

  For performance it is important to keep the amount of data stored
  in the values to a minimum, since the values for a large number of
  documents may be read during the search - the more data that has
  to be read, the slower the search will be.

So it's definitely something that would need to be profiled.

Probably the patches that added lastmod: are a good example for someone
wanting to investigate this.

