Re: Fixed Message-ID trouble

2023-09-27 Thread Teemu Likonen
* 2023-09-27 13:48:50-0300, David Bremner wrote:

> By the way, if using the emacs front-end did you try the unthreaded
> view (U)? That would at least mitigate damage from people replying to
> the poisoned messages.

I didn't, so thanks for the reminder about the unthreaded view. It is a
nice fallback mode when threading is broken or complicated. A plain list
of timestamp-sorted messages helps in this particular case because the
originally separate threads (which are now the same thread) appeared at
different times.

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462


___
notmuch mailing list -- [email protected]
To unsubscribe send an email to [email protected]


Re: Fixed Message-ID trouble

2023-09-27 Thread David Bremner
Teemu Likonen writes:

> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

By the way, if using the emacs front-end did you try the unthreaded view
(U)? That would at least mitigate damage from people replying to the
poisoned messages.

I could imagine a future version of notmuch considering the
identification of files with the same message id as part of "threading",
and allowing an unthreaded view to just show all the files, effectively
ignoring the message-id. The next step would be to do that selectively
for some messages.  This all requires a complete redesign of the
database schema, so I don't know how realistic it is.

d


Re: Fixed Message-ID trouble

2023-09-26 Thread David Bremner
Teemu Likonen writes:

> Will Notmuch also break the thread so that this edited message will
> start a new thread? Maybe for the message itself, but its follow-ups
> would need to be fixed too. Often "References" points to several
> earlier messages in the chain. So, detaching a subthread from a bigger
> thread would need manual editing of more than one message:

Yeah, once people start replying to the broken messages, it becomes more
complicated, as you point out.


Re: Fixed Message-ID trouble

2023-09-26 Thread Alexander Adolf
Andreas Kähäri writes:

> [...]
>> > stupid "external message" headers added by malicious^Wcorporate mail
>> > servers, etc...
>> 
>> Headers would not "muddy the waters" since they are headers. In my mind,
>> the hash would be over the body only.
>
> Hi, I'm not really part of the discussion, but I can add a quick thought
> and a suggestion.
>
> There are corporate mail servers that add a boilerplate "header" to the
> body of outgoing email messages.  The more common practice is to add a
> "footer" to the message.  I have seen these footers being added both
> before and after the user's signature.  You can not use a hash that
> contains the body of the message to identify the message as unique.

Thanks for pointing that out. You're right, of course; I have seen such
things myself, too.

It thus seems to me that the body hash idea is officially not working. I
rest my case.

> Using the earliest Received header (the one furthest down) as a unique
> identifier would possibly be a better approach.  Since this likely
> contains the identity of the originating mail server, some mail queue
> ID, and a timestamp, it should be unique enough to identify the message,
> even if the message is received via multiple routes and has a non-unique
> Message ID.
> [...]

I would strongly advise against using any "early" Received (or any
other) header for any heuristics. In spam traffic, most headers will
almost certainly be fake. The only one to trust is the very last
Received header, the one added by your own (or your provider's) mail
system.

Trying to control your code's behaviour based on maliciously crafted
data would hence mean intentionally exposing an attack surface. Parsing
these data for display to the user (as is the case now) is as far as I
would suggest going with that; but no further.


Cheers,

  --alexander


Re: Fixed Message-ID trouble

2023-09-26 Thread Daniel Kahn Gillmor
On Mon 2023-09-25 11:54:07 +0300, Teemu Likonen wrote:
> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

fwiw, the duplicate message-id attack vector is a long-recognized problem:

https://nmbug.notmuchmail.org/nmweb/show/87k42vrqve.fsf%40pip.fifthhorseman.net

yikes, over a decade ago ☹

With recent versions of notmuch, if the problem is a message-id
collision, you can at least *see* the different variant forms of a given
message by cycling through the list of duplicates (e.g. via
notmuch-show-choose-duplicate in notmuch-emacs), thanks to excellent
work by David Bremner:

https://nmbug.notmuchmail.org/nmweb/show/20220701214548.461943-1-david%40tethera.net

As for thread splitting/re-joining based on References: and In-Reply-To:
headers, you might be interested in these oldies-but-goodies from the
mailing list archives, which as far as i know we have never managed to
resolve:

https://nmbug.notmuchmail.org/nmweb/show/AANLkTimDjk_-Xjpf6uovGXgyG_3j-ySLWQR%2B0UvdVjjT%40mail.gmail.com
https://nmbug.notmuchmail.org/nmweb/show/87mvp9uwi4.fsf%40alice.fifthhorseman.net

Sorry to only have archival references here and not robust/complete
fixes.

--dkg




Re: Fixed Message-ID trouble

2023-09-26 Thread Teemu Likonen
* 2023-09-26 07:07:46-0300, David Bremner wrote:

> Teemu Likonen writes:
>> Perhaps my wish is that there was an easy way to break threads: mark a
>> message as origin of a new thread.

> How about if you delete the Message-ID, References, and In-Reply-To
> headers from the bad messages and re-index? Notmuch will synthesize a
> unique Message-Id if there is none present.

Will Notmuch also break the thread so that this edited message will
start a new thread? Maybe for the message itself, but its follow-ups
would need to be fixed too. Often "References" points to several earlier
messages in the chain. So, detaching a subthread from a bigger thread
would need manual editing of more than one message:

 1. Edit one message and remove its "References" and "In-Reply-To".
    Possibly edit "Message-ID". This would be the origin of a new
    thread.

 2. Check all follow-ups to that message and make them refer to the new
    origin and its (possibly) new "Message-ID". Remove references that
    go beyond the origin.

 3. Reindex.
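
Step 2 is the part most likely to go wrong by hand. As a minimal sketch,
assuming each follow-up is available as raw bytes and the new origin
keeps its existing Message-ID, Python's standard "email" package could
trim the references like this. The function name trim_references and the
example IDs are hypothetical, and nothing here talks to notmuch itself;
re-indexing is still a separate step.

```python
from email import message_from_bytes


def trim_references(raw: bytes, origin_id: str) -> bytes:
    """Drop every reference that comes before origin_id in a follow-up.

    origin_id is the Message-ID (angle brackets included) of the message
    chosen as the start of the new thread. Messages that do not mention
    it in References are returned with that header untouched.
    """
    msg = message_from_bytes(raw)
    refs = str(msg.get("References") or "").split()
    if origin_id in refs:
        kept = refs[refs.index(origin_id):]
        del msg["References"]  # remove the old header before re-adding
        msg["References"] = " ".join(kept)
    return msg.as_bytes()


if __name__ == "__main__":
    follow_up = (
        b"Message-ID: <c@example.org>\n"
        b"References: <a@example.org> <b@example.org>\n"
        b"In-Reply-To: <b@example.org>\n"
        b"\n"
        b"reply text\n"
    )
    print(trim_references(follow_up, "<b@example.org>").decode())
```

If the origin also gets a new Message-ID, the follow-ups' "In-Reply-To"
and "References" would additionally need the old ID replaced, which this
sketch does not attempt.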

Or just forget the mess and move on with life. :-)

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462




Re: Fixed Message-ID trouble

2023-09-26 Thread Andreas Kähäri
On Tue, Sep 26, 2023 at 01:44:00PM +0200, Alexander Adolf wrote:
> David Bremner writes:
> 
> > Alexander Adolf writes:
> >
> >> Bearing in mind that re-recognising a message which has arrived
> >> multiple times via different routes is a worthwhile feature, it would
> >> seem to me that a hash over the invariant part of the message, that is
> >> the body, would allow for such detection. In that light, it would seem
> >> to me that the tuple (body_hash, message_id) could be a candidate for
> >> a “unique enough”(tm) identifier?
> >
> > > I always had the impression that the message body had too much variation
> > imposed by different delivery routes for this to be very helpful:
> > essentially the hash would be different for every file due to trailers
> > added by mailing lists,
> 
> Ah, good point. I hadn't thought of mailing list trailers. Could these
> perhaps be detected via the signature line separator "-- \n"?
> 
> I guess this also touches on the question of what a consensus definition
> of "sameness" could be. If we take the message-id only, it'd be a purely
> technical one. If we'd include the content one way or another (for
> instance via hash over the body), that would rather be an editorial
> definition of "sameness".
> 
> > re-encoding,
> 
> Like...? utf-8 to/from quoted-printable...?
> 
> > stupid "external message" headers added by malicious^Wcorporate mail
> > servers, etc...
> 
> Headers would not "muddy the waters" since they are headers. In my mind,
> the hash would be over the body only.

Hi, I'm not really part of the discussion, but I can add a quick thought
and a suggestion.

There are corporate mail servers that add a boilerplate "header" to the
body of outgoing email messages.  The more common practice is to add a
"footer" to the message.  I have seen these footers being added both
before and after the user's signature.  You can not use a hash that
contains the body of the message to identify the message as unique.

Using the earliest Received header (the one furthest down) as a unique
identifier would possibly be a better approach.  Since this likely
contains the identity of the originating mail server, some mail queue
ID, and a timestamp, it should be unique enough to identify the message,
even if the message is received via multiple routes and has a non-unique
Message ID.
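
To make the idea concrete, here is a minimal sketch using Python's
standard "email" package. Relays prepend their Received header, so the
earliest hop is the last one in the file. The function names and the
sample hosts are hypothetical, and the sketch only extracts the value:
everything below the header added by your own mail system is written by
someone else, so treating it as an identifier means trusting
sender-controlled data.

```python
from email import message_from_string
from typing import List, Optional


def received_chain(raw: str) -> List[str]:
    """Return all Received values, newest hop first, with folding removed."""
    msg = message_from_string(raw)
    return [" ".join(value.split()) for value in msg.get_all("Received", [])]


def earliest_received(raw: str) -> Optional[str]:
    """Return the bottom-most Received value, or None if there is none."""
    chain = received_chain(raw)
    return chain[-1] if chain else None


if __name__ == "__main__":
    sample = (
        "Received: from mx.example.net by mail.example.org;"
        " Mon, 25 Sep 2023 12:00:02 +0300\n"
        "Received: from origin.example.com\n"
        "\tby mx.example.net with ESMTP id 4Rv1;\n"
        "\tMon, 25 Sep 2023 11:54:07 +0300\n"
        "Message-ID: <fixed@example.com>\n"
        "\n"
        "body\n"
    )
    print(earliest_received(sample))
```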

> > I could be wrong, maybe hashing is a useful approach, but I'd need to
> > see some numbers to be convinced.
> 
> I fully agree that we need to adapt to the realities of how things are
> actually used, not how they were intended to be used.
> 
> How would I find instances of multiple files for the same message-id in
> my database for example?
> 
> 
> Cheers,
> 
>   --alexander

-- 
Andreas (Kusalananda) Kähäri
Uppsala, Sweden

.


Re: Fixed Message-ID trouble

2023-09-26 Thread Alexander Adolf
David Bremner writes:

> Alexander Adolf writes:
>
>> Bearing in mind that re-recognising a message which has arrived
>> multiple times via different routes is a worthwhile feature, it would
>> seem to me that a hash over the invariant part of the message, that is
>> the body, would allow for such detection. In that light, it would seem
>> to me that the tuple (body_hash, message_id) could be a candidate for
>> a “unique enough”(tm) identifier?
>
> I always had the impression that the message body had too much variation
> imposed by different delivery routes for this to be very helpful:
> essentially the hash would be different for every file due to trailers
> added by mailing lists,

Ah, good point. I hadn't thought of mailing list trailers. Could these
perhaps be detected via the signature line separator "-- \n"?
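
A minimal sketch of that question, assuming the body is already decoded
to text: hash only what precedes the first "-- " line. The function name
body_hash and the choice of SHA-256 are illustrative assumptions, not
anything notmuch does. A trailer appended after the signature then no
longer changes the hash, but anything inserted above the separator
still does.

```python
import hashlib

SIG_SEPARATOR = "-- "  # conventional signature delimiter line


def body_hash(body: str) -> str:
    """Hash the body text that precedes the first signature separator."""
    lines = body.splitlines()
    if SIG_SEPARATOR in lines:
        lines = lines[:lines.index(SIG_SEPARATOR)]
    # Normalise trailing whitespace so line-ending differences do not matter.
    text = "\n".join(line.rstrip() for line in lines).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    direct = "Hello list,\n\nsome text.\n\n-- \nAlice\n"
    via_list = direct + "___\nlist trailer\n"
    with_banner = "[EXTERNAL MESSAGE]\n" + direct
    print(body_hash(direct) == body_hash(via_list))     # trailer after signature
    print(body_hash(direct) == body_hash(with_banner))  # banner above separator
```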

I guess this also touches on the question of what a consensus definition
of "sameness" could be. If we take the message-id only, it'd be a purely
technical one. If we'd include the content one way or another (for
instance via hash over the body), that would rather be an editorial
definition of "sameness".

> re-encoding,

Like...? utf-8 to/from quoted-printable...?

> stupid "external message" headers added by malicious^Wcorporate mail
> servers, etc...

Headers would not "muddy the waters" since they are headers. In my mind,
the hash would be over the body only.

> I could be wrong, maybe hashing is a useful approach, but I'd need to
> see some numbers to be convinced.

I fully agree that we need to adapt to the realities of how things are
actually used, not how they were intended to be used.

How would I find instances of multiple files for the same message-id in
my database for example?
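
One way to get a first answer without touching the database is to scan
the mail files themselves. The following is only a sketch under that
assumption: it walks a directory tree with Python's standard library,
reads just the headers of each file, and reports every Message-ID that
occurs in more than one file. The function name files_by_message_id is
hypothetical, and the result also includes legitimate copies of one
message received via different routes.

```python
from collections import defaultdict
from email.parser import BytesParser
from pathlib import Path
from typing import Dict, List


def files_by_message_id(root: Path) -> Dict[str, List[Path]]:
    """Map each Message-ID found in more than one file to those files."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # headersonly=True avoids parsing large bodies and attachments.
            msg = BytesParser().parse(handle, headersonly=True)
        message_id = msg.get("Message-ID")
        if message_id:
            groups[str(message_id).strip()].append(path)
    return {mid: paths for mid, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    for mid, paths in files_by_message_id(Path(sys.argv[1])).items():
        print(mid)
        for path in paths:
            print("   ", path)
```

Comparing, say, the Subject or Date of the files in each group would
then separate harmless duplicates from genuinely different messages
that share an ID.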


Cheers,

  --alexander


Re: Fixed Message-ID trouble

2023-09-26 Thread David Bremner
Alexander Adolf writes:

>
> Bearing in mind that re-recognising a message which has arrived
> multiple times via different routes is a worthwhile feature, it would
> seem to me that a hash over the invariant part of the message, that is
> the body, would allow for such detection. In that light, it would seem
> to me that the tuple (body_hash, message_id) could be a candidate for
> a “unique enough”(tm) identifier?

I always had the impression that the message body had too much variation
imposed by different delivery routes for this to be very helpful:
essentially the hash would be different for every file due to trailers
added by mailing lists, re-encoding, stupid "external message" headers
added by malicious^Wcorporate mail servers, etc...

I could be wrong, maybe hashing is a useful approach, but I'd need to
see some numbers to be convinced.


Re: Fixed Message-ID trouble

2023-09-26 Thread David Bremner
Teemu Likonen writes:

> * 2023-09-25 07:33:23-0400, Daniel Corbe wrote:
>
>> Silly question, I know, but have you actually tried reaching out to
>> the user?
>
> Not silly, but I don't even know who the person is. All I see is the
> mess, and everything else is my interpretation of the cause. Notmuch
> Emacs tree mode shows messages' relations but they are not accurate if
> references are messed up. It's difficult to dig into Message-ID level of
> relations.
>
> Perhaps my wish is that there was an easy way to break threads: mark a
> message as origin of a new thread. Or perhaps I just use my custom
> ignore mechanism to mark messed threads automatically as read and move
> on.

How about if you delete the Message-ID, References, and In-Reply-To
headers from the bad messages and re-index? Notmuch will synthesize a
unique Message-Id if there is none present.
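
As a rough sketch of the header-deletion half of that suggestion, using
only Python's standard "email" package: the function names are
hypothetical, the file is rewritten in place (so work on a copy first),
and the re-index itself is left to notmuch.

```python
from email import message_from_bytes
from pathlib import Path

# Headers that tie a message to an identity and to a thread.
IDENTITY_HEADERS = ("Message-ID", "References", "In-Reply-To")


def strip_identity_headers(raw: bytes) -> bytes:
    """Return the message without its Message-ID and threading headers."""
    msg = message_from_bytes(raw)
    for name in IDENTITY_HEADERS:
        del msg[name]  # case-insensitive; a missing header is not an error
    return msg.as_bytes()


def strip_file(path: Path) -> None:
    """Rewrite one mail file in place with the identity headers removed."""
    path.write_bytes(strip_identity_headers(path.read_bytes()))


if __name__ == "__main__":
    sample = (
        b"From: someone@example.org\n"
        b"Message-Id: <fixed@example.org>\n"
        b"References: <other@example.org>\n"
        b"Subject: test\n"
        b"\n"
        b"hello\n"
    )
    print(strip_identity_headers(sample).decode())
```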


Re: Fixed Message-ID trouble

2023-09-25 Thread Andy Smith
Hi,

On Mon, Sep 25, 2023 at 11:53:34PM +0200, Gregor Zattler wrote:
> Hi Teemu, notmuch users,
> * Teemu Likonen [2023-09-25; 11:54 +03]:
> > Some person on debian-user mailing list seems to be sending messages
> > with fixed Message-ID field: the same ID in different messages.

[…]

> would you please give details of some such posts?  Then
> other people are able to investigate.

Here's an explainer for confused people on the debian-user list:

https://lists.debian.org/debian-user/2023/09/msg00515.html

Here's an mbox of the five messages that dsr sent that have a
different message ID format to their other messages, and show two
duplicate IDs:

https://strugglers.net/~andy/dsr.mbox

$ grep '^Message-ID' ~/public_html/dsr.mbox
Message-ID: <[email protected]>
Message-ID: <[email protected]>
Message-ID: <[email protected]>
Message-ID: <[email protected]>
Message-ID: <[email protected]>

dsr is now aware of the problem and says they have fixed it.

Cheers,
Andy


Re: Fixed Message-ID trouble

2023-09-25 Thread Gregor Zattler
Hi Teemu, notmuch users,
* Teemu Likonen [2023-09-25; 11:54 +03]:
> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

would you please give details of some such posts?  Then
other people are able to investigate.

Ciao; Gregor


Re: Fixed Message-ID trouble

2023-09-25 Thread Alexander Adolf
Hello, 

This sounds like a nasty problem indeed. OTOH, “there’s nothing that couldn’t 
be” as my granny would have put it. 

Bearing in mind that re-recognising a message which has arrived multiple times 
via different routes is a worthwhile feature, it would seem to me that a hash 
over the invariant part of the message, that is the body, would allow for such 
detection. In that light, it would seem to me that the tuple (body_hash, 
message_id) could be a candidate for a “unique enough”(tm) identifier?

  --alex

-- 
www.condition-alpha.com / @c_alpha
Sent from my iPhone; apologies for brevity and autocorrect weirdness. 

> On 25. Sep 2023, at 14:00, Teemu Likonen wrote:
> 
> * 2023-09-25 07:33:23-0400, Daniel Corbe wrote:
> 
>> Silly question, I know, but have you actually tried reaching out to
>> the user?
> 
> Not silly, but I don't even know who the person is. All I see is the
> mess, and everything else is my interpretation of the cause. Notmuch
> Emacs tree mode shows messages' relations but they are not accurate if
> references are messed up. It's difficult to dig into Message-ID level of
> relations.
> 
> Perhaps my wish is that there was an easy way to break threads: mark a
> message as origin of a new thread. Or perhaps I just use my custom
> ignore mechanism to mark messed threads automatically as read and move
> on.
> 
> -- 
> /// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
> // OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462




Re: Fixed Message-ID trouble

2023-09-25 Thread Teemu Likonen
* 2023-09-25 07:33:23-0400, Daniel Corbe wrote:

> Silly question, I know, but have you actually tried reaching out to
> the user?

Not silly, but I don't even know who the person is. All I see is the
mess, and everything else is my interpretation of the cause. Notmuch
Emacs tree mode shows messages' relations but they are not accurate if
references are messed up. It's difficult to dig into Message-ID level of
relations.

Perhaps my wish is that there was an easy way to break threads: mark a
message as origin of a new thread. Or perhaps I just use my custom
ignore mechanism to mark messed threads automatically as read and move
on.

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462




Re: Fixed Message-ID trouble

2023-09-25 Thread Daniel Corbe

> On Sep 25, 2023, at 06:52, Teemu Likonen wrote:
> 
>> Some person on debian-user mailing list seems to be sending messages
>> with fixed Message-ID field: the same ID in different messages. In
>> Notmuch it is creating trouble because it connects unrelated threads to
>> one. The person has different messages in different threads but Notmuch
>> thinks they are the same message because the Message-ID is the same.
>> 
>> This is potentially a "denial of service" for Notmuch. Well, not quite,
>> but is harmful nonetheless. How would a Notmuch user fix the mess or
>> protect himself against it?
> 
> I am no longer sure if this issue is caused by fixed "Message-ID" or
> wrong "References" or "In-Reply-To" values. Either way, someone has
> created a real mess, because Notmuch combines originally separate
> threads now and forever.

Silly question, I know, but have you actually tried reaching out to the user?  
No MUA that I’m aware of acts like this and it’s pretty clear from 
documentation and standards tracks that Message-ID is meant to be globally 
unique per message.

If the user is knowledgeable enough to have a boutique mail reader, they’re 
probably also knowledgeable enough to correct the defect too.




Re: Fixed Message-ID trouble

2023-09-25 Thread Michael J Gruber
On Mon, 25 Sep 2023 at 12:53, Teemu Likonen wrote:
>
> * 2023-09-25 11:54:07+0300, Teemu Likonen wrote:
>
> > Some person on debian-user mailing list seems to be sending messages
> > with fixed Message-ID field: the same ID in different messages. In
> > Notmuch it is creating trouble because it connects unrelated threads to
> > one. The person has different messages in different threads but Notmuch
> > thinks they are the same message because the Message-ID is the same.
> >
> > This is potentially a "denial of service" for Notmuch. Well, not quite,
> > but is harmful nonetheless. How would a Notmuch user fix the mess or
> > protect himself against it?
>
> I am no longer sure if this issue is caused by fixed "Message-ID" or
> wrong "References" or "In-Reply-To" values. Either way, someone has
> created a real mess, because Notmuch combines originally separate
> threads now and forever.

Yes, several sources of different badness ...

Still, if I understand correctly, a new message with a pre-existing
mid ends up being registered by notmuch as a second file for the
"same" message irrespective of differences in the actual files. For
message copies which you receive via different paths (say directly
plus via an ml) this may or may not be what you want. Used
intentionally, it may create harm - how do other mailers handle this?
Show them in parallel in the same thread (but as individual messages)?

Michael


Re: Fixed Message-ID trouble

2023-09-25 Thread Teemu Likonen
* 2023-09-25 11:54:07+0300, Teemu Likonen wrote:

> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

I am no longer sure if this issue is caused by fixed "Message-ID" or
wrong "References" or "In-Reply-To" values. Either way, someone has
created a real mess, because Notmuch combines originally separate
threads now and forever.

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462

