Re: Fixed Message-ID trouble
* 2023-09-27 13:48:50-0300, David Bremner wrote:

> By the way, if using the emacs front-end did you try the unthreaded
> view (U)? That would at least mitigate damage from people replying to
> the poisoned messages.

I didn't, so thanks for the reminder about the unthreaded view. It is a nice fallback mode when threading is broken or complicated. A plain list of timestamp-sorted messages helps in this particular case because the originally different threads (which are now the same thread) appeared at different times.

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462

___
notmuch mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Re: Fixed Message-ID trouble
Teemu Likonen writes:

> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

By the way, if using the emacs front-end did you try the unthreaded view (U)? That would at least mitigate damage from people replying to the poisoned messages.

I could imagine a future version of notmuch considering the identification of files with the same message id as part of "threading", and allowing an unthreaded view to just show all the files, effectively ignoring the message-id. The next step would be to do that selectively for some messages. This all requires a complete redesign of the database schema, so I don't know how realistic it is.

d
Re: Fixed Message-ID trouble
Teemu Likonen writes:

> Will Notmuch also break the thread so that this edited message will
> start a new thread? Maybe the message itself but its follow-ups need to
> be fixed too. Often "References" points several earlier messages in the
> chain. So, to detach a subthread from bigger thread would need manual
> editing for more than one message:

Yeah, once people start replying to the broken messages, it becomes more complicated, as you point out.
Re: Fixed Message-ID trouble
Andreas Kähäri writes:

> [...]
>> > stupid "external message" headers added by malicious^Wcorporate mail
>> > servers, etc...
>>
>> Headers would not "muddy the waters" since they are headers. In my mind,
>> the hash would be over the body only.
>
> Hi, I'm not really part of the discussion, but I can add a quick thought
> and a suggestion.
>
> There are corporate mail servers that add a boilerplate "header" to the
> body of outgoing email messages. The more common practice is to add a
> "footer" to the message. I have seen these footers being added both
> before and after the user's signature. You can not use a hash that
> contains the body of the message to identify the message as unique.

Thanks for pointing that out. You're right, of course; I have seen such things myself, too. It thus seems to me that the body hash idea is officially not working. I rest my case.

> Using the earliest Received header (the one furthest down) as a unique
> identifier would possibly be a better approach. Since this likely
> contains the identity of the originating mail server, some mail queue
> ID, and a timestamp, it should be unique enough to identify the message,
> even if the message is received via multiple routes and has a non-unique
> Message ID.
> [...]

I would strongly advise against using any "early" Received (or any other) header for any heuristics. In spam traffic, most headers will all but certainly be fake. The only one to trust is the very last Received header added by your own (or your provider's) mail system.

Trying to control your code's behaviour based on maliciously crafted data would hence mean intentionally exposing an attack surface. Parsing these data for display to the user (as is the case now) is as far as I would suggest going with that; but no further.

Cheers,

--alexander
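To make the two Received headers under discussion concrete, here is a minimal Python sketch using only the standard library `email` package. It is an illustration, not anything notmuch does: the function names are hypothetical, and it assumes each relay prepends its Received header, so the first one in the file is the newest and the last one is the earliest.

```python
from email import message_from_string


def received_headers(raw):
    """Return all Received headers in file order (newest first)."""
    msg = message_from_string(raw)
    return msg.get_all("Received") or []


def newest_received(raw):
    """Topmost header: added last, normally by the receiving mail system."""
    headers = received_headers(raw)
    return headers[0] if headers else None


def earliest_received(raw):
    """Bottom-most header: claims to describe the origin, but everything
    below the receiving system's own header is sender-controlled data."""
    headers = received_headers(raw)
    return headers[-1] if headers else None
```

Only the value returned by `newest_received` comes from a system the recipient controls; the value from `earliest_received` is exactly the kind of input the reply above warns against basing decisions on.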
Re: Fixed Message-ID trouble
On Mon 2023-09-25 11:54:07 +0300, Teemu Likonen wrote:

> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

fwiw, the duplicate message-id attack vector is a long-recognized problem:

https://nmbug.notmuchmail.org/nmweb/show/87k42vrqve.fsf%40pip.fifthhorseman.net

yikes, over a decade ago ☹

With recent versions of notmuch, if the problem is a message-id collision, you can at least *see* the different variant forms of a given message by cycling through the list of duplicates (e.g. via notmuch-show-choose-duplicate in notmuch-emacs), thanks to excellent work by David Bremner:

https://nmbug.notmuchmail.org/nmweb/show/20220701214548.461943-1-david%40tethera.net

As for thread splitting/re-joining based on References: and In-Reply-To: headers, you might be interested in these oldies-but-goodies from the mailing list archives, which as far as i know we have never managed to resolve:

https://nmbug.notmuchmail.org/nmweb/show/AANLkTimDjk_-Xjpf6uovGXgyG_3j-ySLWQR%2B0UvdVjjT%40mail.gmail.com
https://nmbug.notmuchmail.org/nmweb/show/87mvp9uwi4.fsf%40alice.fifthhorseman.net

Sorry to only have archival references here and not robust/complete fixes.

--dkg
Re: Fixed Message-ID trouble
* 2023-09-26 07:07:46-0300, David Bremner wrote:

> Teemu Likonen writes:
>> Perhaps my wish is that there was an easy way to break threads: mark a
>> message as origin of a new thread.
> How about if you delete the Message-ID, References, and In-Reply-To
> headers from the bad messages and re-index? Notmuch will synthesize a
> unique Message-Id if there is none present.

Will Notmuch also break the thread so that this edited message will start a new thread? Maybe for the message itself, but its follow-ups need to be fixed too. Often "References" points to several earlier messages in the chain. So, detaching a subthread from a bigger thread would need manual editing of more than one message:

1. Edit one message and remove its "References" and "In-Reply-To". Possibly edit "Message-ID". This would be the origin of a new thread.
2. Check all follow-ups to that message and make them refer to the new origin and its (possibly) new "Message-ID". Remove references that go beyond the origin.
3. Reindex.

Or just forget the mess and move on with life. :-)

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462
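Steps 1 and 2 of the list above can be sketched with Python's standard `email` package. This is a hedged illustration of the manual edit, not a notmuch feature: the function names `make_thread_origin` and `retarget_follow_up` are hypothetical, the functions work on message text rather than on files, and step 3 (re-indexing) still has to be done with notmuch afterwards.

```python
from email import message_from_string
from email.utils import make_msgid


def make_thread_origin(raw, new_id=None):
    """Step 1: drop the threading headers and give the message a fresh ID."""
    msg = message_from_string(raw)
    for name in ("References", "In-Reply-To", "Message-ID"):
        del msg[name]  # deleting an absent header is not an error
    new_id = new_id or make_msgid()
    msg["Message-ID"] = new_id
    return msg.as_string(), new_id


def retarget_follow_up(raw, old_origin_id, new_origin_id):
    """Step 2: point a follow-up at the new origin and drop older references."""
    msg = message_from_string(raw)
    refs = (msg.get("References") or "").split()
    if old_origin_id in refs:
        cut = refs.index(old_origin_id)
        # Keep only the new origin and whatever came after the old one.
        refs = [new_origin_id] + refs[cut + 1:]
        del msg["References"]
        msg["References"] = " ".join(refs)
    if (msg.get("In-Reply-To") or "").strip() == old_origin_id:
        del msg["In-Reply-To"]
        msg["In-Reply-To"] = new_origin_id
    return msg.as_string()
```

Passing an explicit `new_id` keeps the result reproducible. Leaving the Message-ID out entirely, as suggested in the quoted reply, would let Notmuch synthesize one, but then step 2 has no known ID to refer to.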
Re: Fixed Message-ID trouble
On Tue, Sep 26, 2023 at 01:44:00PM +0200, Alexander Adolf wrote:
> David Bremner writes:
>
> > Alexander Adolf writes:
> >
> >> Bearing in mind that re-recognising a message which has arrived
> >> multiple times via different routes is a worthwhile feature, it would
> >> seem to me that a hash over the invariant part of the message, that is
> >> the body, would allow for such detection. In that light, it would seem
> >> to me that the tuple (body_hash, message_id) could be a candidate for
> >> a “unique enough”(tm) identifier?
> >
> > I always had the impression that the message body had too much variation
> > imposed by different delivery routes for this to be very helpful:
> > essentially the hash would be different for every file due to trailers
> > added by mailing lists,
>
> Ah, good point. I hadn't thought of mailing list trailers. Could these
> perhaps be detected via the signature line separator "-- \n"?
>
> I guess this also touches on the question of what a consensus definition
> of "sameness" could be. If we take the message-id only, it'd be a purely
> technical one. If we'd include the content one way or another (for
> instance via hash over the body), that would rather be an editorial
> definition of "sameness".
>
> > re-encoding,
>
> Like...? utf-8 to/from quoted-printable...?
>
> > stupid "external message" headers added by malicious^Wcorporate mail
> > servers, etc...
>
> Headers would not "muddy the waters" since they are headers. In my mind,
> the hash would be over the body only.

Hi, I'm not really part of the discussion, but I can add a quick thought and a suggestion.

There are corporate mail servers that add a boilerplate "header" to the body of outgoing email messages. The more common practice is to add a "footer" to the message. I have seen these footers being added both before and after the user's signature. You cannot use a hash that contains the body of the message to identify the message as unique.

Using the earliest Received header (the one furthest down) as a unique identifier would possibly be a better approach. Since this likely contains the identity of the originating mail server, some mail queue ID, and a timestamp, it should be unique enough to identify the message, even if the message is received via multiple routes and has a non-unique Message ID.

> > I could be wrong, maybe hashing is a useful approach, but I'd need to
> > see some numbers to be convinced.
>
> I fully agree that we need to adapt to the realities of how things are
> actually used, not how they were intended to be used.
>
> How would I find instances of multiple files for the same message-id in
> my database for example?
>
>
> Cheers,
>
> --alexander

-- 
Andreas (Kusalananda) Kähäri
Uppsala, Sweden

.
Re: Fixed Message-ID trouble
David Bremner writes:

> Alexander Adolf writes:
>
>> Bearing in mind that re-recognising a message which has arrived
>> multiple times via different routes is a worthwhile feature, it would
>> seem to me that a hash over the invariant part of the message, that is
>> the body, would allow for such detection. In that light, it would seem
>> to me that the tuple (body_hash, message_id) could be a candidate for
>> a “unique enough”(tm) identifier?
>
> I always had the impression that the message body had too much variation
> imposed by different delivery routes for this to be very helpful:
> essentially the hash would be different for every file due to trailers
> added by mailing lists,

Ah, good point. I hadn't thought of mailing list trailers. Could these perhaps be detected via the signature line separator "-- \n"?

I guess this also touches on the question of what a consensus definition of "sameness" could be. If we take the message-id only, it'd be a purely technical one. If we'd include the content one way or another (for instance via hash over the body), that would rather be an editorial definition of "sameness".

> re-encoding,

Like...? utf-8 to/from quoted-printable...?

> stupid "external message" headers added by malicious^Wcorporate mail
> servers, etc...

Headers would not "muddy the waters" since they are headers. In my mind, the hash would be over the body only.

> I could be wrong, maybe hashing is a useful approach, but I'd need to
> see some numbers to be convinced.

I fully agree that we need to adapt to the realities of how things are actually used, not how they were intended to be used.

How would I find instances of multiple files for the same message-id in my database, for example?

Cheers,

--alexander
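One way to approach the closing question without relying on any particular notmuch feature is to scan the mail files directly and group them by their Message-ID header. The following is a minimal sketch under stated assumptions: mail is stored as one file per message somewhere under a directory tree, and the function name `files_by_message_id` is hypothetical. It reads headers only, so it says nothing about whether the files with a shared ID are true copies or different messages.

```python
import os
from collections import defaultdict
from email.parser import BytesHeaderParser


def files_by_message_id(root):
    """Map each Message-ID seen in more than one file to its file paths."""
    parser = BytesHeaderParser()
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                msg = parser.parse(fh)  # parses the header block only
            mid = msg.get("Message-ID")
            if mid:
                groups[str(mid).strip()].append(path)
    # Keep only the IDs that occur in two or more files.
    return {mid: sorted(paths) for mid, paths in groups.items() if len(paths) > 1}
```

Usage would be along the lines of `files_by_message_id("/path/to/mail")`, where the path is a placeholder for the local mail directory. Comparing the listed files by hand then shows which duplicates are harmless copies and which are collisions.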
Re: Fixed Message-ID trouble
Alexander Adolf writes:

>
> Bearing in mind that re-recognising a message which has arrived
> multiple times via different routes is a worthwhile feature, it would
> seem to me that a hash over the invariant part of the message, that is
> the body, would allow for such detection. In that light, it would seem
> to me that the tuple (body_hash, message_id) could be a candidate for
> a “unique enough”(tm) identifier?

I always had the impression that the message body had too much variation imposed by different delivery routes for this to be very helpful: essentially the hash would be different for every file due to trailers added by mailing lists, re-encoding, stupid "external message" headers added by malicious^Wcorporate mail servers, etc...

I could be wrong, maybe hashing is a useful approach, but I'd need to see some numbers to be convinced.
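The proposed (body_hash, message_id) tuple and the objection to it can both be illustrated with a short Python sketch. This is an assumption-laden toy, not how notmuch identifies messages: SHA-256 is an arbitrary choice, the function names are hypothetical, and multipart messages are not handled (their body hashes as empty).

```python
import hashlib
from email import message_from_string


def body_hash(raw):
    """SHA-256 over the decoded body of a single-part message."""
    msg = message_from_string(raw)
    # get_payload(decode=True) returns bytes, or None for multipart messages.
    payload = msg.get_payload(decode=True) or b""
    return hashlib.sha256(payload).hexdigest()


def candidate_id(raw):
    """The proposed tuple: (hash of the body, Message-ID header)."""
    msg = message_from_string(raw)
    return (body_hash(raw), msg.get("Message-ID"))
```

Feeding this the same message once as delivered directly and once with a mailing list trailer appended to the body gives two tuples with the same Message-ID but different hashes, which is the variation described above.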
Re: Fixed Message-ID trouble
Teemu Likonen writes:

> * 2023-09-25 07:33:23-0400, Daniel Corbe wrote:
>
>> Silly question, I know, but have you actually tried reaching out to
>> the user?
>
> Not silly, but I don't even know who the person is. All I see is the
> mess, and everything else is my interpretation of the cause. Notmuch
> Emacs tree mode shows messages' relations but they are not accurate if
> references are messed up. It's difficult to dig into Message-ID level of
> relations.
>
> Perhaps my wish is that there was an easy way to break threads: mark a
> message as origin of a new thread. Or perhaps I just use my custom
> ignore mechanism to mark messed threads automatically as read and move
> on.

How about if you delete the Message-ID, References, and In-Reply-To headers from the bad messages and re-index? Notmuch will synthesize a unique Message-Id if there is none present.
Re: Fixed Message-ID trouble
Hi,

On Mon, Sep 25, 2023 at 11:53:34PM +0200, Gregor Zattler wrote:
> Hi Teemu, notmuch users,
> * Teemu Likonen [2023-09-25; 11:54 +03]:
> > Some person on debian-user mailing list seems to be sending messages
> > with fixed Message-ID field: the same ID in different messages. […]
> would you please give details of some such posts? Then
> other people are able to investigate.

Here's an explainer for confused people on the debian-user list:

https://lists.debian.org/debian-user/2023/09/msg00515.html

Here's an mbox of the five messages that dsr sent that have a different message ID format to their other messages, and show two duplicate IDs:

https://strugglers.net/~andy/dsr.mbox

$ grep '^Message-ID' ~/public_html/dsr.mbox
Message-ID: <[email protected]>
Message-ID: <[email protected]>
Message-ID: <[email protected]>
Message-ID: <[email protected]>
Message-ID: <[email protected]>

dsr is now aware of the problem and says they have fixed it.

Cheers,
Andy
Re: Fixed Message-ID trouble
Hi Teemu, notmuch users,

* Teemu Likonen [2023-09-25; 11:54 +03]:
> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

would you please give details of some such posts? Then other people are able to investigate.

Ciao; Gregor
Re: Fixed Message-ID trouble
Hello,

This sounds like a nasty problem indeed. OTOH, “there’s nothing that couldn’t be” as my granny would have put it.

Bearing in mind that re-recognising a message which has arrived multiple times via different routes is a worthwhile feature, it would seem to me that a hash over the invariant part of the message, that is the body, would allow for such detection. In that light, it would seem to me that the tuple (body_hash, message_id) could be a candidate for a “unique enough”(tm) identifier?

--alex

-- 
www.condition-alpha.com / @c_alpha
Sent from my iPhone; apologies for brevity and autocorrect weirdness.

> On 25. Sep 2023, at 14:00, Teemu Likonen wrote:
>
> * 2023-09-25 07:33:23-0400, Daniel Corbe wrote:
>
>> Silly question, I know, but have you actually tried reaching out to
>> the user?
>
> Not silly, but I don't even know who the person is. All I see is the
> mess, and everything else is my interpretation of the cause. Notmuch
> Emacs tree mode shows messages' relations but they are not accurate if
> references are messed up. It's difficult to dig into Message-ID level of
> relations.
>
> Perhaps my wish is that there was an easy way to break threads: mark a
> message as origin of a new thread. Or perhaps I just use my custom
> ignore mechanism to mark messed threads automatically as read and move
> on.
>
> --
> /// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
> // OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462
Re: Fixed Message-ID trouble
* 2023-09-25 07:33:23-0400, Daniel Corbe wrote:

> Silly question, I know, but have you actually tried reaching out to
> the user?

Not silly, but I don't even know who the person is. All I see is the mess, and everything else is my interpretation of the cause. Notmuch Emacs tree mode shows messages' relations but they are not accurate if references are messed up. It's difficult to dig into Message-ID level of relations.

Perhaps my wish is that there was an easy way to break threads: mark a message as origin of a new thread. Or perhaps I just use my custom ignore mechanism to mark messed threads automatically as read and move on.

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462
Re: Fixed Message-ID trouble
> On Sep 25, 2023, at 06:52, Teemu Likonen wrote:
>
>> Some person on debian-user mailing list seems to be sending messages
>> with fixed Message-ID field: the same ID in different messages. In
>> Notmuch it is creating trouble because it connects unrelated threads to
>> one. The person has different messages in different threads but Notmuch
>> thinks they are the same message because the Message-ID is the same.
>>
>> This is potentially a "denial of service" for Notmuch. Well, not quite,
>> but is harmful nonetheless. How would a Notmuch user fix the mess or
>> protect himself against it?
>
> I am no longer sure if this issue is caused by fixed "Message-ID" or
> wrong "References" or "In-Reply-To" values. Anyway, someone has created
> real mess anyway because Notmuch combines originally separate threads
> now and forever.

Silly question, I know, but have you actually tried reaching out to the user?

No MUA that I’m aware of acts like this and it’s pretty clear from documentation and standards tracks that Message-ID is meant to be globally unique per message. If the user is knowledgeable enough to have a boutique mail reader, they’re probably also knowledgeable enough to correct the defect too.
Re: Fixed Message-ID trouble
On Mon, 25 Sep 2023 at 12:53, Teemu Likonen wrote:
>
> * 2023-09-25 11:54:07+0300, Teemu Likonen wrote:
>
> > Some person on debian-user mailing list seems to be sending messages
> > with fixed Message-ID field: the same ID in different messages. In
> > Notmuch it is creating trouble because it connects unrelated threads to
> > one. The person has different messages in different threads but Notmuch
> > thinks they are the same message because the Message-ID is the same.
> >
> > This is potentially a "denial of service" for Notmuch. Well, not quite,
> > but is harmful nonetheless. How would a Notmuch user fix the mess or
> > protect himself against it?
>
> I am no longer sure if this issue is caused by fixed "Message-ID" or
> wrong "References" or "In-Reply-To" values. Anyway, someone has created
> real mess anyway because Notmuch combines originally separate threads
> now and forever.

Yes, several sources of different badness ...

Still, if I understand correctly, a new message with a pre-existing mid ends up being registered by notmuch as a second file for the "same" message irrespective of differences in the actual files. For message copies which you receive via different paths (say directly plus via an ml) this may or may not be what you want.

Used intentionally, it may create harm - how do other mailers handle this? Show them in parallel in the same thread (but as individual messages)?

Michael
Re: Fixed Message-ID trouble
* 2023-09-25 11:54:07+0300, Teemu Likonen wrote:

> Some person on debian-user mailing list seems to be sending messages
> with fixed Message-ID field: the same ID in different messages. In
> Notmuch it is creating trouble because it connects unrelated threads to
> one. The person has different messages in different threads but Notmuch
> thinks they are the same message because the Message-ID is the same.
>
> This is potentially a "denial of service" for Notmuch. Well, not quite,
> but is harmful nonetheless. How would a Notmuch user fix the mess or
> protect himself against it?

I am no longer sure if this issue is caused by a fixed "Message-ID" or by wrong "References" or "In-Reply-To" values. Either way, someone has created a real mess, because Notmuch now combines originally separate threads, now and forever.

-- 
/// Teemu Likonen - .-.. https://www.iki.fi/tlikonen/
// OpenPGP: 6965F03973F0D4CA22B9410F0F2CAE0E07608462
