[Ietf-dkim] Re: Should we be recording all modifications
I have some thoughts on this issue, but am holding off on commenting until the charter is settled.

-Jim

On 27 Nov 2024, at 13:30, Bron Gondwana wrote:

> I posted an updated draft for this last week with the 'z=y' case for "complex irreversible change".
>
> I am interested (yes, I know - technical questions before chartered) in how people feel about a line-based copy format rather than just the character-based one. I'm thinking of this because the most common "corruption" of emails is a change of line endings, which will mess with character counts - whereas the canonicalisation for calculating body hashes is designed to give the same result if line endings change.
>
> Bron.
>
> On Mon, Nov 18, 2024, at 09:19, Bron Gondwana wrote:
>
> > I don't believe it's that complex, and I do believe it's worth the effort in exchange for being able to tell with certainty which entity (by signature; which DNS domain) is responsible for creating each part of a message. You can then attribute parts of the text to different entities - the original author, or the mailing list signature.
> >
> > And if a message is bad then it's possible to derive where the badness was introduced - something not possible with DKIM or ARC if a message has been modified. I have a draft for a method at:
> >
> > https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
> >
> > It can be used to describe all "add text" cases quite nicely, as well as wrapped structures where an existing message gets moved into a multipart/mixed with more content at the end. There's still some testing to be done for the most complex cases - but this doesn't have to be a two-way algorithm, it just has to allow describing how to convert a new email body back to the original email body, and I believe this can be done reliably and at a reasonable cost, though it could definitely use some more examples.
> >
> > I'm going to publish an update with another mechanism which reduces the cost of the "remove an attachment" version to at least not fill the headers with tons of junk. It doesn't reduce the message size though, because you do need to be able to recreate the old message.
> >
> > And I do agree there needs to be a way to say "I made changes, and I'm not telling you how to undo them" as well.
> >
> > Cheers,
> >
> > Bron.
>
> --
> Bron Gondwana, CEO, Fastmail Pty Ltd
> br...@fastmailteam.com

___
Ietf-dkim mailing list -- ietf-dkim@ietf.org
To unsubscribe send an email to ietf-dkim-le...@ietf.org
[Ietf-dkim] Re: Should we be recording all modifications
I posted an updated draft for this last week with the 'z=y' case for "complex irreversible change".

I am interested (yes, I know - technical questions before chartered) in how people feel about a line-based copy format rather than just the character-based one. I'm thinking of this because the most common "corruption" of emails is a change of line endings, which will mess with character counts - whereas the canonicalisation for calculating body hashes is designed to give the same result if line endings change.

Bron.

On Mon, Nov 18, 2024, at 09:19, Bron Gondwana wrote:

> I don't believe it's that complex, and I do believe it's worth the effort in exchange for being able to tell with certainty which entity (by signature; which DNS domain) is responsible for creating each part of a message. You can then attribute parts of the text to different entities - the original author, or the mailing list signature.
>
> And if a message is bad then it's possible to derive where the badness was introduced - something not possible with DKIM or ARC if a message has been modified. I have a draft for a method at:
>
> https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
>
> It can be used to describe all "add text" cases quite nicely, as well as wrapped structures where an existing message gets moved into a multipart/mixed with more content at the end. There's still some testing to be done for the most complex cases - but this doesn't have to be a two-way algorithm, it just has to allow describing how to convert a new email body back to the original email body, and I believe this can be done reliably and at a reasonable cost, though it could definitely use some more examples.
>
> I'm going to publish an update with another mechanism which reduces the cost of the "remove an attachment" version to at least not fill the headers with tons of junk. It doesn't reduce the message size though, because you do need to be able to recreate the old message.
>
> And I do agree there needs to be a way to say "I made changes, and I'm not telling you how to undo them" as well.
>
> Cheers,
>
> Bron.
>
> --
> Bron Gondwana, CEO, Fastmail Pty Ltd
> br...@fastmailteam.com

--
Bron Gondwana, CEO, Fastmail Pty Ltd
br...@fastmailteam.com
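The line-ending concern in the message above can be illustrated with a small sketch. The two helper functions and the sample body are hypothetical; they are not the draft's syntax, only a way to show why a byte range stops matching after a CRLF-to-LF rewrite while a line range still does.

```python
def char_copy(body: bytes, start: int, end: int) -> bytes:
    """Copy the half-open byte range [start, end) from the body."""
    return body[start:end]


def line_copy(body: bytes, first: int, last: int) -> list[bytes]:
    """Copy lines first..last (1-based, inclusive), ignoring line-ending style."""
    # bytes.splitlines() splits on \n, \r\n and \r and drops the terminators.
    lines = body.splitlines()
    return lines[first - 1:last]


# Hypothetical message body as signed (CRLF) and after a hop rewrote it to LF.
crlf_body = b"Hello list,\r\nsecond line\r\nthird line\r\n"
lf_body = crlf_body.replace(b"\r\n", b"\n")

# The same byte range selects different text once the line endings change...
char_crlf = char_copy(crlf_body, 13, 24)
char_lf = char_copy(lf_body, 13, 24)

# ...while the same line range selects the same text in both bodies.
line_crlf = line_copy(crlf_body, 2, 2)
line_lf = line_copy(lf_body, 2, 2)
```

In this sketch the byte range 13-24 covers exactly "second line" in the CRLF body but is shifted by one byte in the LF body, whereas "line 2" means the same thing in both.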
[Ietf-dkim] Re: Should we be recording all modifications
Bron Gondwana wrote in <8361f17f-aaf0-4f8e-a1c0-2ec99911b...@app.fastmail.com>: |On Tue, Nov 19, 2024, at 12:14, Steffen Nurpmeso wrote: |> I wondered for myself how the bsdiff algorithm would work out for |> such things. | |This is basically the bsdiff algorithm, but with the syntax converted \ |to be something human readable and header safe. | |And obviously, only applied to the message body - headers get all \ |sorts of trace stuff and re-ordering applied. | |Bron. | |Here's a body diff in bsdiff format for the example JMAP mailing list \ |post on in the repo - just of the body parts: | |brong@elg:~/src/dkim2/examples$ hexdump -C o | 42 53 44 49 46 46 34 30 36 00 00 00 00 00 00 00 |BSDIFF406..\ |.| |0010 2c 00 00 00 00 00 00 00 d3 06 00 00 00 00 00 00 |,..\ |.| |0020 42 5a 68 39 31 41 59 26 53 59 02 b7 d3 b0 00 00 |BZh91AY&SY.\ |.| |0030 01 c0 c2 69 14 00 10 40 00 08 00 20 00 31 06 4c |...i...@... \ |.1.L| |0040 40 d3 4d 1a 68 99 e4 2a a0 22 39 3c 5d c9 14 e1 |@.M.h..*."9\ |<]...| |0050 42 40 0a df 4e c0 42 5a 68 39 31 41 59 26 53 59 |B@..N.BZh91\ |AY&SY| |0060 18 15 27 b0 00 00 03 40 02 c0 00 02 00 00 08 20 |..'@...\ | | |0070 00 30 cc 08 9a 43 40 bc 5d c9 14 e1 42 40 60 54 |.0...C@.]..\ |.B@`T| |0080 9e c0 42 5a 68 39 17 72 45 38 50 90 00 00 00 00 |..BZh9.rE8P\ |.| |0090 | |And the same as an example header: | |DKIM2-Diff-Body: i=1; | c=0-1747 | |(and the diff with a regular text diff) | |brong@elg:~/src/dkim2/examples$ diff b a |1,3d0 |< --===5385250436117681394== |< Content-Type: multipart/alternative; boundary=12b53dc829d24511bfa04f7d\ |5e3675f8 |< |45,58d41 |< |< |< --===5385250436117681394== |< Content-Type: text/plain; charset="us-ascii" |< MIME-Version: 1.0 |< Content-Transfer-Encoding: 7bit |< Content-Disposition: inline |< |< ___ |< Jmap mailing list |< j...@ietf.org |< https://www.ietf.org/mailman/listinfo/jmap |< |< --===5385250436117681394==-- For clarity despite the ML silencing i really wanted to add that you seem to have misused the bsdiff 
program. Although its usage says "oldfile newfile patchfile", it really is "after before patch", and likewise bspatch's "oldfile newfile patchfile" really is "after restored patch". One will recognize this when trying to actually restore data.

(I.e., the FreeBSD logo is a little red daemon, .. the software is free, BSD 2-clause, and really good, as can be seen in the thesis :).

DKIM now horny will use opaque, compressed, base64-encoded data, but this saves quite some processing cost, and it will include range checks etc. out of the box, so no errors on that front are to be expected. I hope i can start writing this on Saturday, and publish in November.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself fore'er and e'er
|
|Farewell, dear collar bear
[Ietf-dkim] Re: Should we be recording all modifications
On 20/11/2024 15:31, Richard Clayton wrote:

> Since we're meant to be discussing whether to open a WG and what its charter should be, should superseding ARC be specifically mentioned?

Once again I have to disagree that DKIM2 should mix "just trust me" operations with "you can see what I did" operations. If there's a wish for a "just trust me" system, people should be using ARC instead. DKIM2 should not supersede ARC in this aspect.

It's also highly likely that this would become a path of least resistance, downgrading the DKIM2 experience for many.
[Ietf-dkim] Re: Should we be recording all modifications
In message <5d91cde1-6f5b-5403-1ed6-d90f3cf92...@crash.com>, Steven M Jones writes

>On 11/20/24 12:03, Richard Clayton wrote:
>> it means that if the message, for whatever reason, reaches another DKIM2 system it is possible to determine that the gateway intentionally changed the message ... (and hence local policy is going to have to kick in to decide what to do with a failing signature) otherwise one might conclude that the failure of every preceding signature was some other system's failure to look after the message properly -- and it might be that a DSN was speciously generated (depending on the exact chain of custody)
>
>So the proposition is that we would universally apply DKIM2 at the SEG and verify again at the recipient ADMD/mailstore, so that if X% of messages are forwarded or otherwise escape, they could be checked with DKIM2 at the downstream hops, and not have to be treated as ever having left the DKIM2 world, which... would mean just handling it as they do today, right? Once you've left DKIM2, you fall back to the old ways of doing things.

It is not necessary for the mailstore to check anything if it trusts the security gateway ... however, out-of-the-box MTAs may well do checks so it makes sense for the security gateway to add a DKIM2 header saying what it has done

>I thought we were looking at a not-uncommon enterprise situation where we have an adequate trust mechanism in place today without much forwarding, and we're going to impose a lot of overhead for what looked like not much benefit.

adding one DKIM2 header (which says "it's complicated" to cover modifications) does not sound like "a lot of overhead" to me. It is certainly simpler than documenting what those changes were.

... and please note that messages that leave the DKIM2 world may re-enter it thereafter ... you'll note that we have reconsidered our proposal from -00 of our draft to -01. This is a complicated space where the trade-offs are not immediately obvious

>But are we more thinking of large mailbox providers like mobile telcos using SEGs/services, with massive forwarding populations, and we're focused on their downstream impacts?

there are a fair number of people using large mailbox providers who receive email via Proofpoint (and doubtless their competitors as well)

I have no doubt that these systems are adding ARC headers today (hoping that they will be trusted sufficiently that "no auth no entry" will not be a problem).

Since we're meant to be discussing whether to open a WG and what its charter should be, should superseding ARC be specifically mentioned?

--
richard                                        Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755
[Ietf-dkim] Re: Should we be recording all modifications
On 11/20/24 12:03, Richard Clayton wrote:

> In message <58825b2d-2eb9-6306-c721-fb4b95c15...@crash.com>, Steven M Jones writes
>
> > If I follow this, the use case is a Secure Email Gateway or SEG, to use a Gartner-ism, and is likely the last hop before delivery to the recipient ADMD or mailstore. So why is DKIM2's "it's complicated" flag more useful here than the configured exception for the service or gateway the receiving ADMD contracted with?
>
> it means that if the message, for whatever reason, reaches another DKIM2 system it is possible to determine that the gateway intentionally changed the message ... (and hence local policy is going to have to kick in to decide what to do with a failing signature) otherwise one might conclude that the failure of every preceding signature was some other system's failure to look after the message properly -- and it might be that a DSN was speciously generated (depending on the exact chain of custody)

So the proposition is that we would universally apply DKIM2 at the SEG and verify again at the recipient ADMD/mailstore, so that if X% of messages are forwarded or otherwise escape, they could be checked with DKIM2 at the downstream hops, and not have to be treated as ever having left the DKIM2 world, which... would mean just handling it as they do today, right? Once you've left DKIM2, you fall back to the old ways of doing things.

I thought we were looking at a not-uncommon enterprise situation where we have an adequate trust mechanism in place today without much forwarding, and we're going to impose a lot of overhead for what looked like not much benefit.

But are we more thinking of large mailbox providers like mobile telcos using SEGs/services, with massive forwarding populations, and we're focused on their downstream impacts?

--S.
[Ietf-dkim] Re: Should we be recording all modifications
In message <20241119211503.2_KbB0O0@steffen%sdaoden.eu>, Steffen Nurpmeso writes

>i responded to Richard Clayton's message
>
>  jszrijbynuonf...@highwayman.com
>  https://mailarchive.ietf.org/arch/msg/ietf-dkim/1ZCF-h9rHsL2YT3lTgo_qmIpVGE
>
>which seems 7-bit plain (one had to ask him how he sent it)

there's an X-Mailer to tell you that .. and a Wikipedia entry

--
richard                                        Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755
[Ietf-dkim] Re: Should we be recording all modifications
In message <58825b2d-2eb9-6306-c721-fb4b95c15...@crash.com>, Steven M Jones writes

>If I follow this, the use case is a Secure Email Gateway or SEG, to use a Gartner-ism, and is likely the last hop before delivery to the recipient ADMD or mailstore. So why is DKIM2's "it's complicated" flag more useful here than the configured exception for the service or gateway the receiving ADMD contracted with?

it means that if the message, for whatever reason, reaches another DKIM2 system it is possible to determine that the gateway intentionally changed the message ... (and hence local policy is going to have to kick in to decide what to do with a failing signature) otherwise one might conclude that the failure of every preceding signature was some other system's failure to look after the message properly -- and it might be that a DSN was speciously generated (depending on the exact chain of custody)

--
richard                                        Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755
[Ietf-dkim] Re: Should we be recording all modifications
In message <20241120023021.wVeiagjR@steffen%sdaoden.eu>, Steffen Nurpmeso writes

> |>i responded to Richard Clayton's message
> |>
> |>  jszrijbynuonf...@highwayman.com
> |>  https://mailarchive.ietf.org/arch/msg/ietf-dkim/1ZCF-h9rHsL2YT3lTgo_qmIpVGE
> |>
> |>which seems 7-bit plain (one had to ask him how he sent it)
> |
> |there's an X-Mailer to tell you that .. and a Wikipedia entry
>
>I referred to content-type and content-transfer-encoding, because they are changed due to reencoding at times (not seldom).

my message was as plain as they come ...

Message-ID:
Date: Sun, 17 Nov 2024 01:42:10 +
To: ietf-dkim@ietf.org
From: Richard Clayton
Subject: Re: [Ietf-dkim] Re: PROPOSAL: reopen this working group and work on DKIM2
References: <8fbb182e-cfed-422e-a2d3-a1a2ebf63...@app.fastmail.com>
 <1db24ccd-f67a-40d2-a4cb-cdcb6ec0c...@dcrocker.net>
 <8c8ea8e7-2fdc-4cf9-a4b2-7eb551523...@dcrocker.net>
 <34f44d43-09f1-4dbb-9b9e-11391a022...@tana.it>
 <20241116204935.dCb0mQcG@steffen%sdaoden.eu>
In-Reply-To: <20241116204935.dCb0mQcG@steffen%sdaoden.eu>
MIME-Version: 1.0
X-Mailer: Turnpike Integrated Version 5.03 M

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

etc

--
richard @ highwayman . com

"Nothing seems the same
Still you never see the change from day to day
And no-one notices the customs slip away"
[Ietf-dkim] Re: Should we be recording all modifications
On 11/19/24 00:25, Wei Chuang wrote:

> On Sun, Nov 17, 2024 at 2:20 PM Bron Gondwana wrote:
>
> > And I do agree there needs to be a way to say "I made changes, and I'm not telling you how to undo them" as well.
>
> +1. My belief is that security gateways are a particularly complex case that needs this trust me bit. For example they may redact or encrypt content where it doesn't make sense to provide the original content. When security gateways rewrite many URLs, it may become burdensome to encode and reverse, and downstream receivers are going to have to trust the gateway to make benign transformations. This can work because they already have a relationship with their security vendor. However a trust me bit in general introduces a security loophole, hence receivers should use it only in those limited, well-understood scenarios.

If I follow this, the use case is a Secure Email Gateway or SEG, to use a Gartner-ism, and is likely the last hop before delivery to the recipient ADMD or mailstore. So why is DKIM2's "it's complicated" flag more useful here than the configured exception for the service or gateway the receiving ADMD contracted with?

I believe there are already many sites that are configured to accept whatever their SEG/service is doing based on IP range, TLS certificate, etc. What makes this juice worth the squeeze?

--S.
[Ietf-dkim] Re: Should we be recording all modifications
Bron Gondwana wrote in
 <62b953a7-5168-45ab-8bec-f77c709b9...@app.fastmail.com>:
 |On Wed, Nov 20, 2024, at 08:15, Steffen Nurpmeso wrote:
 |> that goes out without MIME as such (text/plain 7-bit content-type
 |> is optional), but both of these two messages came in via ML as
 |>
 |>   Content-Type: text/plain; charset="utf-8"
 |>   Content-Transfer-Encoding: base64
 |
 |Yeah, if the source message isn't MIME encoded, Mailman re-encodes.

It is more than that.

 |It's a "detect message type" flag in the code, and it would be trivial
 |to add a config "don't do that if DKIM2" and instead just MIME-wrap
 |the existing message with the existing charset.
 ...
 |>   -rw-r- 1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
 |>   -rw-r- 1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
 |>   -rw--- 1 steffen wheel  236 Nov 19 21:22 t1-patch
 |>   -rw-r- 1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
 |>   -rw-r- 1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
 |>   -rw--- 1 steffen wheel 4350 Nov 19 21:23 t2-patch
 |>
 |> Hm. Ok let me remove the bzip2 stuff from bsdiff.. Here is the
 |> same without, and then running plzip and zstd on the uncompressed
 |> binary data; this still has the normal header and such (note
 |> i have not yet looked at all, it may very well be that patches at
 |> position 0 or "EOT" could be optimized away etc etc.)
 |>
 |> plzip -9 and zstd -19
 |>
 |>   -rw--- 1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
 |>   -rw--- 1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst
 |>
 |>   -rw--- 1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
 |>   -rw--- 1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst
 |>
 |> It would be interesting to know how your implementation of the
 |> algorithm works out for those (and the "real" VCDIFF
 |> implementation i have seen is huge). Would be cool if it is
 |> superior, of course.
 |
 |My code uses a pretty basic perl diffing tool, but we could use VCDIFF
 |just fine too - and have it be an input to that format. The format
 |really is basically just the logic from RFC3284; but encoded to be
 |readable.

Ok i now downloaded xdelta3 which uses the VCDIFF algorithm (like Google's really big thing open-vcdiff), and i see i get for t1

  Offset Code Type1 Size1 @Addr1 + Type2 Size2 @Addr2
  000000 019  CPY_0    54 S@0
  000054 002  ADD       1
  000055 034  CPY_0    18 S@59
  000073 003  ADD       2
  000075 019  CPY_0    27 S@83
  000102 019  CPY_0   196 S@112
  000298 107  CPY_5    11 S@310
  000309 051  CPY_2    53 S@323
  000362 007  ADD       6
  000368 051  CPY_2    45 S@386
  000413 051  CPY_2   111 S@433
  000524 099  CPY_5   250 S@546
  000774 035  CPY_1    21 T@309
  000795 014  ADD      13
  000808 069  CPY_3     5 T@362
  000813 003  ADD       2
  000815 051  CPY_2    38 S@843
  000853 099  CPY_5   238 S@883
  001091 003  ADD       2
  001093 051  CPY_2  1074 S@1127

so i wildly guess you actually postprocess this output (for now). The two examples i had posted are smaller when processed with bsdiff compared to non-postprocessed VCDIFF, that much is plain.

But thank you!

Ciao,

--steffen
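As a sanity check on the xdelta3 listing above, each instruction's offset should equal the sum of the sizes before it, and the grand total should be the size of the reconstructed file. The tuples below are transcribed by hand from the listing; splitting run-together columns (for example reading "CPY_0196" as a CPY_0 of size 196) is an assumption about how the archive flattened the table, and the function names are hypothetical.

```python
# (offset, instruction, size) transcribed from the xdelta3 listing for t1.
INSTRUCTIONS = [
    (0, "CPY_0", 54), (54, "ADD", 1), (55, "CPY_0", 18), (73, "ADD", 2),
    (75, "CPY_0", 27), (102, "CPY_0", 196), (298, "CPY_5", 11),
    (309, "CPY_2", 53), (362, "ADD", 6), (368, "CPY_2", 45),
    (413, "CPY_2", 111), (524, "CPY_5", 250), (774, "CPY_1", 21),
    (795, "ADD", 13), (808, "CPY_3", 5), (813, "ADD", 2),
    (815, "CPY_2", 38), (853, "CPY_5", 238), (1091, "ADD", 2),
    (1093, "CPY_2", 1074),
]


def check_offsets(instructions):
    """Return the total target size, raising if an offset breaks the running sum."""
    position = 0
    for offset, name, size in instructions:
        if offset != position:
            raise ValueError(f"{name} at {offset}: expected offset {position}")
        position += size
    return position


def added_bytes(instructions):
    """Count the literal bytes carried by ADD instructions."""
    return sum(size for _, name, size in instructions if name == "ADD")


total = check_offsets(INSTRUCTIONS)
literal = added_bytes(INSTRUCTIONS)
```

Under that reading the offsets are consistent and the total comes to 2167 bytes, the size listed for t1-i.txt, with only 26 bytes supplied literally by ADD instructions; everything else is copied.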
[Ietf-dkim] Re: Should we be recording all modifications
On Wed, Nov 20, 2024, at 08:15, Steffen Nurpmeso wrote: > that goes out without MIME as such (text/plain 7-bit content-type > is optional), but both of these two messages came in via ML as > > Content-Type: text/plain; charset="utf-8" > Content-Transfer-Encoding: base64 Yeah, if the source message isn't MIME encoded, Mailman re-encodes. It's a "detect message type" flag in the code, and it would be trivial to add a config "don't do that if DKIM2" and instead just MIME-wrap the existing message with the existing charset. > And here the complete text need to be replaced. This is t2*.txt. > And *if* (just thinking) that is only a forwarder address like the > @FreeBSD.org one of Colin Percival, of CPAN, sourceforge or such > kind of, then *possibly* (but likely not) once again. > > Things are better (for me as a German who effectively writes > mostly 7-bit ASCII) for the mentioned OpenGroup server, where you > sent eg text/plain; charset="utf-8"/quoted-printable (because of > a MIME-folded long line, and a German name with Umlauts etc) and > only get the 8-bit conversion. > > One more question: how about a language which practically always > needs UTF-8 with more than one byte per character, ie, an Asian > language, and such? For anyone not going the 8-bit way (like > myself) this is thus either quoted-printable or base64 right away. > Then reencodings to 8-bit are more expensive. > > Well i do not know, it would have to be tested on real life data; > of course one could hope for the future, if it is all 8-bit and > if ML software and such stops this reencoding "madness", then.. > > And, of course, all this pretty much only affects the text parts, > large images and such are base64 data and (pretty much) constant. 
> > My examples from above, if i pass only the bodies (i will attach > them) to bsdiff i get > > -rw-r- 1 steffen wheel 2167 Nov 19 21:22 t1-i.txt > -rw-r- 1 steffen wheel 2201 Nov 19 21:22 t1-o.txt > -rw--- 1 steffen wheel 236 Nov 19 21:22 t1-patch > -rw-r- 1 steffen wheel 8412 Nov 19 21:22 t2-i.txt > -rw-r- 1 steffen wheel 5932 Nov 19 21:22 t2-o.txt > -rw--- 1 steffen wheel 4350 Nov 19 21:23 t2-patch > > Hm. Ok let me remove the bzip2 stuff from bsdiff.. Here is the > same without, and then running plzip and zstd on the uncompressed > binary data; this still has the normal header and such (note > i have not yet looked at all, it may very well be that patches at > position 0 or "EOT" could be optimized away etc etc. > > plzip -9 and zstd -19 > > -rw--- 1 steffen wheel 142 Nov 19 21:48 t1-patch-2.lz > -rw--- 1 steffen wheel 116 Nov 19 21:48 t1-patch-2.zst > > -rw--- 1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz > -rw--- 1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst > > It would be interesting to know how your implementation of the > algorithm works out for those (and the "real" vcsdiff > implementation i have seen is huge). Would be cool if it is > superior, of course. My code uses a pretty basic perl diffing tool, but we could use vcsdiff just fine too - and have it be an input to that format. The format really is basically just the logic from RFC3284; but encoded to be readable. >From RFC3284 there are 3 commands: The instructions to encode and direct the reconstruction of a target window are called delta instructions. There are three types: ADD: This instruction has two arguments, a size x and a sequence of x bytes to be copied. COPY: This instruction has two arguments, a size x and an address p in the string U. The arguments specify the substring of U that must be copied. We shall assert that such a substring must be entirely contained in either S or T. RUN: This instruction has two arguments, a size x and a byte b, that will be repeated x times. 
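The three instruction types quoted from RFC 3284 above can be sketched as a tiny interpreter. This is only an illustration: the tuple encoding and the name `apply_delta` are hypothetical, and it is neither the VCDIFF binary format nor the header syntax of the draft. COPY addresses index into U, the source followed by the target built so far; the sketch does not enforce the rule that a copied substring lies entirely in S or entirely in T.

```python
def apply_delta(source: bytes, instructions) -> bytes:
    """Rebuild a target from `source` using ADD / COPY / RUN instructions."""
    target = bytearray()
    for op, *args in instructions:
        if op == "ADD":
            # ADD carries the literal bytes to append.
            (data,) = args
            target += data
        elif op == "COPY":
            # COPY takes a size and an address in U = source + target-so-far.
            size, addr = args
            for i in range(size):
                pos = addr + i
                if pos < len(source):
                    target.append(source[pos])
                else:
                    target.append(target[pos - len(source)])
        elif op == "RUN":
            # RUN repeats a single byte value `size` times.
            size, byte = args
            target += bytes([byte]) * size
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return bytes(target)


# Hypothetical case: a list appended a footer; one COPY recovers the original.
modified = b"Hi all,\nsee draft.\n-- \nlist footer\n"
original = apply_delta(modified, [("COPY", 19, 0)])

# All three instruction types, plus a COPY that reads from the target itself.
mixed = apply_delta(
    b"abcdef",
    [("COPY", 3, 0), ("ADD", b"XY"), ("RUN", 3, ord("-")), ("COPY", 2, 4)],
)
self_copy = apply_delta(b"ab", [("COPY", 2, 0), ("COPY", 4, 0)])
```

Because the delta only has to go one way (from the modified body back to the original), a footer-appending list needs a single COPY and no literal data at all.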
I didn't bother implementing "RUN" because that seems like something that you don't realistically need in emails. For headers I implemented both plaintext "ADD" and base64 ADD to allow encoding everything neatly. The only other thing I'm thinking is whether a base64 decoding version of COPY would make sense for the body. This would allow putting phrases into the MIME preamble rather than into an ADD command and keep the DKIM2-Body-Diff header short. maybe "Diff" is the wrong name and I should rename it to Delta - which is the naming in the VCDIFF doc. Bron. > > You know, .. the "DKIM now horny" draft i will write anyway > (because why not, it only extends DKIM/6376) will include diffing, > it will state that normalized headers shall come first, followed > by normalized body, all this to be diffed and optionally > compressed (but decompressing MUST be supported; just today > Antonio Diaz Diaz posted "Lunzip 1.15-rc1 released", very small > decompressor only). > Then, if additional headers are to be included these have to be > prepend
[Ietf-dkim] Re: Should we be recording all modifications
Hello.

Bron Gondwana wrote in
 <8361f17f-aaf0-4f8e-a1c0-2ec99911b...@app.fastmail.com>:
 |On Tue, Nov 19, 2024, at 12:14, Steffen Nurpmeso wrote:
 |> I wondered for myself how the bsdiff algorithm would work out for
 |> such things.
 |
 |This is basically the bsdiff algorithm, but with the syntax converted
 |to be something human readable and header safe.

Thanks. Ok, in this case, as you show, it looks very impressive. The algorithm "does not need to patch" several, as you say. Also your message in my inbox and what came over IETF would be addressable like so.

But, this is my data point here, it will not work out without full diffs in many other cases. For example i responded to Richard Clayton's message

  jszrijbynuonf...@highwayman.com
  https://mailarchive.ietf.org/arch/msg/ietf-dkim/1ZCF-h9rHsL2YT3lTgo_qmIpVGE

which seems 7-bit plain (one had to ask him how he sent it) with

  20241117030640.wk6r9c7R@steffen%sdaoden.eu
  https://mailarchive.ietf.org/arch/msg/ietf-dkim/k4wUEwxJLI_AIU-Q9TXBBhI0gPU

that goes out without MIME as such (text/plain 7-bit content-type is optional), but both of these two messages came in via ML as

  Content-Type: text/plain; charset="utf-8"
  Content-Transfer-Encoding: base64

And here the complete text needs to be replaced. This is t2*.txt. And *if* (just thinking) that is only a forwarder address like the @FreeBSD.org one of Colin Percival, of CPAN, sourceforge or such kind of, then *possibly* (but likely not) once again.

Things are better (for me as a German who effectively writes mostly 7-bit ASCII) for the mentioned OpenGroup server, where you sent eg text/plain; charset="utf-8"/quoted-printable (because of a MIME-folded long line, and a German name with Umlauts etc) and only get the 8-bit conversion.

One more question: how about a language which practically always needs UTF-8 with more than one byte per character, ie, an Asian language, and such? For anyone not going the 8-bit way (like myself) this is thus either quoted-printable or base64 right away. Then reencodings to 8-bit are more expensive.

Well i do not know, it would have to be tested on real life data; of course one could hope for the future, if it is all 8-bit and if ML software and such stops this reencoding "madness", then..

And, of course, all this pretty much only affects the text parts, large images and such are base64 data and (pretty much) constant.

My examples from above, if i pass only the bodies (i will attach them) to bsdiff i get

  -rw-r- 1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
  -rw-r- 1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
  -rw--- 1 steffen wheel  236 Nov 19 21:22 t1-patch
  -rw-r- 1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
  -rw-r- 1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
  -rw--- 1 steffen wheel 4350 Nov 19 21:23 t2-patch

Hm. Ok let me remove the bzip2 stuff from bsdiff.. Here is the same without, and then running plzip and zstd on the uncompressed binary data; this still has the normal header and such (note i have not yet looked at all, it may very well be that patches at position 0 or "EOT" could be optimized away etc etc.)

plzip -9 and zstd -19

  -rw--- 1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
  -rw--- 1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst

  -rw--- 1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
  -rw--- 1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst

It would be interesting to know how your implementation of the algorithm works out for those (and the "real" VCDIFF implementation i have seen is huge). Would be cool if it is superior, of course.

You know, .. the "DKIM now horny" draft i will write anyway (because why not, it only extends DKIM/6376) will include diffing, it will state that normalized headers shall come first, followed by normalized body, all this to be diffed and optionally compressed (but decompressing MUST be supported; just today Antonio Diaz Diaz posted "Lunzip 1.15-rc1 released", very small decompressor only). Then, if additional headers are to be included these have to be prepended, like trace headers for an email; maybe that special case can be optimized away very easily (from bsdiff .. for now).

Regarding licenses these are BSD 2-clause, MIT, and i think lzip is available as public domain (despite the IETF draft variant). The nice thing about all that long time matured software is that it is very small, statically linking them all in is no problem; on FreeBSD:

  -rw--- 1 steffen wheel 19992 Nov 19 21:58 bsdiff.o
  -rw--- 1 steffen wheel 14904 Nov 19 21:58 divsufsort.o
  -rw--- 1 steffen wheel 43928 Nov 19 21:58 sssort.o
  -rw--- 1 steffen wheel 32848 Nov 19 21:58 trsort.o
  -rw--- 1 steffen wheel 19000 Nov 19 21:58 utils.o
  #|f-1400:/tmp/z$ ll bsdiff
  -rwx-- 1 steffen wheel 49200 Nov 19 21:58 bsdiff*
  #|f-1400:/tmp/z$ strip bsdiff
  #|f-1400:/tmp/z$ ll bsdiff
  -rwx-- 1 steffen wheel 46200 Nov 19 21:58 bsdiff*

and
[Ietf-dkim] Re: Should we be recording all modifications
On Tue, Nov 19, 2024, at 12:14, Steffen Nurpmeso wrote:
> I wondered for myself how the bsdiff algorithm would work out for
> such things.

This is basically the bsdiff algorithm, but with the syntax converted to be something human readable and header safe. And obviously, only applied to the message body - headers get all sorts of trace stuff and re-ordering applied.

Bron.

Here's a body diff in bsdiff format for the example JMAP mailing list post in the repo - just of the body parts:

brong@elg:~/src/dkim2/examples$ hexdump -C o
00000000  42 53 44 49 46 46 34 30  36 00 00 00 00 00 00 00  |BSDIFF406.......|
00000010  2c 00 00 00 00 00 00 00  d3 06 00 00 00 00 00 00  |,...............|
00000020  42 5a 68 39 31 41 59 26  53 59 02 b7 d3 b0 00 00  |BZh91AY&SY......|
00000030  01 c0 c2 69 14 00 10 40  00 08 00 20 00 31 06 4c  |...i...@... .1.L|
00000040  40 d3 4d 1a 68 99 e4 2a  a0 22 39 3c 5d c9 14 e1  |@.M.h..*."9<]...|
00000050  42 40 0a df 4e c0 42 5a  68 39 31 41 59 26 53 59  |B@..N.BZh91AY&SY|
00000060  18 15 27 b0 00 00 03 40  02 c0 00 02 00 00 08 20  |..'....@....... |
00000070  00 30 cc 08 9a 43 40 bc  5d c9 14 e1 42 40 60 54  |.0...C@.]...B@`T|
00000080  9e c0 42 5a 68 39 17 72  45 38 50 90 00 00 00 00  |..BZh9.rE8P.....|
00000090

And the same as an example header:

DKIM2-Diff-Body: i=1; c=0-1747

(and the diff with a regular text diff)

brong@elg:~/src/dkim2/examples$ diff b a
1,3d0
< --===5385250436117681394==
< Content-Type: multipart/alternative; boundary=12b53dc829d24511bfa04f7d5e3675f8
<
45,58d41
<
<
< --===5385250436117681394==
< Content-Type: text/plain; charset="us-ascii"
< MIME-Version: 1.0
< Content-Transfer-Encoding: 7bit
< Content-Disposition: inline
<
< ___
< Jmap mailing list
< j...@ietf.org
< https://www.ietf.org/mailman/listinfo/jmap
<
< --===5385250436117681394==--

Bron.

--
Bron Gondwana, CEO, Fastmail Pty Ltd
br...@fastmailteam.com
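For readers who want to check the hexdump by hand, here is a minimal sketch that decodes the first 32 bytes of that patch. It assumes the classic BSDIFF40 container layout (an 8-byte magic followed by three 8-byte sign-magnitude little-endian integers: control block length, diff block length, new file size). The function names `offtin` and `parse_bsdiff40_header` are illustrative and not taken from any draft.

```python
# First 32 bytes of the patch, copied from the hexdump in the message above.
HEADER = bytes.fromhex(
    "42 53 44 49 46 46 34 30 36 00 00 00 00 00 00 00 "
    "2c 00 00 00 00 00 00 00 d3 06 00 00 00 00 00 00"
)


def offtin(buf: bytes) -> int:
    """Decode an 8-byte little-endian integer whose top bit is a sign flag."""
    value = int.from_bytes(buf, "little")
    magnitude = value & ((1 << 63) - 1)
    return -magnitude if value >> 63 else magnitude


def parse_bsdiff40_header(data: bytes) -> dict:
    """Return the three length fields of a BSDIFF40 patch header."""
    if len(data) < 32 or data[:8] != b"BSDIFF40":
        raise ValueError("not a BSDIFF40 patch")
    return {
        "ctrl_len": offtin(data[8:16]),   # compressed control block, in bytes
        "diff_len": offtin(data[16:24]),  # compressed diff block, in bytes
        "new_size": offtin(data[24:32]),  # size of the reconstructed file
    }


header = parse_bsdiff40_header(HEADER)
print(header)
```

Under these assumptions the decoded new-file size is 1747 bytes, which appears to line up with the `c=0-1747` range in the example DKIM2-Diff-Body header, and the three `BZh9` markers in the dump sit where the control, diff and extra blocks would start.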
[Ietf-dkim] Re: Should we be recording all modifications
Bron Gondwana wrote in
 <71d3b35b-e9e1-43bd-a6ab-d0cb26152...@app.fastmail.com>:
 ...
 |[.] I have a draft for a method at:
 |
 |https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
 |
 |It can be used to describe all "add text" cases quite nicely, as well \
 |as wrapped structures where an existing message gets moved into a multip\
 |art/mixed with more content at the end. There's still some testing \
 |to be done for the most complex cases - but this doesn't have to be \
 |a two-way algorithm, is just has to allow describing how to convert \
 |a new email body back to the original email body, and I believe this \
 |can be done reliably and at a reasonable cost, though it could definitely \
 |use some more examples.
 |
 |I'm going to publish an update with another mechanism which reduces \
 |the cost of the "remove an attachment" version to at least not fill \
 |the headers with tons of junk. It doesn't reduce the message size \
 |though, because you do need to be able to recreate the old message.

I wondered for myself how the bsdiff algorithm would work out for such things. This is a very old program, present in any FreeBSD system for twenty years and more. The executable is all in all ~8.5KB, and it uses the libdivsufsort library (the source of) which is 72 KB all in all. This executable by default compresses via bzip2 (which makes a bit of the 8.5 KB).

For example if i strip the content of the HTML part of your message, then removing the IETF ML attachment

diff(1)

  -rw--- 1 steffen wheel 5295 Nov 19 01:50 m0
  -rw--- 1 steffen wheel 4929 Nov 19 01:50 m1

  --- m0 2024-11-19 01:50:20.390006000 +0100
  +++ m1 2024-11-19 01:50:32.441447000 +0100
  @@ -87,16 +87,6 @@ Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html
   Content-Transfer-Encoding: quoted-printable
  ---===5952072662436684613==
  -Content-Type: text/plain; charset="utf-8"
  -MIME-Version: 1.0
  -Content-Transfer-Encoding: base64
  -Content-Disposition: inline
  -
  -X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSWV0Zi1ka2lt
  -IG1haWxpbmcgbGlzdCAtLSBpZXRmLWRraW1AaWV0Zi5vcmcKVG8gdW5zdWJzY3JpYmUgc2VuZCBh
  -biBlbWFpbCB0byBpZXRmLWRraW0tbGVhdmVAaWV0Zi5vcmcK
  -
   --===5952072662436684613==--

via bsdiff results in a 168 byte file. This can be changed of course as this file is identifiable

  # file yy
  yy: bsdiff(1) patch file

and the IETF drafted algorithm lzip is very good with such text (much better than the much larger (factor ten) RFCd zstd).

The author of the algorithm is a decade long FreeBSD+ developer and had written his Oxford thesis based on this topic:

  http://www.daemonology.net/papers/thesis.pdf

Must be said that the memory cost of this thing is

  The bsdiff utility uses memory equal to 17 times the size of
  oldfile, and requires an absolute minimum working set size of
  8 times the size of oldfile.

which is quite a bit with those new-style HTML emails with lots of too-large-a-snapshot images.

I want to point out that, as can be seen above, especially the Mailman(3) ML software, or let's say, especially the Python stuff, has a habit of reencoding anything in base64. Whereas others, for example the one used by the OpenGroup, have the pesky quirk of reencoding to 8-bit -- even if that means that "From " quoting "has" to be applied.

This effectively means that the differences after mangling by such things as mailing-list managers will, at the current state of affairs, be larger than what i would expect from reading what was said on that diffing topic. At least today, i always hated it, and maybe if people like you, Mr. Levine and others speak to maintainers of MIME aware mailing-list managers, things will change over time. ..

--steffen
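The point about base64 reencoding can be illustrated with a small sketch. It compares how many lines differ when a list footer is appended as plain text versus when the whole part is reencoded in base64 afterwards. The sample body, the footer and the helper names are made up for illustration, and a line-based comparison is only a rough stand-in for what bsdiff or the proposed algebra would actually measure.

```python
import base64
import difflib

original = (
    "Hello list,\n"
    "\n"
    "here is a short plain-text body.\n"
    "Only a footer will be appended by the list.\n"
)
footer = "_______\nExample mailing list\n"


def as_base64_part(text: str) -> str:
    """Encode text the way a MIME base64 part is laid out (76-column lines)."""
    return base64.encodebytes(text.encode("utf-8")).decode("ascii")


def changed_lines(old: str, new: str) -> int:
    """Count lines that a line-based comparison would have to delete or insert."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1) + (j2 - j1)
    return changed


# Case 1: the list only appends a footer and leaves the encoding alone.
plain_case = changed_lines(original, original + footer)

# Case 2: the list appends the footer and reencodes the part in base64.
base64_case = changed_lines(original, as_base64_part(original + footer))

print("footer only:      ", plain_case, "changed lines")
print("footer + base64:  ", base64_case, "changed lines")
```

In the first case only the footer lines show up as changes; in the second, no original line survives, so every line of both versions counts as changed even though the text a human reads is identical.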
[Ietf-dkim] Re: Should we be recording all modifications
In message <20241118210924.4dff9a743...@ary.qy>, John Levine writes

>Right. We need to make it clear that the "trust me" bit is only intended
>for mail from gateways with whom you already have a relationship.

I have been calling the bit "it's complicated" since it would only need to be used when it was impractical to describe the change (or as Wei suggests, it would leak confidential information).

... it does however mean that anyone receiving the email thereafter will need to have trust (or a devil-may-care local policy). It's unlikely that forwarding such email outside of the receiving organisation is going to succeed because the trust will evaporate.

--
richard
Richard Clayton
[Ietf-dkim] Re: Should we be recording all modifications
It appears that Wei Chuang said:
>I'm very much in agreement with the need to attribute who contributed which
>content to the message. I think this is the key difference from the
>RFC6376 DKIM l= body length tag (section 3.5) that tried to tolerate
>mailing footer modification also, but left unknown who added a potentially
>malicious footer, which can be exploited today. Agreed there is DKIM RFC
>security section 8.2 that warns about using DKIM body length but
>unfortunately a very small but important part of the sending population
>uses it today despite those warnings and risk. My guess is that they have
>use cases that compel them to use DKIM body length despite its
>unsoundness.

When this came up a few months ago, I think we found that the people we found using l= didn't understand it. There was an ESP (bulk mailer) that was doing l=0 who promptly stopped when someone told them why it was a bad idea, and I think there was some list software that was putting l= with the actual length as a poorly chosen default.

This is quite different from the algebra, since l= gave you no reliable way to tell who might have added what.

>+1. My belief is that security gateways are a particularly complex case
>that needs this trust me bit. For example they may redact or
>encrypt content where it doesn't make sense to provide the original content
>for. When security gateways rewrite many URLs, it may become burdensome to
>encode and reverse, and downstream receivers are going to have to trust
>the gateway to make benign transformations. This can work because they
>already have a relationship with their security vendor. However a trust me
>bit in general introduces a security loophole hence receivers should use
>only in those limited well understood scenarios.

Right. We need to make it clear that the "trust me" bit is only intended for mail from gateways with whom you already have a relationship.

R's,
John
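To make the l= concern concrete, here is a rough sketch of the RFC 6376 body-hash step with "simple" body canonicalization and an optional length limit. It is not a DKIM verifier: it skips header hashing and signature checking entirely, and the helper names and sample messages are invented for illustration.

```python
import base64
import hashlib
from typing import Optional


def simple_canon(body: bytes) -> bytes:
    """'simple' body canonicalization: ignore trailing empty lines, end with one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def body_hash(body: bytes, length: Optional[int] = None) -> str:
    """Base64 SHA-256 of the canonical body, truncated to `length` octets if given."""
    canon = simple_canon(body)
    if length is not None:
        canon = canon[:length]
    return base64.b64encode(hashlib.sha256(canon).digest()).decode("ascii")


original = b"Hi all,\r\nsee you Friday.\r\n"
signed_length = len(simple_canon(original))  # what a signer would put in l=

# Anyone on the path can append content after the signed octets.
tampered = original + b"\r\nPlease also pay this invoice: http://example.invalid/\r\n"

print(body_hash(original, signed_length) == body_hash(tampered, signed_length))
print(body_hash(original) == body_hash(tampered))
```

With the length limit the two hashes match, so the appended text rides along under the original signature and nothing records who added it. Without the limit the hashes differ, which is the ordinary DKIM behaviour that breaks on mailing-list footers.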
[Ietf-dkim] Re: Should we be recording all modifications
It appears that Taavi Eomäe said:
>On 18/11/2024 00:19, Bron Gondwana wrote:
>> And I do agree there needs to be a way to say "I made changes, and I'm
>> not telling you how to undo them" as well.
>
>This has the risk of completely nullifying the intent behind the new
>standard by providing a path of least resistance too many would take.

You'd only accept that kind of mail from someone for whom you have an external reason to assume they're benign. The obvious example is Proofpoint's filtering proxy which rewrites all the URLs in a message, but only sends it to their customers who would treat it as a special case.

R's,
John
[Ietf-dkim] Re: Should we be recording all modifications
On Sun 17/Nov/2024 23:19:47 +0100 Bron Gondwana wrote:
> And if a message is bad then it's possible to derive where the badness was
> introduced - something not possible with DKIM or ARC if a message has been
> modified. I have a draft for a method at:
>
> https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/

My main doubt is how a signing filter would retrieve what the original message was, in order to compose the differences. Perhaps the mailing list software can attach the original message to the modified message, with the convention that the filter will remove the attachment before signing?

Let me compare this task with the process of composing ARC-Authentication-Results:. It seems that no implementation (except mine) succeeded in retrieving the original A-R field and transferring it into the seal. Everybody puts a new ARC set at every hop. So there seems to be a hidden difficulty in saving some data from the original message.

An alternative would be to not allow /any/ modification, but only a restricted set of standardized modifications that a MLM knows, so that it can configure its signing filter accordingly. That was the approach taken by a previous attempt:

https://datatracker.ietf.org/doc/draft-kucherawy-dkim-transform/

I implemented it, just without the header fields that declared what transformation was made, since no one puts them in the header. One difficulty which arose was some mailing lists transforming the message body into base64. I dealt with that in a totally heuristic manner. However, I note that your algebra doesn't consider that case.

My take, in case you're curious about what other approaches have been tried:

https://datatracker.ietf.org/doc/html/draft-vesely-dmarc-mlm-transform

I can validate my own posts when they come back from mailing lists. Sometimes I can validate other people's posts too.

Best
Ale
[Ietf-dkim] Re: Should we be recording all modifications
On 18/11/2024 00:19, Bron Gondwana wrote:
> And I do agree there needs to be a way to say "I made changes, and I'm
> not telling you how to undo them" as well.

This has the risk of completely nullifying the intent behind the new standard by providing a path of least resistance too many would take. At that point it might be better to rely on things like ARC and ignore DKIM2. Such (often) mangled letters should not make it back into the "wide internet" anyways.
[Ietf-dkim] Re: Should we be recording all modifications
On Sun, Nov 17, 2024 at 2:20 PM Bron Gondwana wrote:

> I don't believe it's that complex, and I do believe it's worth the effort
> in exchange for being able to tell with certainty which entity (by
> signature; which DNS domain) is responsible for creating each part of a
> message. You can then attribute parts of the text to different entities -
> the original author, or the mailing list signature.
>
> And if a message is bad then it's possible to derive where the badness was
> introduced - something not possible with DKIM or ARC if a message has been
> modified. I have a draft for a method at:
>
> https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
>

I'm very much in agreement with the need to attribute who contributed which content to the message. I think this is the key difference from the RFC6376 DKIM l= body length tag (section 3.5) that tried to tolerate mailing footer modification also, but left unknown who added a potentially malicious footer, which can be exploited today. Agreed there is DKIM RFC security section 8.2 that warns about using DKIM body length but unfortunately a very small but important part of the sending population uses it today despite those warnings and risk. My guess is that they have use cases that compel them to use DKIM body length despite its unsoundness.

You can see the "algebra" draft provides tools to bind a change to a particular signer. If malicious content is found, it can be associated with the contributor and not someone else even though other parties may have forwarded the message and contributed other content to the message. There are some of the details of the content and header diff that could be tweaked but that's for later.

> It can be used to describe all "add text" cases quite nicely, as well as
> wrapped structures where an existing message gets moved into a
> multipart/mixed with more content at the end. There's still some testing to
> be done for the most complex cases - but this doesn't have to be a two-way
> algorithm, is just has to allow describing how to convert a new email body
> back to the original email body, and I believe this can be done reliably
> and at a reasonable cost, though it could definitely use some more examples.
>
> I'm going to publish an update with another mechanism which reduces the
> cost of the "remove an attachment" version to at least not fill the headers
> with tons of junk. It doesn't reduce the message size though, because you
> do need to be able to recreate the old message.
>
> And I do agree there needs to be a way to say "I made changes, and I'm not
> telling you how to undo them" as well.
>

+1. My belief is that security gateways are a particularly complex case that needs this trust me bit. For example they may redact or encrypt content where it doesn't make sense to provide the original content for. When security gateways rewrite many URLs, it may become burdensome to encode and reverse, and downstream receivers are going to have to trust the gateway to make benign transformations. This can work because they already have a relationship with their security vendor. However a trust me bit in general introduces a security loophole hence receivers should use only in those limited well understood scenarios.

-Wei
[Ietf-dkim] Re: Should we be recording all modifications
On 11/18/24 07:19, Bron Gondwana wrote:
> I don't believe it's that complex, and I do believe it's worth the effort
> in exchange for being able to tell with certainty which entity (by
> signature; which DNS domain) is responsible for creating each part of a
> message. You can then attribute parts of the text to different entities -
> the original author, or the mailing list signature.

While I'm not convinced /a priori/ that reversible changes will work, I do think it's worth finding out. It's come up before in similar contexts, and if we don't answer the question it will come up again. If people are motivated enough to put the engineering and analysis work into making that determination, I'd like to see the results. And to the extent $DAYJOB allows, I want to participate.

Just my ¥3.09,
--S.