[Ietf-dkim] Re: Should we be recording all modifications

2024-11-27 Thread Jim Fenton
I have some thoughts on this issue, but am holding off on commenting 
until the charter is settled.


-Jim

On 27 Nov 2024, at 13:30, Bron Gondwana wrote:

I posted an updated draft for this last week with the 'z=y' case for 
"complex irreversible change".


I am interested (yes, I know - technical questions before we're chartered)
in how people feel about a line-based copy format rather than just the
character-based one.  I'm thinking this because the most common
"corruption" of emails is changed line endings, which messes with
character counts, whereas the canonicalisation for calculating body
hashes is designed to give the same result if line endings change.


Bron.

On Mon, Nov 18, 2024, at 09:19, Bron Gondwana wrote:
I don't believe it's that complex, and I do believe it's worth the 
effort in exchange for being able to tell with certainty which entity 
(by signature; which DNS domain) is responsible for creating each 
part of a message. You can then attribute parts of the text to 
different entities - the original author, or the mailing list 
signature.


And if a message is bad then it's possible to derive where the 
badness was introduced - something not possible with DKIM or ARC if a 
message has been modified. I have a draft for a method at:


https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/

It can be used to describe all "add text" cases quite nicely, as well 
as wrapped structures where an existing message gets moved into a 
multipart/mixed with more content at the end. There's still some 
testing to be done for the most complex cases - but this doesn't have 
to be a two-way algorithm, it just has to allow describing how to 
convert a new email body back to the original email body, and I 
believe this can be done reliably and at a reasonable cost, though it 
could definitely use some more examples.
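To make the one-way idea concrete, here is a minimal Python sketch. The instruction shapes ("copy" a range of the new body, "add" literal text) and the `recreate_original` name are illustrative assumptions, not the syntax of the draft above. The only point is that a receiver can rebuild the earlier body without any forward diff.

```python
from typing import List, Tuple, Union

# Hypothetical instruction shapes (not the draft's header syntax):
#   ("copy", start, end) -> take new_body[start:end]
#   ("add", text)        -> emit literal text that the new body no longer contains
Instruction = Union[Tuple[str, int, int], Tuple[str, str]]


def recreate_original(new_body: str, instructions: List[Instruction]) -> str:
    """Rebuild the pre-modification body from the modified one (one direction only)."""
    out = []
    for ins in instructions:
        if ins[0] == "copy":
            _, start, end = ins
            if not (0 <= start <= end <= len(new_body)):
                raise ValueError(f"copy range {start}-{end} outside new body")
            out.append(new_body[start:end])
        elif ins[0] == "add":
            out.append(ins[1])
        else:
            raise ValueError(f"unknown instruction {ins[0]!r}")
    return "".join(out)


# A list server appended a footer; undoing it is a single copy of the prefix.
original = "Hello list,\r\nsee you Monday.\r\n"
modified = original + "-- \r\nJmap mailing list\r\n"
undo = [("copy", 0, len(original))]
```

Here `recreate_original(modified, undo)` returns `original`. Out-of-range copies are rejected rather than clamped, so a bad delta fails loudly.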


I'm going to publish an update with another mechanism which reduces 
the cost of the "remove an attachment" version to at least not fill 
the headers with tons of junk.  It doesn't reduce the message size 
though, because you do need to be able to recreate the old message.


And I do agree there needs to be a way to say "I made changes, and 
I'm not telling you how to undo them" as well.


Cheers,

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  br...@fastmailteam.com




--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  br...@fastmailteam.com



_______________________________________________
Ietf-dkim mailing list -- ietf-dkim@ietf.org
To unsubscribe send an email to ietf-dkim-le...@ietf.org


[Ietf-dkim] Re: Should we be recording all modifications

2024-11-27 Thread Bron Gondwana
I posted an updated draft for this last week with the 'z=y' case for "complex 
irreversible change". 

I am interested (yes, I know - technical questions before we're chartered) in how 
people feel about a line-based copy format rather than just the character-based 
one.  I'm thinking this because the most common "corruption" of emails is 
changed line endings, which messes with character counts, whereas the 
canonicalisation for calculating body hashes is designed to give the same 
result if line endings change.
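A small Python sketch of why line addressing is more robust here. It assumes a hypothetical line-based COPY of the form (first_line, line_count), with CRLF and bare LF both accepted as terminators and the terminators dropped. The reconstructing side would then re-add CRLF, which is an assumption and not part of any draft.

```python
import re
from typing import List, Tuple


def body_lines(body: str) -> List[str]:
    """Split a body into lines, treating CRLF and bare LF alike (terminators dropped)."""
    lines = re.split(r"\r\n|\n", body)
    if lines and lines[-1] == "":
        lines.pop()  # the final terminator does not start a new line
    return lines


def copy_lines(body: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Apply hypothetical line-based COPY ops given as (first_line, line_count)."""
    lines = body_lines(body)
    out: List[str] = []
    for first, count in ranges:
        if first < 0 or count < 0 or first + count > len(lines):
            raise ValueError(f"line copy {first}+{count} outside body")
        out.extend(lines[first:first + count])
    return out


crlf_body = "Hi all,\r\nfirst point\r\nsecond point\r\n-- \r\nfooter\r\n"
lf_body = crlf_body.replace("\r\n", "\n")  # e.g. a hop that rewrote line endings
```

The character offset of "second point" differs between the two bodies (22 vs 20). `copy_lines(..., [(0, 3)])` still selects the same three lines from either form, which matches how DKIM's body canonicalisation already treats line endings.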

Bron.

On Mon, Nov 18, 2024, at 09:19, Bron Gondwana wrote:
> I don't believe it's that complex, and I do believe it's worth the effort in 
> exchange for being able to tell with certainty which entity (by signature; 
> which DNS domain) is responsible for creating each part of a message. You can 
> then attribute parts of the text to different entities - the original author, 
> or the mailing list signature.
> 
> And if a message is bad then it's possible to derive where the badness was 
> introduced - something not possible with DKIM or ARC if a message has been 
> modified. I have a draft for a method at:
> 
> https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
> 
> It can be used to describe all "add text" cases quite nicely, as well as 
> wrapped structures where an existing message gets moved into a 
> multipart/mixed with more content at the end. There's still some testing to 
> be done for the most complex cases - but this doesn't have to be a two-way 
> algorithm, it just has to allow describing how to convert a new email body 
> back to the original email body, and I believe this can be done reliably and 
> at a reasonable cost, though it could definitely use some more examples.
> 
> I'm going to publish an update with another mechanism which reduces the cost 
> of the "remove an attachment" version to at least not fill the headers with 
> tons of junk.  It doesn't reduce the message size though, because you do need 
> to be able to recreate the old message.
> 
> And I do agree there needs to be a way to say "I made changes, and I'm not 
> telling you how to undo them" as well.
> 
> Cheers,
> 
> Bron.
> 
> --
>   Bron Gondwana, CEO, Fastmail Pty Ltd
>   br...@fastmailteam.com
> 
> 

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  br...@fastmailteam.com



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-21 Thread Steffen Nurpmeso
Bron Gondwana wrote in
 <8361f17f-aaf0-4f8e-a1c0-2ec99911b...@app.fastmail.com>:
 |On Tue, Nov 19, 2024, at 12:14, Steffen Nurpmeso wrote:
 |> I wondered for myself how the bsdiff algorithm would work out for
 |> such things.  
 |
 |This is basically the bsdiff algorithm, but with the syntax converted \
 |to be something human readable and header safe.
 |
 |And obviously, only applied to the message body -  headers get all \
 |sorts of trace stuff and re-ordering applied.
 |
 |Bron.
 |
 |Here's a body diff in bsdiff format for the example JMAP mailing list \
 |post in the repo - just of the body parts:
 |
 |brong@elg:~/src/dkim2/examples$ hexdump -C o
 |00000000  42 53 44 49 46 46 34 30  36 00 00 00 00 00 00 00  |BSDIFF406.......|
 |00000010  2c 00 00 00 00 00 00 00  d3 06 00 00 00 00 00 00  |,...............|
 |00000020  42 5a 68 39 31 41 59 26  53 59 02 b7 d3 b0 00 00  |BZh91AY&SY......|
 |00000030  01 c0 c2 69 14 00 10 40  00 08 00 20 00 31 06 4c  |...i...@... .1.L|
 |00000040  40 d3 4d 1a 68 99 e4 2a  a0 22 39 3c 5d c9 14 e1  |@.M.h..*."9<]...|
 |00000050  42 40 0a df 4e c0 42 5a  68 39 31 41 59 26 53 59  |B@..N.BZh91AY&SY|
 |00000060  18 15 27 b0 00 00 03 40  02 c0 00 02 00 00 08 20  |..'....@....... |
 |00000070  00 30 cc 08 9a 43 40 bc  5d c9 14 e1 42 40 60 54  |.0...C@.]...B@`T|
 |00000080  9e c0 42 5a 68 39 17 72  45 38 50 90 00 00 00 00  |..BZh9.rE8P.....|
 |00000090
 |
 |And the same as an example header:
 |
 |DKIM2-Diff-Body: i=1;
 | c=0-1747
 |
 |(and the diff with a regular text diff)
 |
 |brong@elg:~/src/dkim2/examples$ diff b a
 |1,3d0
 |< --===============5385250436117681394==
 |< Content-Type: multipart/alternative; boundary=12b53dc829d24511bfa04f7d5e3675f8
 |<
 |45,58d41
 |<
 |<
 |< --===============5385250436117681394==
 |< Content-Type: text/plain; charset="us-ascii"
 |< MIME-Version: 1.0
 |< Content-Transfer-Encoding: 7bit
 |< Content-Disposition: inline
 |<
 |< _______________________________________________
 |< Jmap mailing list
 |< j...@ietf.org
 |< https://www.ietf.org/mailman/listinfo/jmap
 |<
 |< --===============5385250436117681394==--
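The hexdump above is small enough to decode by hand. The sketch below assumes the classic BSDIFF40 layout of bsdiff 4.x: an 8-byte magic, then three 8-byte little-endian integers with the sign in the top bit, followed by three bzip2 blocks. The helper names are made up for illustration.

```python
def offtin(buf: bytes) -> int:
    """Decode bsdiff's 8-byte integer: little-endian magnitude, sign in the top bit."""
    magnitude = int.from_bytes(buf[:8], "little") & ((1 << 63) - 1)
    return -magnitude if buf[7] & 0x80 else magnitude


def parse_bsdiff40_header(patch: bytes) -> dict:
    """Read the 32-byte BSDIFF40 header that precedes the three bzip2 blocks."""
    if len(patch) < 32 or patch[:8] != b"BSDIFF40":
        raise ValueError("not a BSDIFF40 patch")
    ctrl_len = offtin(patch[8:16])   # length of the bzip2'd control block
    diff_len = offtin(patch[16:24])  # length of the bzip2'd diff block
    new_size = offtin(patch[24:32])  # size of the file bspatch will produce
    if min(ctrl_len, diff_len, new_size) < 0:
        raise ValueError("corrupt header")
    return {
        "ctrl_len": ctrl_len,
        "diff_len": diff_len,
        "new_size": new_size,
        "extra_offset": 32 + ctrl_len + diff_len,  # where the extra block starts
    }


# The first 32 bytes of the hexdump above.
header = bytes.fromhex(
    "4253444946463430" "3600000000000000"
    "2c00000000000000" "d306000000000000"
)
info = parse_bsdiff40_header(header)
```

Parsing gives a 54-byte control block, a 44-byte diff block and a new size of 1747. That is the same number as in the `c=0-1747` tag of the example header. The block offsets also line up with the dump: 32 + 54 = 0x56 and 0x56 + 44 = 0x82 are exactly where the second and third "BZh9" markers appear.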

For clarity, despite the ML silencing, i really wanted to add that
you seem to have misused the bsdiff program.  Although its usage
says "oldfile newfile patchfile", it really is "after before
patch", and likewise bspatch "oldfile newfile patchfile" really is
"after restored patch".  One will notice this when actually trying
to restore data.  (Ie, the FreeBSD logo is a little red daemon, ..
the software is free, BSD 2-clause, and really good, as can be
seen in the thesis :).

DKIM now horny will use opaque, compressed, base64-encoded
data, but this saves quite some processing cost, and will include
range checks etc. out of the box, so no errors are to be expected
on that front.  I hope i can start writing this on Saturday and
publish in November.
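A minimal Python sketch of "compressed, base64-encoded, with range checks". It swaps in zlib because that is what the standard library offers; Steffen's tools are bzip2, lzip and zstd. The size cap and the helper names are hypothetical. The cap guards the decoder against a patch that inflates far beyond what any mail body needs.

```python
import base64
import zlib

MAX_PATCH = 1 << 20  # hypothetical cap on the decompressed patch size


def pack_patch(patch: bytes) -> str:
    """Compress a binary patch and base64 it so it fits in a header value."""
    return base64.b64encode(zlib.compress(patch, 9)).decode("ascii")


def unpack_patch(value: str, limit: int = MAX_PATCH) -> bytes:
    """Reverse pack_patch, refusing to inflate more than `limit` bytes."""
    raw = base64.b64decode(value, validate=True)  # reject non-base64 characters
    d = zlib.decompressobj()
    patch = d.decompress(raw, limit)  # stops after `limit` output bytes
    if not d.eof or d.unconsumed_tail or d.unused_data:
        raise ValueError("patch larger than limit, truncated, or followed by junk")
    return patch
```

The check is deliberately conservative: anything that does not end cleanly within the limit is refused, rather than partially applied.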

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself fore'er and e'er
|
|Farewell, dear collar bear



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-21 Thread Taavi Eomäe

On 20/11/2024 15:31, Richard Clayton wrote:

Since we're meant to be discussing whether to open a WG and what its
charter should be, should superseding ARC be specifically mentioned?


Once again I have to disagree that DKIM2 should mix "just trust me" 
operations with "you can see what I did" operations. If there's a wish 
for a "just trust me" system, people should be using ARC instead. DKIM2 
should not supersede ARC in this aspect.


It's also highly likely that this would become a path of least 
resistance, downgrading the DKIM2 experience for many.






[Ietf-dkim] Re: Should we be recording all modifications

2024-11-20 Thread Richard Clayton
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <5d91cde1-6f5b-5403-1ed6-d90f3cf92...@crash.com>, Steven M
Jones  writes
>
>On 11/20/24 12:03, Richard Clayton wrote:

>> it means that if the message, for whatever reason, reaches another DKIM2
>> system it is possible to determine that the gateway intentionally
>> changed the message ... (and hence local policy is going to have to kick
>> in to decide what to do with a failing signature) otherwise one might
>> conclude that the failure of every preceding signature was some other
>> system's failure to look after the message properly -- and it might be
>> that a DSN was speciously generated (depending on the exact chain of
>> custody)
>
>So the proposition is that we would universally apply DKIM2 at the SEG 
>and verify again at the recipient ADMD/mailstore, so that if X% of 
>messages are forwarded or otherwise escape, they could be checked with 
>DKIM2 at the downstream hops, and not have to be treated as ever having 
>left the DKIM2 world, which... would mean just handling it as they do 
>today, right? Once you've left DKIM2, you fall back to the old ways of
>doing things.

It is not necessary for the mailstore to check anything if it trusts the
security gateway ... however, out-of-the-box MTAs may well do checks so
it makes sense for the security gateway to add a DKIM2 header saying
what it has done

>I thought we were looking at a not-uncommon enterprise situation where 
>we have an adequate trust mechanism in place today without much 
>forwarding, and we're going to impose a lot of overhead for what looked 
>like not much benefit. 

adding one DKIM2 header (which says "it's complicated" to cover
modifications) does not sound like "a lot of overhead" to me. It is
certainly simpler than documenting what those changes were.

... and please note that messages that leave the DKIM2 world may re-
enter it thereafter ... you'll note that we have reconsidered our
proposal from
- -00 of our draft to -01. This is a complicated space where the trade-
offs are not immediately obvious

>But are we more thinking of large mailbox 
>providers like mobile telcos using SEGs/services, with massive 
>forwarding populations, and we're focused on their downstream impacts?

there are a fair number of people using large mailbox providers who
receive email via Proofpoint (and doubtless their competitors as well)

I have no doubt that these systems are adding ARC headers today (hoping
that they will be trusted sufficiently that "no auth no entry" will not
be a problem).

Since we're meant to be discussing whether to open a WG and what its
charter should be, should superseding ARC be specifically mentioned?

- -- 
richard   Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary 
Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA+AwUBZz3kzt2nQQHFxEViEQIjxACXZwfS+2JEN1TD3m9lt/IGi8VIGwCfR+8c
KHAR8NGfoOWupPbJzm3XMXw=
=8F1S
-----END PGP SIGNATURE-----



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Steven M Jones


On 11/20/24 12:03, Richard Clayton wrote:

In message <58825b2d-2eb9-6306-c721-fb4b95c15...@crash.com>, Steven M
Jones  writes

> If I follow this, the use case is a Secure Email Gateway or SEG, to use a
> Gartner-ism, and is likely the last hop before delivery to the recipient ADMD or
> mailstore. So why is DKIM2's "it's complicated" flag more useful here than the
> configured exception for the service or gateway the receiving ADMD contracted
> with?

it means that if the message, for whatever reason, reaches another DKIM2
system it is possible to determine that the gateway intentionally
changed the message ... (and hence local policy is going to have to kick
in to decide what to do with a failing signature) otherwise one might
conclude that the failure of every preceding signature was some other
system's failure to look after the message properly -- and it might be
that a DSN was speciously generated (depending on the exact chain of
custody)


So the proposition is that we would universally apply DKIM2 at the SEG 
and verify again at the recipient ADMD/mailstore, so that if X% of 
messages are forwarded or otherwise escape, they could be checked with 
DKIM2 at the downstream hops, and not have to be treated as ever having 
left the DKIM2 world, which... would mean just handling it as they do 
today, right? Once you've left DKIM2, you fall back to the old ways of 
doing things.


I thought we were looking at a not-uncommon enterprise situation where 
we have an adequate trust mechanism in place today without much 
forwarding, and we're going to impose a lot of overhead for what looked 
like not much benefit. But are we more thinking of large mailbox 
providers like mobile telcos using SEGs/services, with massive 
forwarding populations, and we're focused on their downstream impacts?


--S.




[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Richard Clayton
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <20241119211503.2_KbB0O0@steffen%sdaoden.eu>, Steffen
Nurpmeso  writes

>i responded to Richard
>Clayton's message
>
>  jszrijbynuonf...@highwayman.com
>  https://mailarchive.ietf.org/arch/msg/ietf-dkim/1ZCF-h9rHsL2YT3lTgo_qmIpVGE
>
>which seems 7-bit plain (one had to ask him how he sent it)

there's an X-Mailer to tell you that .. and a Wikipedia entry

- -- 
richard   Richard Clayton

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBZz04md2nQQHFxEViEQL7qgCfYwH/mZ+IxNkdC7VUQaXgIej56asAoMGT
wZ7EvvUAuUpfmNxTwzS7vEFY
=sy3E
-----END PGP SIGNATURE-----



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Richard Clayton
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <58825b2d-2eb9-6306-c721-fb4b95c15...@crash.com>, Steven M
Jones  writes

>If I follow this, the use case is a Secure Email Gateway or SEG, to use a 
>Gartner-ism, and is likely the last hop before delivery to the recipient ADMD 
>or mailstore. So why is DKIM2's "it's complicated" flag more useful here than the 
>configured exception for the service or gateway the receiving ADMD contracted 
>with?

it means that if the message, for whatever reason, reaches another DKIM2
system it is possible to determine that the gateway intentionally
changed the message ... (and hence local policy is going to have to kick
in to decide what to do with a failing signature) otherwise one might
conclude that the failure of every preceding signature was some other
system's failure to look after the message properly -- and it might be
that a DSN was speciously generated (depending on the exact chain of
custody)
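Richard's argument can be read as a small triage rule for a receiving verifier. The sketch below is hypothetical throughout. The `Hop` fields stand in for whatever a DKIM2 header would actually carry, in particular the "it's complicated" flag. `signature_ok` is assumed to mean "verifies once the recorded changes of later hops have been undone".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    domain: str
    signature_ok: bool           # hypothetical: verified after undoing later recorded changes
    declared_irreversible: bool  # hypothetical: "I made changes and won't say how"


def classify(hops_newest_first: List[Hop]) -> str:
    """Hypothetical triage of a signature chain; not taken from any draft."""
    for hop in hops_newest_first:
        if not hop.signature_ok:
            # Nobody owned up to a change here, so this looks like damage in transit.
            return f"broken at {hop.domain}: treat as damage in transit"
        if hop.declared_irreversible:
            # Older signatures are expected to fail from here on; hand over to local policy.
            return f"intentionally modified by {hop.domain}: apply local policy"
    return "all hops verified"
```

With a gateway that sets the flag, the failing originator signature is attributed to the gateway's deliberate change, not to careless handling. That is the case in which a DSN would otherwise be generated speciously.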

- -- 
richard   Richard Clayton

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBZz1Rgt2nQQHFxEViEQLxbQCgqj//0vmL3p8BevIEvJXgVwABM1sAoMAB
4H1Lz7T3uxZxpa+qVc7CG59A
=WV24
-----END PGP SIGNATURE-----



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Richard Clayton
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <20241120023021.wVeiagjR@steffen%sdaoden.eu>, Steffen Nurpmeso 
 writes

> |>i responded to Richard
> |>Clayton's message
> |>
> |>  jszrijbynuonf...@highwayman.com
> |>  https://mailarchive.ietf.org/arch/msg/ietf-dkim/1ZCF-h9rHsL2YT3lTgo_qmIpVGE
> |>
> |>which seems 7-bit plain (one had to ask him how he sent it)
> |
> |there's an X-Mailer to tell you that .. and a Wikipedia entry
>
>I referred to content-type and content-transfer-encoding, because
>they are changed by re-encoding fairly often.

my message was as plain as they come ...


Message-ID: 
Date: Sun, 17 Nov 2024 01:42:10 +0000
To: ietf-dkim@ietf.org
From: Richard Clayton 
Subject: Re: [Ietf-dkim] Re: PROPOSAL: reopen this working group and work on 
DKIM2
References: <8fbb182e-cfed-422e-a2d3-a1a2ebf63...@app.fastmail.com>
 
 <1db24ccd-f67a-40d2-a4cb-cdcb6ec0c...@dcrocker.net>
 
 <8c8ea8e7-2fdc-4cf9-a4b2-7eb551523...@dcrocker.net>
 <34f44d43-09f1-4dbb-9b9e-11391a022...@tana.it>
 <20241116204935.dCb0mQcG@steffen%sdaoden.eu>
In-Reply-To: <20241116204935.dCb0mQcG@steffen%sdaoden.eu>
MIME-Version: 1.0
X-Mailer: Turnpike Integrated Version 5.03 M 

- -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

etc

- -- 
richard @ highwayman . com   "Nothing seems the same
  Still you never see the change from day to day
And no-one notices the customs slip away"

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBZz1P192nQQHFxEViEQL6GQCfSDkObxNYfPK/26ShCAy9NSl3pIsAnRmF
1cydh4lUyxGkXxdFY8UCChWw
=Zjkw
-----END PGP SIGNATURE-----



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Steven M Jones

On 11/19/24 00:25, Wei Chuang wrote:


On Sun, Nov 17, 2024 at 2:20 PM Bron Gondwana 
 wrote:


And I do agree there needs to be a way to say "I made changes, and
I'm not telling you how to undo them" as well.


+1.  My belief is that security gateways are a particularly complex 
case that needs this trust me bit. For example they may redact or 
encrypt content where it doesn't make sense to provide the original 
content for. When security gateways rewrite many URLs, it may become 
burdensome to encode and reverse, and downstream receivers are going 
to have to trust the gateway to make benign transformations.  This can 
work because they already have a relationship with their security 
vendor. However, a trust-me bit in general introduces a security
loophole, hence receivers should use it only in those limited,
well-understood scenarios.



If I follow this, the use case is a Secure Email Gateway or SEG, to use 
a Gartner-ism, and is likely the last hop before delivery to the 
recipient ADMD or mailstore. So why is DKIM2's "it's complicated" flag 
more useful here than the configured exception for the service or 
gateway the receiving ADMD contracted with?


I believe there are already many sites that are configured to accept 
whatever their SEG/service is doing based on IP range, TLS certificate, 
etc. What makes this juice worth the squeeze?


--S.



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Steffen Nurpmeso
Bron Gondwana wrote in
 <62b953a7-5168-45ab-8bec-f77c709b9...@app.fastmail.com>:
 |On Wed, Nov 20, 2024, at 08:15, Steffen Nurpmeso wrote:
 |> that goes out without MIME as such (text/plain 7-bit content-type
 |> is optional), but both of these two messages came in via ML as
 |> 
 |>   Content-Type: text/plain; charset="utf-8"
 |>   Content-Transfer-Encoding: base64
 |
 |Yeah, if the source message isn't MIME encoded, Mailman re-encodes. \

It is more than that.

 | It's a "detect message type" flag in the code, and it would be trivial \
 |to add a config "don't do that if DKIM2" and instead just MIME-wrap \
 |the existing message with the existing charset.

  ...
 |>   -rw-r-----   1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
 |>   -rw-r-----   1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
 |>   -rw-------   1 steffen wheel  236 Nov 19 21:22 t1-patch
 |>   -rw-r-----   1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
 |>   -rw-r-----   1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
 |>   -rw-------   1 steffen wheel 4350 Nov 19 21:23 t2-patch
 |> 
 |> Hm.  Ok let me remove the bzip2 stuff from bsdiff..  Here is the
 |> same without, and then running plzip and zstd on the uncompressed
 |> binary data; this still has the normal header and such (note
 |> i have not yet looked at all, it may very well be that patches at
 |> position 0 or "EOT" could be optimized away etc etc.)
 |> 
 |>   plzip -9 and zstd -19
 |> 
 |>   -rw-------   1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
 |>   -rw-------   1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst
 |> 
 |>   -rw-------   1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
 |>   -rw-------   1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst
 |> 
 |> It would be interesting to know how your implementation of the
 |> algorithm works out for those (and the "real" vcsdiff
 |> implementation i have seen is huge).  Would be cool if it is
 |> superior, of course.
 |
 |My code uses a pretty basic perl diffing tool, but we could use VCDIFF \
 |just fine too - and have it be an input to that format.  The format \
 |really is basically just the logic from RFC3284; but encoded to be \
 |readable.

Ok i now downloaded xdelta3 which uses the VCDIFF algorithm (like
Google's really big thing open-vcdiff), and i see i get for t1

  Offset Code Type1 Size1  @Addr1 + Type2 Size2 @Addr2
  000000 019  CPY_0     54 S@0
  000054 002  ADD        1
  000055 034  CPY_0     18 S@59
  000073 003  ADD        2
  000075 019  CPY_0     27 S@83
  000102 019  CPY_0    196 S@112
  000298 107  CPY_5     11 S@310
  000309 051  CPY_2     53 S@323
  000362 007  ADD        6
  000368 051  CPY_2     45 S@386
  000413 051  CPY_2    111 S@433
  000524 099  CPY_5    250 S@546
  000774 035  CPY_1     21 T@309
  000795 014  ADD       13
  000808 069  CPY_3      5 T@362
  000813 003  ADD        2
  000815 051  CPY_2     38 S@843
  000853 099  CPY_5    238 S@883
  001091 003  ADD        2
  001093 051  CPY_2   1074 S@1127

so i wildly guess you actually postprocess this output (for now).
The two examples i had posted are smaller when processed with
bsdiff compared to non-postprocessed VCDIFF, that much is plain.
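The xdelta3 listing above can be sanity-checked mechanically. The sketch assumes each row covers Size1 bytes of target starting at Offset. It reads S@ as an address in the source file and T@ as an address in the target built so far. The rows are copied from the listing with six-digit offsets written out. The helper names are illustrative.

```python
import re
from typing import List, Optional, Tuple

# Rows copied from the xdelta3 printdelta listing above.
T1_DELTA = """\
000000 019  CPY_0     54 S@0
000054 002  ADD        1
000055 034  CPY_0     18 S@59
000073 003  ADD        2
000075 019  CPY_0     27 S@83
000102 019  CPY_0    196 S@112
000298 107  CPY_5     11 S@310
000309 051  CPY_2     53 S@323
000362 007  ADD        6
000368 051  CPY_2     45 S@386
000413 051  CPY_2    111 S@433
000524 099  CPY_5    250 S@546
000774 035  CPY_1     21 T@309
000795 014  ADD       13
000808 069  CPY_3      5 T@362
000813 003  ADD        2
000815 051  CPY_2     38 S@843
000853 099  CPY_5    238 S@883
001091 003  ADD        2
001093 051  CPY_2   1074 S@1127
"""

ROW = re.compile(r"(\d+)\s+\d+\s+(ADD|CPY_\d)\s+(\d+)(?:\s+([ST])@(\d+))?")
Row = Tuple[int, str, int, Optional[str], Optional[int]]


def parse_rows(text: str) -> List[Row]:
    """Turn listing rows into (offset, kind, size, segment, address) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.fullmatch(line.strip())
        if not m:
            raise ValueError(f"unparsed row: {line!r}")
        off, kind, size, seg, addr = m.groups()
        rows.append((int(off), kind, int(size), seg, int(addr) if addr else None))
    return rows


def check_rows(rows: List[Row]) -> Tuple[int, int]:
    """Return (target length, end of highest source byte used); raise if rows don't chain."""
    pos = src_end = 0
    for off, kind, size, seg, addr in rows:
        if off != pos:
            raise ValueError(f"gap or overlap at {off}, expected {pos}")
        if seg == "S":
            src_end = max(src_end, addr + size)
        elif seg == "T" and addr >= off:
            raise ValueError(f"target copy at {off} starts at unwritten bytes")
        pos += size
    return pos, src_end


target_len, source_len = check_rows(parse_rows(T1_DELTA))
```

The rows chain without gaps to a 2167-byte target, the size of t1-i.txt in the listing further up. The source copies reach byte 2201, the size of t1-o.txt. This suggests the delta rebuilds the input from the list output, the same direction Bron's format uses.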

But thank you!
Ciao,

--steffen


[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Bron Gondwana
On Wed, Nov 20, 2024, at 08:15, Steffen Nurpmeso wrote:
> that goes out without MIME as such (text/plain 7-bit content-type
> is optional), but both of these two messages came in via ML as
> 
>   Content-Type: text/plain; charset="utf-8"
>   Content-Transfer-Encoding: base64

Yeah, if the source message isn't MIME encoded, Mailman re-encodes.  It's a 
"detect message type" flag in the code, and it would be trivial to add a config 
"don't do that if DKIM2" and instead just MIME-wrap the existing message with 
the existing charset.
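Bron's suggestion (MIME-wrap instead of re-encoding) can be sketched with Python's standard `email` package. This is not Mailman's code, and `wrap_with_footer` is a hypothetical name. The point is that the poster's text keeps its charset and is marked 8bit rather than converted to base64. A body delta then mostly has to describe the wrapper and the footer.

```python
from email.message import EmailMessage


def wrap_with_footer(body: str, charset: str, footer: str) -> EmailMessage:
    """Hypothetical list behaviour: wrap, don't re-encode, the poster's text."""
    msg = EmailMessage()
    # cte="8bit" keeps the text as written instead of letting the library pick base64.
    msg.set_content(body, charset=charset, cte="8bit")
    # add_attachment() turns msg into multipart/mixed and moves the text into part 1.
    msg.add_attachment(footer, disposition="inline")
    return msg


msg = wrap_with_footer("Hi list\n", "us-ascii", "-- \nJmap mailing list\n")
```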

> And here the complete text needs to be replaced.  This is t2*.txt.
> And *if* (just thinking) that is only a forwarder address like the
> @FreeBSD.org one of Colin Percival, of CPAN, sourceforge or the
> like, then *possibly* (but likely not) once again.
> 
> Things are better (for me as a German who effectively writes
> mostly 7-bit ASCII) for the mentioned OpenGroup server, where you
> sent eg text/plain; charset="utf-8"/quoted-printable (because of
> a MIME-folded long line, and a German name with Umlauts etc) and
> only get the 8-bit conversion.
> 
> One more question: how about a language which practically always
> needs UTF-8 with more than one byte per character, ie, an Asian
> language, and such?  For anyone not going the 8-bit way (like
> myself) this is thus either quoted-printable or base64 right away.
> Then reencodings to 8-bit are more expensive.
> 
> Well i do not know, it would have to be tested on real life data;
> of course one could hope for the future, if it is all 8-bit and
> if ML software and such stops this reencoding "madness", then..
> 
> And, of course, all this pretty much only affects the text parts,
> large images and such are base64 data and (pretty much) constant.
> 
> My examples from above, if i pass only the bodies (i will attach
> them) to bsdiff i get
> 
>   -rw-r-----   1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
>   -rw-r-----   1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
>   -rw-------   1 steffen wheel  236 Nov 19 21:22 t1-patch
>   -rw-r-----   1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
>   -rw-r-----   1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
>   -rw-------   1 steffen wheel 4350 Nov 19 21:23 t2-patch
> 
> Hm.  Ok let me remove the bzip2 stuff from bsdiff..  Here is the
> same without, and then running plzip and zstd on the uncompressed
> binary data; this still has the normal header and such (note
> i have not yet looked at all, it may very well be that patches at
> position 0 or "EOT" could be optimized away etc etc.)
> 
>   plzip -9 and zstd -19
> 
>   -rw-------   1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
>   -rw-------   1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst
> 
>   -rw-------   1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
>   -rw-------   1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst
> 
> It would be interesting to know how your implementation of the
> algorithm works out for those (and the "real" VCDIFF
> implementation i have seen is huge).  Would be cool if it is
> superior, of course.

My code uses a pretty basic perl diffing tool, but we could use VCDIFF just 
fine too - and have it be an input to that format.  The format really is 
basically just the logic from RFC3284; but encoded to be readable.

From RFC3284 there are 3 commands:

   The instructions to encode and direct the reconstruction of a target
   window are called delta instructions.  There are three types:

      ADD:  This instruction has two arguments, a size x and a sequence
            of x bytes to be copied.

      COPY: This instruction has two arguments, a size x and an address
            p in the string U.  The arguments specify the substring of U
            that must be copied.  We shall assert that such a substring
            must be entirely contained in either S or T.

      RUN:  This instruction has two arguments, a size x and a byte b,
            that will be repeated x times.

I didn't bother implementing "RUN" because that seems like something that you 
don't realistically need in emails.  For headers I implemented both plaintext 
"ADD" and base64 ADD to allow encoding everything neatly.
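The three RFC 3284 instructions are simple to interpret once COPY's address space is clear. Addresses run over U, which is the source segment S followed by the target T produced so far. Below is a minimal Python sketch using made-up tuple shapes, not the RFC's binary encoding. It includes RUN even though Bron leaves it out for email.

```python
from typing import List, Tuple


def apply_delta(source: bytes, instructions: List[Tuple]) -> bytes:
    """Rebuild one target window from RFC 3284-style delta instructions.

    Instruction shapes (an assumption, not the RFC's binary encoding):
      ("ADD", data)        append the literal bytes `data`
      ("COPY", size, addr) copy `size` bytes from U = source + target so far
      ("RUN", size, byte)  append `byte` repeated `size` times
    """
    target = bytearray()
    s_len = len(source)
    for ins in instructions:
        op = ins[0]
        if op == "ADD":
            target += ins[1]
        elif op == "COPY":
            size, addr = ins[1], ins[2]
            if addr < s_len < addr + size:
                raise ValueError("COPY must lie entirely in S or in T")
            for i in range(size):
                p = addr + i
                if p < s_len:
                    target.append(source[p])
                elif p - s_len < len(target):
                    # Byte by byte, so a T copy may overlap the bytes it is writing.
                    target.append(target[p - s_len])
                else:
                    raise ValueError("COPY reads beyond the target built so far")
        elif op == "RUN":
            size, byte = ins[1], ins[2]
            target += bytes([byte]) * size
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return bytes(target)
```

For example, `apply_delta(b"abcdef", [("COPY", 3, 0), ("ADD", b"XY"), ("RUN", 2, ord("z")), ("COPY", 4, 9)])` yields `b"abcXYzzXYzz"`. The last COPY reads from T because 9 is past the 6-byte source. `apply_delta(b"", [("ADD", b"ab"), ("COPY", 4, 0)])` yields `b"ababab"`. This overlapping-copy case is one reason dropping RUN costs little.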

The only other thing I'm thinking is whether a base64 decoding version of COPY 
would make sense for the body.  This would allow putting phrases into the MIME 
preamble rather than into an ADD command and keep the DKIM2-Body-Diff header 
short.  Maybe "Diff" is the wrong name and I should rename it to Delta - which 
is the naming in the VCDIFF doc.
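One possible reading of the base64-decoding COPY idea, sketched with a hypothetical `copy_b64` helper. When a hop has re-encoded a text part as base64, the delta could point at the base64 range and decode it, instead of carrying the original text in an ADD. This interpretation is an assumption, not something the draft specifies.

```python
import base64


def copy_b64(new_body: str, start: int, end: int) -> bytes:
    """Hypothetical COPY variant: base64-decode new_body[start:end], ignoring line breaks."""
    chunk = "".join(new_body[start:end].split())
    return base64.b64decode(chunk, validate=True)


original = "Grüße aus Berlin\r\n".encode("utf-8")
encoded = base64.encodebytes(original).decode("ascii")  # wrapped the way MIME bodies are
new_body = "--b\r\nContent-Transfer-Encoding: base64\r\n\r\n" + encoded + "--b--\r\n"
start = new_body.index(encoded)
```

Here `copy_b64(new_body, start, start + len(encoded))` gives back the original UTF-8 bytes, so the header only needs to carry two offsets.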

Bron.

> 
> You know, .. the "DKIM now horny" draft i will write anyway
> (because why not, it only extends DKIM/6376) will include diffing,
> it will state that normalized headers shall come first, followed
> by normalized body, all this to be diffed and optionally
> compressed (but decompressing MUST be supported; just today
> Antonio Diaz Diaz posted "Lunzip 1.15-rc1 released", very small
> decompressor only).
> Then, if additional headers are to be included these have to be
> prepend

[Ietf-dkim] Re: Should we be recording all modifications

2024-11-19 Thread Steffen Nurpmeso
Hello.

Bron Gondwana wrote in
 <8361f17f-aaf0-4f8e-a1c0-2ec99911b...@app.fastmail.com>:
 |On Tue, Nov 19, 2024, at 12:14, Steffen Nurpmeso wrote:
 |> I wondered for myself how the bsdiff algorithm would work out for
 |> such things.  
 |
 |This is basically the bsdiff algorithm, but with the syntax converted \
 |to be something human readable and header safe.

Thanks.  Ok, in the case you show, this looks very impressive.  The
algorithm "does not need to patch" several, as you say.  Also your
message in my inbox and what came over IETF would be addressable
like so.

But, this is my data point here, it will not work out without full
diffs in many other cases.  For example i responded to Richard
Clayton's message

  jszrijbynuonf...@highwayman.com
  https://mailarchive.ietf.org/arch/msg/ietf-dkim/1ZCF-h9rHsL2YT3lTgo_qmIpVGE

which seems 7-bit plain (one had to ask him how he sent it) with

  20241117030640.wk6r9c7R@steffen%sdaoden.eu
  https://mailarchive.ietf.org/arch/msg/ietf-dkim/k4wUEwxJLI_AIU-Q9TXBBhI0gPU

that goes out without MIME as such (text/plain 7-bit content-type
is optional), but both of these two messages came in via ML as

  Content-Type: text/plain; charset="utf-8"
  Content-Transfer-Encoding: base64

And here the complete text needs to be replaced.  This is t2*.txt.
And *if* (just thinking) that is only a forwarder address like the
@FreeBSD.org one of Colin Percival, of CPAN, sourceforge or the
like, then *possibly* (but likely not) once again.

Things are better (for me as a German who effectively writes
mostly 7-bit ASCII) for the mentioned OpenGroup server, where you
sent eg text/plain; charset="utf-8"/quoted-printable (because of
a MIME-folded long line, and a German name with Umlauts etc) and
only get the 8-bit conversion.

One more question: how about a language which practically always
needs UTF-8 with more than one byte per character, ie, an Asian
language, and such?  For anyone not going the 8-bit way (like
myself) this is thus either quoted-printable or base64 right away.
Then reencodings to 8-bit are more expensive.
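To put rough numbers on the point about mostly multi-byte text, here is a short Python sketch using the standard `quopri` and `base64` modules. The sample text is made up. Every character in it is three bytes in UTF-8.

```python
import base64
import quopri


def encoded_sizes(text: str) -> dict:
    """Byte counts for one body as 8bit UTF-8, quoted-printable and base64."""
    raw = text.encode("utf-8")
    return {
        "8bit": len(raw),
        "quoted-printable": len(quopri.encodestring(raw)),
        "base64": len(base64.encodebytes(raw)),
    }


sizes = encoded_sizes("日本語のメールです。" * 20)
```

The 600 raw bytes grow to about three times that as quoted-printable (each byte becomes `=XX`) and about 4/3 as base64. A conversion between any two of these forms therefore rewrites nearly every byte, which a copy-based delta cannot express cheaply.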

Well i do not know, it would have to be tested on real life data;
of course one could hope for the future, if it is all 8-bit and
if ML software and such stops this reencoding "madness", then..

And, of course, all this pretty much only affects the text parts,
large images and such are base64 data and (pretty much) constant.

My examples from above, if i pass only the bodies (i will attach
them) to bsdiff i get

  -rw-r-----   1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
  -rw-r-----   1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
  -rw-------   1 steffen wheel  236 Nov 19 21:22 t1-patch
  -rw-r-----   1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
  -rw-r-----   1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
  -rw-------   1 steffen wheel 4350 Nov 19 21:23 t2-patch

Hm.  Ok, let me remove the bzip2 stuff from bsdiff..  Here is the
same without it, followed by running plzip and zstd on the uncompressed
binary data; this still has the normal header and such (note
i have not yet looked at all; it may very well be that patches at
position 0 or "EOT" could be optimized away etc. etc.).

  plzip -9 and zstd -19

  -rw---   1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
  -rw---   1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst

  -rw---   1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
  -rw---   1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst

It would be interesting to know how your implementation of the
algorithm works out for those (and the "real" vcsdiff
implementation i have seen is huge).  Would be cool if it is
superior, of course.


You know, .. the "DKIM now horny" draft i will write anyway
(because why not, it only extends DKIM/6376) will include diffing;
it will state that normalized headers shall come first, followed
by the normalized body, all this to be diffed and optionally
compressed (but decompressing MUST be supported; just today
Antonio Diaz Diaz posted "Lunzip 1.15-rc1 released", a very small
decompressor only).
Then, if additional headers are to be included, these have to be
prepended, like trace headers for an email; maybe that special
case can be optimized away very easily (from bsdiff .. for now).
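The "normalized headers first, then normalized body" input could be built roughly as follows. This sketch assumes DKIM's "relaxed" canonicalization from RFC 6376 (sections 3.4.2 and 3.4.4) as the normalization, accepts bare LF as if it were CRLF, and separates headers from body with an empty line; none of that is specified by the draft mentioned above, and `diff_input` is a hypothetical name.

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """RFC 6376 "relaxed" header canonicalization of one field."""
    value = re.sub(r"\r?\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)          # collapse runs of WSP
    name = name.rstrip(" \t").lower()              # lowercase the field name
    return name + ":" + value.strip(" \t") + "\r\n"


def relaxed_body(body: str) -> str:
    """RFC 6376 "relaxed" body canonicalization (bare LF accepted as CRLF)."""
    lines = body.replace("\r\n", "\n").split("\n")
    lines = [re.sub(r"[ \t]+", " ", line).rstrip(" ") for line in lines]
    while lines and lines[-1] == "":               # drop trailing empty lines
        lines.pop()
    return "".join(line + "\r\n" for line in lines)


def diff_input(headers: list, body: str) -> bytes:
    """Hypothetical diff input: canonical headers, empty line, canonical body."""
    canon = "".join(relaxed_header(n, v) for n, v in headers)
    return (canon + "\r\n" + relaxed_body(body)).encode("utf-8")


if __name__ == "__main__":
    old = diff_input([("From", "a@example.org")], "hi\n")
    new = diff_input([("Received", "from mx.example.net"),
                      ("From", "a@example.org")], "hi\n")
    print(new.endswith(old))  # True: the prepended trace header is a pure prefix
```

Because a prepended trace header only adds a prefix, the old input survives as an exact suffix of the new one, which is what makes that special case cheap to express.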

Regarding licenses, these are BSD 2-clause and MIT, and i think lzip
is available as public domain (despite the IETF draft variant).
The nice thing about all that long-time matured software is that
it is very small; statically linking them all in is no problem.  On
FreeBSD:

  -rw---  1 steffen wheel 19992 Nov 19 21:58 bsdiff.o
  -rw---  1 steffen wheel 14904 Nov 19 21:58 divsufsort.o
  -rw---  1 steffen wheel 43928 Nov 19 21:58 sssort.o
  -rw---  1 steffen wheel 32848 Nov 19 21:58 trsort.o
  -rw---  1 steffen wheel 19000 Nov 19 21:58 utils.o
  #|f-1400:/tmp/z$ ll bsdiff
  -rwx--  1 steffen wheel 49200 Nov 19 21:58 bsdiff*
  #|f-1400:/tmp/z$ strip bsdiff
  #|f-1400:/tmp/z$ ll bsdiff
  -rwx--  1 steffen wheel 46200 Nov 19 21:58 bsdiff*

and 

[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Bron Gondwana


On Tue, Nov 19, 2024, at 12:14, Steffen Nurpmeso wrote:
> I wondered for myself how the bsdiff algorithm would work out for
> such things.  

This is basically the bsdiff algorithm, but with the syntax converted to be 
something human readable and header safe.

And obviously, only applied to the message body -  headers get all sorts of 
trace stuff and re-ordering applied.

Bron.

Here's a body diff in bsdiff format for the example JMAP mailing list post
in the repo - just of the body parts:

brong@elg:~/src/dkim2/examples$ hexdump -C o
00000000  42 53 44 49 46 46 34 30  36 00 00 00 00 00 00 00  |BSDIFF406.......|
00000010  2c 00 00 00 00 00 00 00  d3 06 00 00 00 00 00 00  |,...............|
00000020  42 5a 68 39 31 41 59 26  53 59 02 b7 d3 b0 00 00  |BZh91AY&SY......|
00000030  01 c0 c2 69 14 00 10 40  00 08 00 20 00 31 06 4c  |...i...@... .1.L|
00000040  40 d3 4d 1a 68 99 e4 2a  a0 22 39 3c 5d c9 14 e1  |@.M.h..*."9<]...|
00000050  42 40 0a df 4e c0 42 5a  68 39 31 41 59 26 53 59  |B@..N.BZh91AY&SY|
00000060  18 15 27 b0 00 00 03 40  02 c0 00 02 00 00 08 20  |..'....@....... |
00000070  00 30 cc 08 9a 43 40 bc  5d c9 14 e1 42 40 60 54  |.0...C@.]...B@`T|
00000080  9e c0 42 5a 68 39 17 72  45 38 50 90 00 00 00 00  |..BZh9.rE8P.....|
00000090

And the same as an example header:

DKIM2-Diff-Body: i=1;
 c=0-1747

(and the diff with a regular text diff)

brong@elg:~/src/dkim2/examples$ diff b a
1,3d0
< --===============5385250436117681394==
< Content-Type: multipart/alternative; boundary=12b53dc829d24511bfa04f7d5e3675f8
<
45,58d41
<
<
< --===============5385250436117681394==
< Content-Type: text/plain; charset="us-ascii"
< MIME-Version: 1.0
< Content-Transfer-Encoding: 7bit
< Content-Disposition: inline
<
< _______________________________________________
< Jmap mailing list
< j...@ietf.org
< https://www.ietf.org/mailman/listinfo/jmap
<
< --===============5385250436117681394==--
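For readers who want to poke at the hexdump earlier in this message: it follows the classic BSDIFF40 layout, a 32-byte header ("BSDIFF40", then the lengths of the bzip2-compressed control and diff blocks, then the new file's size, each an 8-byte little-endian sign-magnitude integer) followed by three bzip2 streams (control, diff, extra). Here the header reads 54, 44 and 1747 (0x36, 0x2c, 0x6d3), and the last 14 bytes are an empty bzip2 stream, i.e. no extra bytes. The sketch below is a minimal Python bspatch written from that format description, not the original bsdiff/bspatch source; `naive_patch` is a hypothetical helper that only covers "append text" style edits and does none of bsdiff's suffix-sorting search.

```python
import bz2


def offtin(buf: bytes) -> int:
    """Decode bsdiff's 8-byte little-endian sign-magnitude integer."""
    magnitude = int.from_bytes(buf[:8], "little") & 0x7FFFFFFFFFFFFFFF
    return -magnitude if buf[7] & 0x80 else magnitude


def offtout(x: int) -> bytes:
    """Inverse of offtin."""
    out = bytearray(abs(x).to_bytes(8, "little"))
    if x < 0:
        out[7] |= 0x80
    return bytes(out)


def parse_header(patch: bytes) -> tuple:
    """Return (ctrl_len, diff_len, new_size) from a BSDIFF40 header."""
    if patch[:8] != b"BSDIFF40":
        raise ValueError("not a BSDIFF40 patch")
    return offtin(patch[8:16]), offtin(patch[16:24]), offtin(patch[24:32])


def bspatch(old: bytes, patch: bytes) -> bytes:
    ctrl_len, diff_len, new_size = parse_header(patch)
    start = 32
    ctrl = bz2.decompress(patch[start:start + ctrl_len])
    diff = bz2.decompress(patch[start + ctrl_len:start + ctrl_len + diff_len])
    extra = bz2.decompress(patch[start + ctrl_len + diff_len:])
    new = bytearray()
    old_pos = diff_pos = extra_pos = 0
    # The control block is a sequence of (add, copy, seek) triples.
    for i in range(0, len(ctrl), 24):
        add_len = offtin(ctrl[i:i + 8])
        copy_len = offtin(ctrl[i + 8:i + 16])
        seek = offtin(ctrl[i + 16:i + 24])
        for k in range(add_len):  # new byte = old byte + diff byte (mod 256)
            o = old[old_pos + k] if 0 <= old_pos + k < len(old) else 0
            new.append((o + diff[diff_pos + k]) & 0xFF)
        old_pos += add_len
        diff_pos += add_len
        new += extra[extra_pos:extra_pos + copy_len]  # literal bytes from "extra"
        extra_pos += copy_len
        old_pos += seek
    if len(new) != new_size:
        raise ValueError("patch produced the wrong size")
    return bytes(new)


def naive_patch(old: bytes, new: bytes) -> bytes:
    """Hypothetical helper: ADD over the common length, COPY the rest as extra."""
    n = min(len(old), len(new))
    ctrl = offtout(n) + offtout(len(new) - n) + offtout(0)
    diff = bytes((new[i] - old[i]) & 0xFF for i in range(n))
    blocks = [bz2.compress(ctrl), bz2.compress(diff), bz2.compress(new[n:])]
    header = (b"BSDIFF40" + offtout(len(blocks[0]))
              + offtout(len(blocks[1])) + offtout(len(new)))
    return header + b"".join(blocks)
```

Running `parse_header` on the first 32 bytes of the dump gives (54, 44, 1747); 1747 is the same figure that appears in c=0-1747.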

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  br...@fastmailteam.com

___
Ietf-dkim mailing list -- ietf-dkim@ietf.org
To unsubscribe send an email to ietf-dkim-le...@ietf.org


[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Steffen Nurpmeso
Bron Gondwana wrote in
 <71d3b35b-e9e1-43bd-a6ab-d0cb26152...@app.fastmail.com>:
 ...
 |[.] I have a draft for a method at:
 |
 |https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
 |
 |It can be used to describe all "add text" cases quite nicely, as well
 |as wrapped structures where an existing message gets moved into a
 |multipart/mixed with more content at the end. There's still some testing
 |to be done for the most complex cases - but this doesn't have to be
 |a two-way algorithm, it just has to allow describing how to convert
 |a new email body back to the original email body, and I believe this
 |can be done reliably and at a reasonable cost, though it could definitely
 |use some more examples.
 |
 |I'm going to publish an update with another mechanism which reduces
 |the cost of the "remove an attachment" version to at least not fill
 |the headers with tons of junk.  It doesn't reduce the message size
 |though, because you do need to be able to recreate the old message.

I wondered for myself how the bsdiff algorithm would work out for
such things.  This is a very old program that has been present in
every FreeBSD system for twenty years and more.  The executable is
~8.5KB all in all, and it uses (the source of) the libdivsufsort
library, which is 72 KB all in all.  This executable by default
compresses via bzip2 (which accounts for a bit of the 8.5 KB).

For example, if i strip the content of the HTML part of your
message, then the removal of the IETF ML attachment, shown by diff(1) as

  -rw---  1 steffen wheel 5295 Nov 19 01:50 m0
  -rw---  1 steffen wheel 4929 Nov 19 01:50 m1

  --- m0  2024-11-19 01:50:20.390006000 +0100
  +++ m1  2024-11-19 01:50:32.441447000 +0100
  @@ -87,16 +87,6 @@ Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html
   Content-Transfer-Encoding: quoted-printable

  ---===============5952072662436684613==
  -Content-Type: text/plain; charset="utf-8"
  -MIME-Version: 1.0
  -Content-Transfer-Encoding: base64
  -Content-Disposition: inline
  -
  -X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSWV0Zi1ka2lt
  -IG1haWxpbmcgbGlzdCAtLSBpZXRmLWRraW1AaWV0Zi5vcmcKVG8gdW5zdWJzY3JpYmUgc2VuZCBh
  -biBlbWFpbCB0byBpZXRmLWRraW0tbGVhdmVAaWV0Zi5vcmcK
  -
   --===============5952072662436684613==--

encoded via bsdiff results in a 168-byte file.
This can of course be changed, as this file is identifiable:

  # file yy
  yy: bsdiff(1) patch file

and lzip, the algorithm with an IETF draft, is very good with such text
(much better than the RFC-documented zstd, which is about ten times larger).
The author of bsdiff is a decade-long FreeBSD+ developer
who wrote his Oxford thesis on this topic:

  http://www.daemonology.net/papers/thesis.pdf

It must be said that the memory cost of this thing is

   The bsdiff utility uses memory equal to 17 times the size of
   oldfile, and requires an absolute minimum working set size of
   8 times the size of oldfile.

which is quite a lot for those new-style HTML emails with lots of
too-large snapshot images.

I want to point out that, as can be seen above, especially the
Mailman(3) ML software, or let's say especially the Python stuff,
has a penchant for re-encoding everything in base64.  Others,
for example the one used by the OpenGroup, have the pesky quirk of
re-encoding to 8-bit, even if that means that "From " quoting
"has" to be applied.
This effectively means that the differences after mangling by
things like mailing-list managers will, at the current state of
affairs, be larger than what i would expect from reading what was
said on that diffing topic.
I have always hated it, at least, and maybe if people like you,
Mr. Levine and others speak to maintainers of MIME-aware
mailing-list managers, things will change over time.
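To make the size argument concrete, here is a small Python sketch (the body and footer are made up). It counts how many bytes of the new body fall outside the longest common prefix and suffix shared with the old one, a crude measure of what a simple prefix/suffix diff would have to carry: a footer appended in the same encoding costs only the footer, while re-encoding the whole part in base64 changes essentially every byte.

```python
import base64


def changed_span(old: bytes, new: bytes) -> int:
    """Bytes of `new` outside the common prefix and suffix shared with `old`."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in either string.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


if __name__ == "__main__":
    body = b"Hello list,\r\nhere is my message.\r\n" * 20  # hypothetical post
    footer = b"--\r\nExample mailing list\r\n"           # hypothetical footer
    print(changed_span(body, body + footer))                       # just the footer
    print(changed_span(body, base64.encodebytes(body + footer)))  # nearly everything
```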

  ..

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself fore'er and e'er
|
|Farewell, dear collar bear



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Richard Clayton
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <20241118210924.4dff9a743...@ary.qy>, John Levine
 writes

>Right.  We need to make it clear that the "trust me" bit is only intended
>for mail from gateways with whom you already have a relationship.

I have been calling the bit "it's complicated" since it would only need
to be used when it was impractical to describe the change (or as Wei
suggests, it would leak confidential information).

... it does however mean that anyone receiving the email thereafter will
need to have trust (or a devil-may-care local policy).

It's unlikely that forwarding such email outside of the receiving
organisation is going to succeed because the trust will evaporate.

- -- 
richard   Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary 
Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBZzvAzt2nQQHFxEViEQJoBACfcgPULoRmuXpwzyv9NB5U3dpzEXkAn1vk
9T1mw8z9CcuYmj38Z6Qjdrgy
=+zXS
-----END PGP SIGNATURE-----



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread John Levine
It appears that Wei Chuang   said:
>I'm very much in agreement with the need to attribute who contributed which
>content to the message.  I think this is the key difference from the
>RFC6376 DKIM l= body length tag (section 3.5) that tried to tolerate
>mailing footer modification also, but left unknown who added a potentially
>malicious footer, which can be exploited today.  Agreed there is DKIM RFC
>security section 8.2 that warns about using DKIM body length but
>unfortunately a very small but important part of the sending population
>uses it today despite those warnings and risk.  My guess is that they have
>use cases that compel them to use DKIM body length despite its
>unsoundness. 

When this came up a few months ago, I think we found that the people
using l= didn't understand it.  There was an ESP (bulk mailer) that
was doing l=0 and promptly stopped when someone told them why it was a bad
idea, and I think there was some list software that was putting l= with the
actual length as a poorly chosen default.

This is quite different from the algebra, since l= gave you no reliable
way to tell who might have added what.
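A minimal Python illustration of why l= leaves attribution open (it assumes an already-canonicalized body and omits header hashing and the signature itself): the body hash covers only the first l octets, so anything appended afterwards, by anyone, leaves bh= unchanged, and l=0 makes every body hash like an empty one.

```python
import base64
import hashlib
from typing import Optional


def body_hash(canon_body: bytes, length: Optional[int] = None) -> str:
    """DKIM-style bh= value: base64(SHA-256) of the canonical body, cut to l= octets."""
    signed = canon_body if length is None else canon_body[:length]
    return base64.b64encode(hashlib.sha256(signed).digest()).decode("ascii")


if __name__ == "__main__":
    original = b"Please review the attached draft.\r\n"  # hypothetical body
    signed_bh = body_hash(original, len(original))
    tampered = original + b"Urgent: log in at https://phish.example/\r\n"
    print(body_hash(tampered, len(original)) == signed_bh)  # True: the addition is unsigned
```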

>+1.  My belief is that security gateways are a particularly complex case
>that needs this trust me bit.  For example they may redact or
>encrypt content where it doesn't make sense to provide the original content
>for.  When security gateways rewrite many URLs, it may become burdensome to
>encode and reverse, and downstream receivers are going to have to trust
>the gateway to make benign transformations.  This can work because they
>already have a relationship with their security vendor.  However a trust me
>bit in general introduces a security loophole, hence receivers should use it
>only in those limited, well-understood scenarios.

Right.  We need to make it clear that the "trust me" bit is only intended
for mail from gateways with whom you already have a relationship.

R's,
John
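One way a receiver might encode that restriction, sketched in Python with an entirely hypothetical data model (nothing here is DKIM2 syntax): every signature must verify, and a hop that declares an irreversible "trust me" change is acceptable only if its domain is on the receiver's own list of gateways it already has a relationship with. Richard's point falls out directly: hand the same chain to a receiver with a different list and it fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One signer in the chain (hypothetical model, not a DKIM2 wire format)."""
    domain: str
    verified: bool    # did this hop's signature verify?
    reversible: bool  # can its body changes be undone to recover the prior version?


def accept_chain(hops: list, trusted_gateways: set) -> bool:
    """Hypothetical receiver policy for irreversible ("trust me") changes."""
    if not hops or not all(h.verified for h in hops):
        return False
    # Irreversible changes are tolerated only from gateways this receiver trusts.
    return all(h.reversible or h.domain in trusted_gateways for h in hops)


if __name__ == "__main__":
    chain = [Hop("example.org", True, True), Hop("gateway.example", True, False)]
    print(accept_chain(chain, {"gateway.example"}))  # the gateway's own customer
    print(accept_chain(chain, set()))                # anyone else it is forwarded to
```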



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread John Levine
It appears that Taavi Eomäe  said:
>On 18/11/2024 00:19, Bron Gondwana wrote:
>> And I do agree there needs to be a way to say "I made changes, and I'm 
>> not telling you how to undo them" as well.
>
>This has the risk of completely nullifying the intent behind the new 
>standard by providing a path of least resistance too many would take. 

You'd only accept that kind of mail from someone for whom you have an external
reason to assume they're benign. The obvious example is Proofpoint's filtering
proxy which rewrites all the URLs in a message, but only sends it to their
customers who would treat it as a special case.

R's,
John



[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Alessandro Vesely

On Sun 17/Nov/2024 23:19:47 +0100 Bron Gondwana wrote:


> And if a message is bad then it's possible to derive where the badness was
> introduced - something not possible with DKIM or ARC if a message has been
> modified. I have a draft for a method at:
>
> https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/



My main doubt is how a signing filter would retrieve what the original message
was, in order to compose the differences.  Perhaps the mailing list software
can attach the original message to the modified message, with the convention
that the filter will remove the attachment before signing?  Let me compare this
task with the process of composing ARC-Authentication-Results:.  It seems that no
implementation (except mine) succeeded in retrieving the original A-R field and
transferring it in the seal.  Everybody puts a new ARC set at every hop.  So there
seems to be a hidden difficulty in saving some data from the original message.


An alternative would be to not allow /any/ modification, but only a restricted 
set of standardized modifications that a MLM knows, so that it can configure 
its signing filter accordingly.  That was the approach taken by a previous attempt:


https://datatracker.ietf.org/doc/draft-kucherawy-dkim-transform/

I implemented it, just without the header fields that declared what 
transformation was made, since no one puts them in the header.  One difficulty 
which arose was some mailing lists transforming the message body into base64. 
I dealt with that in a totally heuristic manner.  However, I note that your 
algebra doesn't consider that case.
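One possible heuristic for the base64 case, sketched in Python under the assumption that the list changed only the Content-Transfer-Encoding (and perhaps the charset label) of a single-part text/plain message: decode both versions with the standard `email` package and compare the decoded octets, treating CRLF and LF alike. This is an illustration, not the method of either draft, and the function names are hypothetical.

```python
import base64
import email


def decoded_body(raw: bytes) -> bytes:
    """Body octets of a single-part message with its transfer encoding removed."""
    msg = email.message_from_bytes(raw)
    payload = msg.get_payload(decode=True)  # undoes base64 / quoted-printable
    return payload.replace(b"\r\n", b"\n")  # ignore line-ending differences


def only_cte_changed(original: bytes, received: bytes) -> bool:
    return decoded_body(original) == decoded_body(received)


if __name__ == "__main__":
    body = b"Hello list,\r\nplain ASCII text.\r\n"  # hypothetical post
    original = (b'Content-Type: text/plain; charset="us-ascii"\r\n'
                b"Content-Transfer-Encoding: 7bit\r\n\r\n" + body)
    received = (b'Content-Type: text/plain; charset="utf-8"\r\n'
                b"Content-Transfer-Encoding: base64\r\n\r\n"
                + base64.encodebytes(body))
    print(only_cte_changed(original, received))  # True
```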


My take, in case you're curious about what other approaches have been tried:

https://datatracker.ietf.org/doc/html/draft-vesely-dmarc-mlm-transform

I can validate my own posts when they come back from mailing lists.  Sometimes
I can validate other people's posts too.



Best
Ale
--








[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Taavi Eomäe

On 18/11/2024 00:19, Bron Gondwana wrote:
> And I do agree there needs to be a way to say "I made changes, and I'm
> not telling you how to undo them" as well.


This has the risk of completely nullifying the intent behind the new 
standard by providing a path of least resistance too many would take. At 
that point it might be better to rely on things like ARC and ignore 
DKIM2. Such (often) mangled letters should not make it back into the 
"wide internet" anyways.






[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Wei Chuang
On Sun, Nov 17, 2024 at 2:20 PM Bron Gondwana  wrote:

> I don't believe it's that complex, and I do believe it's worth the effort
> in exchange for being able to tell with certainty which entity (by
> signature; which DNS domain) is responsible for creating each part of a
> message. You can then attribute parts of the text to different entities -
> the original author, or the mailing list signature.
>
> And if a message is bad then it's possible to derive where the badness was
> introduced - something not possible with DKIM or ARC if a message has been
> modified. I have a draft for a method at:
>
> https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
>

I'm very much in agreement with the need to attribute who contributed which
content to the message.  I think this is the key difference from the
RFC6376 DKIM l= body length tag (section 3.5) that tried to tolerate
mailing footer modification also, but left unknown who added a potentially
malicious footer, which can be exploited today.  Agreed there is DKIM RFC
security section 8.2 that warns about using DKIM body length but
unfortunately a very small but important part of the sending population
uses it today despite those warnings and risk.  My guess is that they have
use cases that compel them to use DKIM body length despite its
unsoundness.  You can see the "algebra" draft provides tools to bind a
change to a particular signer.  If malicious content is found, it can be
associated with the contributor and not someone else even though other
parties may have forwarded the message and contributed other content to the
message.
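A toy version of that attribution in Python, under a deliberately narrow assumption: each signer only appends to the body it received, as with a list footer (the algebra draft also covers wrapping and removals). Walking the signed versions in order yields which signer contributed each byte range of the final body. The function name and data layout are hypothetical.

```python
def attribute(versions: list) -> list:
    """versions: [(signer_domain, body_bytes), ...] in signing order.

    Returns [(signer_domain, start, end), ...] byte ranges of the final body.
    Raises ValueError if a signer did more than append (outside this toy model).
    """
    spans = []
    previous = b""
    for signer, body in versions:
        if not body.startswith(previous):
            raise ValueError(signer + " changed earlier content")
        if len(body) > len(previous):
            spans.append((signer, len(previous), len(body)))
        previous = body
    return spans


if __name__ == "__main__":
    chain = [("author.example", b"Hi all,\r\n"),
             ("list.example", b"Hi all,\r\n--\r\nList footer\r\n")]
    for signer, start, end in attribute(chain):
        print(signer, start, end)
```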

Some of the details of the content and header diff could be
tweaked, but that's for later.


> It can be used to describe all "add text" cases quite nicely, as well as
> wrapped structures where an existing message gets moved into a
> multipart/mixed with more content at the end. There's still some testing to
> be done for the most complex cases - but this doesn't have to be a two-way
> algorithm, it just has to allow describing how to convert a new email body
> back to the original email body, and I believe this can be done reliably
> and at a reasonable cost, though it could definitely use some more examples.
>
> I'm going to publish an update with another mechanism which reduces the
> cost of the "remove an attachment" version to at least not fill the headers
> with tons of junk.  It doesn't reduce the message size though, because you
> do need to be able to recreate the old message.
>
> And I do agree there needs to be a way to say "I made changes, and I'm not
> telling you how to undo them" as well.
>

+1.  My belief is that security gateways are a particularly complex case
that needs this trust me bit.  For example they may redact or
encrypt content where it doesn't make sense to provide the original content
for.  When security gateways rewrite many URLs, it may become burdensome to
encode and reverse, and downstream receivers are going to have to trust
the gateway to make benign transformations.  This can work because they
already have a relationship with their security vendor.  However a trust me
bit in general introduces a security loophole, hence receivers should use it
only in those limited, well-understood scenarios.

-Wei


[Ietf-dkim] Re: Should we be recording all modifications

2024-11-18 Thread Steven M Jones

On 11/18/24 07:19, Bron Gondwana wrote:
> I don't believe it's that complex, and I do believe it's worth the
> effort in exchange for being able to tell with certainty which entity
> (by signature; which DNS domain) is responsible for creating each part
> of a message. You can then attribute parts of the text to different
> entities - the original author, or the mailing list signature.



While I'm not convinced /a priori/ that reversible changes will work, I 
do think it's worth finding out. It's come up before in similar 
contexts, and if we don't answer the question it will come up again.


If people are motivated enough to put the engineering and analysis work 
into making that determination, I'd like to see the results. And to the 
extent $DAYJOB allows, I want to participate.


Just my ¥3.09,
--S.
