interaction between --format=raw and multipart handling [was: Re: Do not attept to output part raw if part is not GMimePart.]

2011-06-27 Thread Austin Clements
On Mon, Jun 27, 2011 at 6:41 PM, Daniel Kahn Gillmor wrote:
> On 06/27/2011 06:07 PM, Austin Clements wrote:
>> Oh, right, of course.  show_message_part will walk into the parts, so
>> format_part_content_raw will still be called on the leafs of a
>> requested multipart.  Though, this approach results in each leaf being
>> transfer decoded and printed individually, so if you ask for a
>> multipart, you won't get the "raw" contents of the multipart (unless
>> it's part 0), so much as you get the concatenated "raw" contents of
>> each part in the multipart.
>
>
> let's take two labeled examples:
>
> A└┬╴multipart/signed 58292 bytes
> B ├┬╴multipart/mixed 56553 bytes
> C │├╴text/plain 1278 bytes
> D │├╴text/plain attachment [grub-install.out] 54109 bytes
> E │└╴text/x-diff attachment [597538.patch] 496 bytes
> F └╴application/pgp-signature attachment [signature.asc] 900 bytes
>
>
> X└┬╴multipart/signed 3863 bytes
> Y ├╴text/plain 1857 bytes
> Z └╴application/pgp-signature attachment [signature.asc] 900 bytes
>
> (i know, you won't use "A" or "Z" as part IDs once we have hierarchical
> part numbers, but consider them placeholders).
>
> if parts F or Z are ever going to be useful (e.g. to some external
> process that wants to validate the signature by hand), then the tool
> needs to provide some way of producing parts B and Y in a pristine form
> (that is, including MIME headers and without interpreting/applying any
> transfer encodings).
>
> Perhaps this means there are two flavors of "raw" that we should be
> distinguishing, something like:
>
>  0) "source" -- the equivalent to viewing the source of the message,
> with headers and without attempting to reverse transfer-encodings, etc.
>
>  1) "rare" -- (not entirely raw, but still bloody, ha ha) strip headers,
> reverse transfer encodings, etc.
>
> I think our current implementation of --format=raw emits "source" when
> applied to the entire message, but "rare" when applied to one of the parts.

Yes.

> I'm suggesting that it might be useful to be able to get "source" of a
> part.  (and perhaps it might also be useful to get the whole message
> "rare" sometimes?)
>
> My first instinct was: if it's multipart, provide "source", if it's
> single-part, provide "rare".  But that fails for the XYZ case above --
> we'd need Y (which is single-part) to be provided as "source" if we were
> ever to be able to make use of Z on its own, so i don't think it'll be
> that simple.
>
> OTOH, i'm not sure that "rare" is particularly meaningful for non-leaf
> parts.
>
>> That if you ask for a multipart, you should effectively get a slice
>> out of the original message bytes (since multipart/* parts can't have
>> non-identity transfer encodings).  Are you also saying that should
>> extend to transfer encoded leaf parts, too?
>
> hmm.  is it true that multipart/* parts can't have non-identity transfer
> encodings?  that would simplify some things, but i don't have a
> reference handy that says it's the case.

RFC 2045, section 6.4: "If an entity is of type "multipart" the
Content-Transfer-Encoding is not permitted to have any value other
than "7bit", "8bit" or "binary"."  (And, for completeness, section
6.2: "The Content-Transfer-Encoding values "7bit", "8bit", and
"binary" all mean that the identity (i.e. NO) encoding transformation
has been performed.")
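
A minimal sketch of the rule quoted above, assuming Python's standard `email` package rather than GMime. The helper names `multipart_encoding_ok` and `walk_violations` are hypothetical and are not part of notmuch. It only reports which multipart entities declare a non-identity Content-Transfer-Encoding:

```python
from email import message_from_bytes

# The three values RFC 2045 section 6.2 describes as identity encodings.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def multipart_encoding_ok(entity):
    """Return True unless a multipart entity declares a non-identity encoding."""
    if entity.get_content_maintype() != "multipart":
        return True  # the rule only constrains multipart/* entities
    # Assumption: a missing header is treated as "7bit".
    cte = str(entity.get("Content-Transfer-Encoding", "7bit")).strip().lower()
    return cte in IDENTITY_ENCODINGS


def walk_violations(msg):
    """Content types of every entity in msg that breaks the rule."""
    return [p.get_content_type() for p in msg.walk()
            if not multipart_encoding_ok(p)]


# A multipart that (incorrectly) claims a base64 transfer encoding.
BAD = (
    b"Content-Type: multipart/mixed; boundary=b\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"--b\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hi\r\n"
    b"--b--\r\n"
)

print(walk_violations(message_from_bytes(BAD)))
```

If the rule holds for a message, a requested multipart can be served as a plain slice of the original bytes with no decoding step.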

> At any rate, i'm not sure it affects the need for being able to emit
> both "rare" and "source" forms of at least the leaf (non-multipart) parts.
>
> i hope this is all at least somewhat clarifying and not just adding to
> the confusion,

Thanks.  That's actually very informative and solidifies some of
what's been slowly coagulating in my mind.

I was also thinking about the two output variants you describe
(though, being less clever, I was thinking "raw" and "decoded").  The
fact that multipart/* parts can only have identity encodings makes me
wonder if the two could be merged by thinking of the decoded content
of a leaf part as a child/body to the original, encoded part.  On the
other hand, that doesn't make sense for other formats, so perhaps
that's not a fruitful approach.


interaction between --format=raw and multipart handling [was: Re: Do not attept to output part raw if part is not GMimePart.]

2011-06-27 Thread Daniel Kahn Gillmor
On 06/27/2011 06:07 PM, Austin Clements wrote:
> Oh, right, of course.  show_message_part will walk into the parts, so
> format_part_content_raw will still be called on the leafs of a
> requested multipart.  Though, this approach results in each leaf being
> transfer decoded and printed individually, so if you ask for a
> multipart, you won't get the "raw" contents of the multipart (unless
> it's part 0), so much as you get the concatenated "raw" contents of
> each part in the multipart.


let's take two labeled examples:

A└┬╴multipart/signed 58292 bytes
B ├┬╴multipart/mixed 56553 bytes
C │├╴text/plain 1278 bytes
D │├╴text/plain attachment [grub-install.out] 54109 bytes
E │└╴text/x-diff attachment [597538.patch] 496 bytes
F └╴application/pgp-signature attachment [signature.asc] 900 bytes


X└┬╴multipart/signed 3863 bytes
Y ├╴text/plain 1857 bytes
Z └╴application/pgp-signature attachment [signature.asc] 900 bytes

(i know, you won't use "A" or "Z" as part IDs once we have hierarchical
part numbers, but consider them placeholders).

if parts F or Z are ever going to be useful (e.g. to some external
process that wants to validate the signature by hand), then the tool
needs to provide some way of producing parts B and Y in a pristine form
(that is, including MIME headers and without interpreting/applying any
transfer encodings).

Perhaps this means there are two flavors of "raw" that we should be
distinguishing, something like:

 0) "source" -- the equivalent to viewing the source of the message,
with headers and without attempting to reverse transfer-encodings, etc.

 1) "rare" -- (not entirely raw, but still bloody, ha ha) strip headers,
reverse transfer encodings, etc.

I think our current implementation of --format=raw emits "source" when
applied to the entire message, but "rare" when applied to one of the parts.
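
The difference between the two flavors can be sketched on a tiny message, assuming Python's standard `email` package. `part_source` and `part_rare` are hypothetical helper names, not notmuch functions, and the boundary splitting is deliberately simplified (it ignores edge cases such as transport padding after the boundary):

```python
from email import message_from_bytes

RAW = (
    b"Content-Type: multipart/mixed; boundary=XX\r\n"
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGVsbG8K\r\n"
    b"--XX--\r\n"
)


def part_source(raw, boundary, index):
    """'source': the untouched bytes of one body part, MIME headers included."""
    delimiter = b"\r\n--" + boundary
    # Prefix a CRLF so a delimiter at the very start of the input also matches.
    chunks = (b"\r\n" + raw).split(delimiter)
    # chunks[0] is everything before the first delimiter; a chunk that
    # starts with "--" follows the closing delimiter and is not a part.
    body_parts = [c for c in chunks[1:] if not c.startswith(b"--")]
    # Drop the remainder of the delimiter line itself.
    return body_parts[index].split(b"\r\n", 1)[1]


def part_rare(raw, index):
    """'rare': headers stripped and the transfer encoding reversed."""
    msg = message_from_bytes(raw)
    return msg.get_payload(index).get_payload(decode=True)


print(part_source(RAW, b"XX", 0))
print(part_rare(RAW, 0))
```

The "source" result is a byte-for-byte slice of the input, which is what an external signature checker would need; the "rare" result is what a user saving an attachment usually wants.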

I'm suggesting that it might be useful to be able to get "source" of a
part.  (and perhaps it might also be useful to get the whole message
"rare" sometimes?)

My first instinct was: if it's multipart, provide "source", if it's
single-part, provide "rare".  But that fails for the XYZ case above --
we'd need Y (which is single-part) to be provided as "source" if we were
ever to be able to make use of Z on its own, so i don't think it'll be
that simple.
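
To illustrate why the multipart/single-part split is not enough, here is a small sketch that models the two example trees and lists the parts that would have to be available as "source" for signature checking. It assumes the signed content is the first child of a multipart/signed; the `Part` class and `signed_content_parts` function are hypothetical, for illustration only:

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    label: str
    content_type: str
    children: list = field(default_factory=list)


def signed_content_parts(part):
    """Labels of parts that are the first child of a multipart/signed."""
    found = []
    if part.content_type == "multipart/signed" and part.children:
        found.append(part.children[0].label)
    for child in part.children:
        found.extend(signed_content_parts(child))
    return found


A = Part("A", "multipart/signed", [
    Part("B", "multipart/mixed", [
        Part("C", "text/plain"),
        Part("D", "text/plain"),
        Part("E", "text/x-diff"),
    ]),
    Part("F", "application/pgp-signature"),
])

X = Part("X", "multipart/signed", [
    Part("Y", "text/plain"),
    Part("Z", "application/pgp-signature"),
])

print(signed_content_parts(A), signed_content_parts(X))
```

B is a multipart and Y is a leaf, yet both turn up as signed content, so the choice between "source" and "rare" cannot be made from the part's type alone.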

OTOH, i'm not sure that "rare" is particularly meaningful for non-leaf
parts.

> Daniel, is this the problem that you're getting at with "opacity"?

the origin of the term "opaque" used in this context can be found in the
definition of multipart/signed:

 https://tools.ietf.org/html/rfc1847#section-2.1

> That if you ask for a multipart, you should effectively get a slice
> out of the original message bytes (since multipart/* parts can't have
> non-identity transfer encodings).  Are you also saying that should
> extend to transfer encoded leaf parts, too?

hmm.  is it true that multipart/* parts can't have non-identity transfer
encodings?  that would simplify some things, but i don't have a
reference handy that says it's the case.

At any rate, i'm not sure it affects the need for being able to emit
both "rare" and "source" forms of at least the leaf (non-multipart) parts.

i hope this is all at least somewhat clarifying and not just adding to
the confusion,

--dkg



___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch

