Re: [mime4j] newlines and parsing of nested (encoded) rfc822 messages

Stefano Bagnara Fri, 18 Jul 2008 03:38:30 -0700

Oleg Kalnichevski wrote:

On Fri, 2008-07-18 at 09:56 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:21 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Not only does this change completely revert the performance gains and make the whole refactoring exercise completely pointless, due to an utterly inefficient implementation of EOLConvertingInputStream; it is also conceptually wrong (in my humble opinion), as it causes mime4j to corrupt 8bit encoded 'application/octet-stream' content. This basically renders mime4j incompatible with common browsers and HttpClient.
The performance of the EOLConvertingInputStream is not important at all if removing it leaves us with an unusable library. So let's talk about what we expect from the library, then we'll discuss how to make it performant. I believe we have the technical skills to make a performant EOLConverting stream.
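For illustration, here is a minimal Java sketch of a streaming line-ending converter, assuming the goal is to rewrite bare CR and bare LF into CRLF while leaving existing CRLF pairs alone. The class name `CrlfNormalizingInputStream` and its `normalize` helper are hypothetical; this is not mime4j's `EOLConvertingInputStream` and makes no claim about how that class works. It concentrates on getting the CR/LF state machine right rather than on speed.

```java
import java.io.*;

/**
 * Hypothetical sketch, not mime4j's EOLConvertingInputStream: rewrites bare CR
 * and bare LF to CRLF while streaming and leaves existing CRLF pairs untouched.
 */
class CrlfNormalizingInputStream extends FilterInputStream {
    private static final int NONE = -2;  // sentinel: nothing queued (-1 already means EOF)
    private int pending = NONE;          // byte to return before reading more input
    private boolean lastWasCr = false;   // true once a CR has been returned or queued

    CrlfNormalizingInputStream(InputStream in) {
        // Buffer the source so per-byte reads do not hit it directly.
        super(new BufferedInputStream(in));
    }

    @Override
    public int read() throws IOException {
        if (pending != NONE) {           // return the byte queued by the previous call
            int b = pending;
            pending = NONE;
            return b;
        }
        int b = in.read();
        if (lastWasCr) {
            lastWasCr = false;
            if (b == '\n') {
                return b;                // CRLF was already well formed
            }
            pending = b;                 // bare CR: emit LF now, this byte (or EOF) next
            lastWasCr = (b == '\r');     // a queued CR needs its own LF check
            return '\n';
        }
        if (b == '\r') {
            lastWasCr = true;
            return b;
        }
        if (b == '\n') {
            pending = '\n';              // bare LF: emit CR first, then the LF
            return '\r';
        }
        return b;                        // ordinary octet, or -1 at end of stream
    }

    // Overridden because FilterInputStream's version calls in.read(...) directly,
    // which would skip the conversion.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int n = 0;
        int b;
        while (n < len && (b = read()) != -1) {
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;                // skip via read() so the CR/LF state stays valid
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false;                    // reset() could not restore pending/lastWasCr
    }

    /** Convenience helper: normalizes a whole byte array. */
    static byte[] normalize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream s = new CrlfNormalizingInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = s.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

This version still handles one byte per call over a buffered source; a faster variant would presumably scan the buffer in bulk and copy runs that contain no CR or LF in a single operation.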
About the 8bit encoded 'application/octet-stream' content, I think we just need to find the right RFC telling us what we have to do: the RFCs I have read about MIME and its applications always say that CR and LF must not appear alone and that the appropriate transfer encoding has to be used in order to avoid isolated LF and CR. It is not a matter of personal preference, it is a matter of RFC compliance. Let's find the docs first.
What I can find as the definition of "8bit" (RFC 2045, Section 2.8) is:
-------------------
"8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]), but octets with decimal values greater than 127
may be used.  As with "7bit data" CR and LF octets only occur as part
of CRLF line separation sequences and no NULs are allowed.
-------------------
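The quoted definition is mechanical enough to check in code. Below is a minimal Java sketch, assuming the data is available as a byte array; the class and method names are hypothetical. It accepts a final line without a trailing CRLF, which is an interpretation choice that the quoted text does not settle.

```java
/**
 * Hypothetical sketch of a check for the RFC 2045 section 2.8 "8bit data" rules:
 * at most 998 octets between CRLF pairs, CR and LF only as CRLF, no NUL.
 * Octets above 127 are allowed.
 */
final class EightBitData {
    private EightBitData() {}

    static boolean isEightBitData(byte[] data) {
        int lineLength = 0;                       // octets since the last CRLF
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;               // treat bytes as unsigned octets
            if (b == 0) {
                return false;                     // NUL is never allowed
            }
            if (b == '\r') {
                if (i + 1 == data.length || data[i + 1] != '\n') {
                    return false;                 // CR not followed by LF
                }
                i++;                              // consume the LF of the CRLF pair
                lineLength = 0;
            } else if (b == '\n') {
                return false;                     // LF not preceded by CR
            } else if (++lineLength > 998) {
                return false;                     // line longer than 998 octets
            }
        }
        return true;                              // a final unterminated line is accepted here
    }
}
```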
Stefano,

You are very welcome to impose whatever strict interpretation of the
relevant RFCs your heart desires. Just please leave an option that
allows overriding it, so that the mime4j parser can be used to parse
real-world content.
Oleg, don't get me wrong. I simply want to make sure we all understand what the RFCs say, and understand the specific cases where we are ignoring them and WHY.
In the case of the outer boundary, we introduced backward compatibility issues in the name of performance, mainly because of a lack of knowledge of the RFCs. I'm not an expert either, but I think it is important to at least take them into consideration once we find the right docs.
I'm not saying that we MUST be 100% compliant and strict, but I want to make sure we know when we are doing something non-compliant and that we agree that it is good.
One of the main goals is interoperability, so every time we do something different from what the RFCs tell us, we have to make sure we are not breaking interoperability with other RFC-compliant tools.
I'm far from being a MIME expert, so I find it difficult to keep up with this discussion if I have to convince people of something. I just want to share my little knowledge about the (mainly SMTP-related) RFCs.
Stefano
Stefano,

The core of this issue is not about standards compliance. I am fine with
mime4j being strict in its interpretation of the relevant RFCs by default.
However, the idea of _indiscriminate_ conversion of line delimiters
regardless of their occurrence in the data stream seems _very_, _very_
__conceptually__ wrong to me.
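One way to avoid indiscriminate conversion is to make the decision per body part. The following Java sketch is a hypothetical policy, not part of mime4j: it only allows rewriting when the caller opts in to leniency, the part uses a line-oriented transfer encoding (7bit or 8bit), and the media type is textual. The RFC 2045 defaults for missing headers are real; restricting repairs to `text/*` is an assumption chosen to keep 'application/octet-stream' bytes intact.

```java
import java.util.Locale;

/** Hypothetical per-part policy (not mime4j API) for repairing bare CR/LF. */
final class LineEndingPolicy {
    private LineEndingPolicy() {}

    /**
     * @param contentType      Content-Type header value, or null if absent
     * @param transferEncoding Content-Transfer-Encoding header value, or null if absent
     * @param lenient          caller's opt-in to repairing non-compliant line endings
     */
    static boolean mayNormalize(String contentType, String transferEncoding, boolean lenient) {
        if (!lenient) {
            return false;                  // strict mode: never rewrite line endings
        }
        // RFC 2045 defaults: a missing Content-Transfer-Encoding means 7bit and a
        // missing Content-Type means text/plain; charset=us-ascii.
        String cte = transferEncoding == null
                ? "7bit" : transferEncoding.trim().toLowerCase(Locale.ROOT);
        String type = contentType == null
                ? "text/plain" : contentType.trim().toLowerCase(Locale.ROOT);
        // "binary", base64, quoted-printable and unknown encodings are left alone.
        boolean lineOriented = cte.equals("7bit") || cte.equals("8bit");
        // Assumption: only textual parts are repaired; in other media types
        // (e.g. application/octet-stream) CR and LF octets are treated as data.
        return lineOriented && type.startsWith("text/");
    }
}
```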

I can't help feeling that Ayatollah-style orthodoxy about line
delimiter handling just does not really help anyone. Fortunately for
JAMES, MTAs and MUAs are too complex to be written by complete muppets.
We do not have that privilege in the HTTP world where one has no other
choice but to interoperate with tons of HTTP agents and CGI scripts
written with a complete disregard of standards. So, in the
HttpComponents project we have a very simple policy: be lenient about
parsing, be strict about formatting. That seems to work well for _us_.
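Applied to line delimiters, the "lenient parsing, strict formatting" rule could look like this minimal Java sketch (class and method names are hypothetical): the parser accepts CRLF, bare LF, or bare CR as a line break, while the formatter always writes CRLF. It operates on decoded text lines such as headers, not on arbitrary body octets, where CR and LF may be data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers: lenient line parsing, strict CRLF formatting. */
final class LenientLines {
    private LenientLines() {}

    /** Splits on CRLF, bare LF, or bare CR; a trailing delimiter adds no empty line. */
    static List<String> parseLines(String s) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r' || c == '\n') {
                lines.add(s.substring(start, i));
                // Treat CRLF as one delimiter rather than two.
                if (c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
                start = i + 1;
            }
        }
        if (start < s.length()) {
            lines.add(s.substring(start));     // last line without a delimiter
        }
        return lines;
    }

    /** Always terminates every line with CRLF, whatever the input used. */
    static String formatLines(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append("\r\n");
        }
        return sb.toString();
    }
}
```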

Oleg

"Be lenient about parsing, be strict about formatting" is exactly what the JAMES PMC agreed on in the guidelines.

Conversion should not be done at all, but we want to be lenient, so we do some conversion to support some non-compliant agents. I also agree that conversion may not be appropriate in every case, and that's why we are discussing it. It isn't worth continuing to discuss this issue at such a high level; instead I would like to keep our focus on real solutions.

In fact, after reading a lot of RFCs today, I think that what you get from HTTP is perfectly standard behaviour (I'm not sure whether they omit the "Content-Transfer-Encoding: binary" header or whether some RFC defines it as the default for HTTP), but I found that RFC 1867 says it is common to use the binary transfer encoding in multipart/form-data MIME parts in HTTP. So what you want is probably what the RFCs ask us to implement, but let's first understand things and then do things ;-)

And rest assured that the same issues you find with HTTP clients also exist with MUAs and MTAs. Muppets are all around, and we care so much about RFCs precisely because we don't want other people to call us muppets ;-)

Please read the other messages I posted in this thread today, because I think they are more concrete and constructive than this leaf of the thread.


Stefano
