Re: [mime4j] newlines and parsing of nested (encoded) rfc822 messages

Stefano Bagnara Thu, 17 Jul 2008 08:48:29 -0700

Stefano Bagnara wrote:

Stefano Bagnara wrote:
I noticed that at some point in the past the EOLConvertingInputStream was removed from the chain.
I think this creates issues when we parse an input file containing only \n and write it to the output.
- It seems that most of the code parses by checking only for \n (what happens when there are only \r instead? What should we do?)
- If the message has only LF newlines, mime4j seems to end up outputting headers with CRLF and the body with LF.
- If the input message has lines ending in CR, mime4j does not treat them as line endings.
IMHO either we accept LF, CR, and CRLF as CRLF or we only accept CRLF.
If we do that we have to take care of encoded nested messages: they could again contain LF, CR and CRLF, like the top-level stream.
What is the right approach? Should we add an EOLConvertingInputStream (CONVERT_BOTH) to every level of parsing, or should we fail to parse messages with bad newlines?
I don't like the current behaviour, where we accept some malformed data (a lone LF is treated as CRLF by our parser), we change some of it (the ones between headers are converted to CRLF) and we still output malformed data.
Opinions?
I tried this patch and it seems to work fine (even though it breaks one of our core tests, which does not expect a CR in a header to be considered a newline):
Index: src/main/java/org/apache/james/mime4j/MimeEntity.java
===================================================================
--- src/main/java/org/apache/james/mime4j/MimeEntity.java    (revision 677582)
+++ src/main/java/org/apache/james/mime4j/MimeEntity.java    (working copy)
@@ -197,7 +197,7 @@
         InputStream instream;
         if (MimeUtil.isBase64Encoding(transferEncoding)) {
             log.debug("base64 encoded message/rfc822 detected");
-            instream = new Base64InputStream(dataStream);
+            instream = new EOLConvertingInputStream(new Base64InputStream(dataStream));
         } else if (MimeUtil.isQuotedPrintableEncoded(transferEncoding)) {
             log.debug("quoted-printable encoded message/rfc822 detected");
             instream = new QuotedPrintableInputStream(dataStream);
Index: src/main/java/org/apache/james/mime4j/MimeTokenStream.java
===================================================================
--- src/main/java/org/apache/james/mime4j/MimeTokenStream.java    (revision 676846)
+++ src/main/java/org/apache/james/mime4j/MimeTokenStream.java    (working copy)
@@ -143,7 +143,7 @@

     private void doParse(InputStream stream, String contentType) {
         entities.clear();
-        rootInputStream = new RootInputStream(stream);
+        rootInputStream = new RootInputStream(new EOLConvertingInputStream(stream));
         inbuffer = new BufferedLineReaderInputStream(rootInputStream, 4 * 1024);
         switch (recursionMode) {
         case M_RAW:
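
For illustration only, here is a minimal sketch of what an EOL-converting filter of the CONVERT_BOTH kind does. It is not the mime4j EOLConvertingInputStream; the class name CrlfNormalizingInputStream and the normalize helper are hypothetical. It rewrites every isolated CR, isolated LF and CRLF to CRLF, indiscriminately and regardless of position in the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; NOT the mime4j EOLConvertingInputStream. */
public class CrlfNormalizingInputStream extends InputStream {
    private final PushbackInputStream in;
    // True when a CR has just been returned and the matching LF is still owed.
    private boolean pendingLf = false;

    public CrlfNormalizingInputStream(InputStream source) {
        this.in = new PushbackInputStream(source, 1);
    }

    @Override
    public int read() throws IOException {
        if (pendingLf) {
            pendingLf = false;
            return '\n';
        }
        int b = in.read();
        if (b == '\r') {
            // Swallow the LF of an existing CRLF; push anything else back.
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next);
            }
            pendingLf = true;
            return '\r';
        }
        if (b == '\n') {
            // Isolated LF: emit CR now, LF on the next call.
            pendingLf = true;
            return '\r';
        }
        return b;
    }

    /** Convenience helper for experiments: normalizes an in-memory string. */
    public static String normalize(String raw) {
        try {
            InputStream stream = new CrlfNormalizingInputStream(
                    new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int b = stream.read(); b != -1; b = stream.read()) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, normalize("a\nb\rc\r\nd") yields "a\r\nb\r\nc\r\nd". Note that it also rewrites line endings inside body data, which is fine for text but changes the bytes of any binary content passed through it.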


IIRC the EOLConvertingInputStream was removed because of a performance issue.


Oleg reported this on a JIRA issue:
----

Indiscriminate conversion of line delimiters regardless of their position within the data stream is plain WRONG. I am still of an opinion EOLConvertingInputStream is utterly and helplessly broken, at least for MIME content transmitted over HTTP. The change you are proposing makes mime4j simply useless for HttpClient and FileUpload


http://marc.info/?l=james-dev&m=121528134811461&w=2
--------

I've had a quick read of RFC 2822 on this issue. It insists that CRLF is the only valid delimiter for a canonical rfc822 message. Furthermore, RFC 2822 does not allow the use of isolated CR or LF. So whenever an isolated CR or isolated LF is found we have a malformed rfc822 message and we have to define how to deal with it.

I don't understand why a conversion is wrong for the HTTP case (when does it happen that you have to deal with an isolated LF?).


So we have options:
1) fail parsing anything containing isolated CR or LF chars.
2) parse isolated CR or isolated LF as CRLF and in this case:
   a. make sure we output a well-formed rfc2822 message (CRLF only)
   b. keep bad newlines as is (more difficult to implement)
   c. randomly convert only some of them (as we do now)
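
Option 1 needs a way to detect an isolated CR or LF. A minimal sketch of such a check follows, assuming the whole message is available as a byte array; the class and method names are hypothetical and nothing like this is claimed to exist in mime4j.

```java
/** Hypothetical strictness check for option 1; not part of mime4j. */
public class BareLineEndingCheck {

    /**
     * Returns the offset of the first CR that is not followed by LF, or of
     * the first LF that is not preceded by CR, or -1 if every line ending
     * in the data is a proper CRLF.
     */
    public static int firstBareLineEnding(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    i++; // valid CRLF pair: skip its LF
                } else {
                    return i; // isolated CR
                }
            } else if (data[i] == '\n') {
                return i; // isolated LF
            }
        }
        return -1;
    }
}
```

A strict parser could reject the message when the result is not -1, while a lenient one could use the same check to decide whether normalization is needed at all.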

With regard to #2 we also have to decide whether the choice also applies to parsing of Base64 encoded nested rfc822 messages or not.
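
To show why the nested case needs its own decision, here is a small sketch using the JDK's java.util.Base64 (not mime4j's Base64InputStream). The class and method names are hypothetical. The encoded form of a nested message can be perfectly well formed, with CRLF only, while the decoded bytes still contain isolated LFs, so cleaning up the top-level stream says nothing about the nested one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical demo of bare LFs hidden inside a Base64 encoded nested message. */
public class NestedNewlineDemo {

    /** Encodes a nested message the way a MIME body would carry it (CRLF-wrapped lines). */
    public static String encodeNested(String nested) {
        return Base64.getMimeEncoder()
                .encodeToString(nested.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Decodes the Base64 body back to the nested message text. */
    public static String decodeNested(String encoded) {
        return new String(Base64.getMimeDecoder().decode(encoded),
                StandardCharsets.ISO_8859_1);
    }

    /** True if the text contains an LF that is not preceded by CR. */
    public static boolean hasBareLf(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n' && (i == 0 || text.charAt(i - 1) != '\r')) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String nested = "Subject: inner\n\nA body long enough to make the "
                + "encoder wrap its output over more than one line.\n";
        String encoded = encodeNested(nested);
        System.out.println("encoded has bare LF: " + hasBareLf(encoded));
        System.out.println("decoded has bare LF: " + hasBareLf(decodeNested(encoded)));
    }
}
```

The outer stream here passes a strict CRLF check, yet the decoded nested message would fail it, so whichever option is chosen has to be applied again after decoding if it is meant to cover nested messages.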


Stefano
