Re: [Evolution-hackers] There's no need to be so hard on iconv
On Thu, 2007-10-11 at 12:14 +0200, Philip Van Hoof wrote:
> I also have this one, for example
>
> Subject: =?ISO-2022-JP?B?GyRCJSIlaiUsLUUtRRsoQg==?= (
> =?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o
> =?ISO-2022-JP?B?GyRCIlwbKEI=?=
> =?ISO-2022-JP?B?GyRCIiglUSE8JXMbKEI=?= !!"
> =?ISO-2022-JP?B?GyRCISMhJhsoQg==?= :*: =?ISO-2022-JP?B?GyRCISYbKEI=?=
> =?ISO-2022-JP?B?GyRCISwheRsoQg==?=
>
> I know that the characters " (", ")o ", " !!" and " :*: " should not be
> there (at least, I think they shouldn't), I guess this is done by spam
> bots to confuse spamassassin (not sure, though).
>
> This check in the code (line 842 in camel-mime-utils.c) makes any such
> string become the base64 one. This is of course really not readable for
> normal human beings (although, it depends on what you call a normal
> one).
>
> 842: .. /* quick check to see if this could possibly be a real encoded word */
> 843: .. if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' &&
> in[len-2] == '?')) {
> 844: .. d(printf("invalid\n"));
> 845: .. return NULL;
> 846: .. }
>
> When just trying to decode the string, ignoring the check, it does work
> quite well. At least for this case.
>
> I'm attaching yet another patch that ignores this check.

The entire check should not be ignored. The len<8 check, for example, is
important. So don't commit this patch :-)

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://www.pvanhoof.be/blog

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers
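The point about the length guard can be illustrated with a minimal sketch. The function name `could_be_encoded_word` is hypothetical and not part of Camel; it only mirrors the quoted check. The delimiter tests read `in[len-2]` and `in[len-1]`, so they are only safe once `len` is known to be large enough, and 8 is the length of the shortest string that has every delimiter in place (`=?c?e??=`).

```c
#include <stdbool.h>
#include <stddef.h>

/* Shortest string with all encoded-word delimiters present: "=?c?e??=" */
#define MIN_ENCODED_WORD_LEN 8

/* Hypothetical helper mirroring the quick check quoted above.
 * The length guard comes first: without it, in[len-1] and in[len-2]
 * would index before the start of the buffer for len < 2, and the
 * parsing that follows the check could run past a too-short token. */
bool could_be_encoded_word(const char *in, size_t len)
{
    if (len < MIN_ENCODED_WORD_LEN)
        return false;

    return in[0] == '=' && in[1] == '?'
        && in[len - 2] == '?' && in[len - 1] == '=';
}
```

A relaxed decoder could drop the trailing-delimiter test for tokens such as `...?=)o`, but under this reading the length guard would stay in every variant.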
Re: [Evolution-hackers] There's no need to be so hard on iconv
On Thu, 2007-10-11 at 19:49 +0200, Philip Van Hoof wrote:
> Aha, so we can just look at what we find in the #ifdef
> ENABLE_RFC2047_WORKAROUNDS #endif blocks and learn for your effort :)

Well, you probably want all the code from those functions, even w/o the
#ifdef, in order to get the charset fixes as well. (That #ifdef really
only works around the problem where encoded-word tokens are in the
middle of other word tokens.)

> Cool, thanks. (note. isn't spruce GPL and Camel LGPL?)

Well, GMime is separate from Spruce... but yea, I just realised GMime is
GPLv2+. However, obviously I have an interest in improving Evolution and
so I give permission to Evo to use that code ;)

I need to poke the people who've contributed bits of code to GMime (the
code I mentioned in this thread was all implemented by me, so it's safe
to take) and make sure it's ok with them, but I'm likely going to
relicense GMime to be LGPLv2+.

Jeff
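To make the "encoded-word in the middle of another token" case concrete, here is a small sketch of a scanner that looks for an encoded-word anywhere inside a string rather than requiring the whole token to be one. This is an illustration only and is not GMime's or Camel's code; the function name `find_encoded_word` and its interface are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first "=?charset?X?text?=" inside s,
 * where X is B or Q (either case). On success, *word points at the
 * leading "=?" and *wordlen covers everything up to and including the
 * closing "?=". Surrounding characters such as "(" or ")o" are simply
 * left outside the reported span. */
bool find_encoded_word(const char *s, const char **word, size_t *wordlen)
{
    const char *p = s;

    while ((p = strstr(p, "=?")) != NULL) {
        const char *charset = p + 2;
        const char *q = strchr(charset, '?');   /* end of charset */

        if (q != NULL && q > charset
            && q[1] != '\0' && strchr("BbQq", q[1]) != NULL
            && q[2] == '?') {
            const char *text = q + 3;           /* start of encoded text */
            const char *end = strstr(text, "?=");

            if (end != NULL) {
                *word = p;
                *wordlen = (size_t) (end + 2 - p);
                return true;
            }
        }
        p += 2;                                 /* not a match, keep looking */
    }
    return false;
}
```

With a token like `=?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o` from the test Subject header, the reported span stops before the trailing `)o`, which could then be passed through as plain text.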
Re: [Evolution-hackers] There's no need to be so hard on iconv
On Thu, 2007-10-11 at 13:10 -0400, Jeffrey Stedfast wrote:
> Well... I hadn't exactly meant that it would be a good idea to
> necessarily replace Camel's MIME parser/objects with GMime. I had simply
> meant for Camel to lift GMime's rfc2047 decoder logic which does more
> with charset fallback than Camel currently does... plus it also is a bit
> more liberal in decoding malformed rfc2047 encoded-word tokens (well,
> assuming ENABLE_RFC2047_WORKAROUNDS is defined...).

Aha, so we can just look at what we find in the #ifdef
ENABLE_RFC2047_WORKAROUNDS #endif blocks and learn from your effort :)

Cool, thanks. (Note: isn't spruce GPL and Camel LGPL?)

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://www.pvanhoof.be/blog
Re: [Evolution-hackers] There's no need to be so hard on iconv
On Thu, 2007-10-11 at 17:14 +0200, Philip Van Hoof wrote:
> Porting GMime to Camel would be an interesting effort indeed. Perhaps
> just replacing CamelMimePart with GMime.

Well... I hadn't exactly meant that it would be a good idea to
necessarily replace Camel's MIME parser/objects with GMime. I had simply
meant for Camel to lift GMime's rfc2047 decoder logic which does more
with charset fallback than Camel currently does... plus it also is a bit
more liberal in decoding malformed rfc2047 encoded-word tokens (well,
assuming ENABLE_RFC2047_WORKAROUNDS is defined...).

Jeff
Re: [Evolution-hackers] There's no need to be so hard on iconv
On Thu, 2007-10-11 at 10:52 -0400, Jeffrey Stedfast wrote:
> I have far better fixes in GMime that need to be ported to Camel.

Porting GMime to Camel would be an interesting effort indeed. Perhaps
just replacing CamelMimePart with GMime.

I noticed that CamelMimePart is not 'very' glued to the providers of
Camel. The glue is mostly parsing either TOP, or something that you
receive from the IMAP server that looks like a TOP, into a
CamelMessageInfo, and, through the CamelDataCache (or the one that is
specific to IMAP), getting a stream that will be passed to
CamelMimePart's infrastructure.

That's just two glue points to cut and replace with new ones.

However, my actual goal is to marry the MIME parser a lot more with the
service:

o. Modern IMAP servers will have the CATENATE capability for composing a
   message at the IMAP server (Lemonade's forward-without-download
   feature). This is something very few actual IMAP servers in the field
   have correctly configured right now, and therefore it does not have a
   very high priority for me.

o. Retrieving the parts the first time they are streamed to a target
   (like a file-stream or the HTML component), with or without a
   decoding stream sitting in the middle, rather than always when a
   message is requested. This would be very interesting for networks
   where consuming bandwidth is expensive and devices where local
   storage is limited.

   Right now I call this 'partial message retrieval'. What I would
   really want is retrieving any part of the message on demand and
   initially only storing the BODYSTRUCTURE and the summary item locally
   (and, as parts get requested, caching them individually).

   This is, afaik, a major rewrite and rethinking of CamelMimePart vs.
   CamelStore and CamelFolder, and it is why I think even spruce and the
   new IMAP4 provider are not yet what we'll need in the future.
   Although I'm also fine with pragmatism and pragmatic goals.

   I also understand that right now only a limited number of IMAP
   servers would cope with this method and that for a lot of servers an
   old style of using the service would be needed.

o. The CONVERT capability (that is not even specified yet), which will
   make it possible for an E-mail client to ask the IMAP server for a
   converted version of a MIME part. For example, if the image is too
   large for the bandwidth and display of the device, the client could
   ask for a converted version of it. Another example would be
   converting a Word document to an antiword-like text version of the
   Word document, server-side, instead of having to deploy a bunch of
   Word-format reading code on your cellphone and retrieve the entire
   document.

These three things mean that what I would really like might not be
compatible with what GMime's (current) purpose is. I think GMime is more
about parsing than about also having control over the retrieval of the
parts, dealing with BODYSTRUCTURE rather than parsing from the actual
content, etc. GMime, right now, is 'very' interesting if you already
have the entire contents of the E-mail (like, server-side).

With some adaptations you can make these three work with what
CamelMimePart and/or GMime are too (no need to convince me of that). It
might, however, be less ideal or harder that way.

Anyway, just thoughts...

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://www.pvanhoof.be/blog
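The 'partial message retrieval' idea can be sketched at the protocol level. The snippet below is only an illustration under stated assumptions: the struct and function names are hypothetical and not Camel API. The two FETCH data items it formats, `BODYSTRUCTURE` and `BODY.PEEK[section]`, are standard IMAP4rev1 (RFC 3501): the client first asks for the structure only, then fetches an individual part the first time it is needed and skips the round trip when that part is already cached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-part cache record: an IMAP section number such as
 * "1" or "1.2", and whether that part has already been stored locally. */
struct cached_part {
    const char *section;
    bool on_disk;
};

/* Format the command that retrieves only the MIME structure of a message.
 * Returns the command length, or -1 if buf is too small. */
int build_bodystructure_fetch(char *buf, size_t size,
                              const char *tag, unsigned int uid)
{
    int n = snprintf(buf, size, "%s UID FETCH %u BODYSTRUCTURE\r\n", tag, uid);
    return (n < 0 || (size_t) n >= size) ? -1 : n;
}

/* Format the command that retrieves one part on demand.
 * Returns 0 when the part is already cached (nothing to send),
 * the command length otherwise, or -1 if buf is too small. */
int build_part_fetch(char *buf, size_t size, const char *tag,
                     unsigned int uid, const struct cached_part *part)
{
    if (part->on_disk)
        return 0;

    int n = snprintf(buf, size, "%s UID FETCH %u BODY.PEEK[%s]\r\n",
                     tag, uid, part->section);
    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

`BODY.PEEK` is used here instead of `BODY` so that fetching a part for caching does not set the `\Seen` flag as a side effect.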
Re: [Evolution-hackers] There's no need to be so hard on iconv
I have far better fixes in GMime that need to be ported to Camel.

Jeff

On Thu, 2007-10-11 at 12:14 +0200, Philip Van Hoof wrote:
> I also have this one, for example
>
> Subject: =?ISO-2022-JP?B?GyRCJSIlaiUsLUUtRRsoQg==?= (
> =?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o
> =?ISO-2022-JP?B?GyRCIlwbKEI=?=
> =?ISO-2022-JP?B?GyRCIiglUSE8JXMbKEI=?= !!"
> =?ISO-2022-JP?B?GyRCISMhJhsoQg==?= :*: =?ISO-2022-JP?B?GyRCISYbKEI=?=
> =?ISO-2022-JP?B?GyRCISwheRsoQg==?=
>
> I know that the characters " (", ")o ", " !!" and " :*: " should not be
> there (at least, I think they shouldn't), I guess this is done by spam
> bots to confuse spamassassin (not sure, though).
>
> This check in the code (line 842 in camel-mime-utils.c) makes any such
> string become the base64 one. This is of course really not readable for
> normal human beings (although, it depends on what you call a normal
> one).
>
> 842: .. /* quick check to see if this could possibly be a real encoded word */
> 843: .. if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' &&
> in[len-2] == '?')) {
> 844: .. d(printf("invalid\n"));
> 845: .. return NULL;
> 846: .. }
>
> When just trying to decode the string, ignoring the check, it does work
> quite well. At least for this case.
>
> I'm attaching yet another patch that ignores this check.
Re: [Evolution-hackers] There's no need to be so hard on iconv
I also have this one, for example

Subject: =?ISO-2022-JP?B?GyRCJSIlaiUsLUUtRRsoQg==?= (
 =?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o
 =?ISO-2022-JP?B?GyRCIlwbKEI=?=
 =?ISO-2022-JP?B?GyRCIiglUSE8JXMbKEI=?= !!"
 =?ISO-2022-JP?B?GyRCISMhJhsoQg==?= :*: =?ISO-2022-JP?B?GyRCISYbKEI=?=
 =?ISO-2022-JP?B?GyRCISwheRsoQg==?=

I know that the characters " (", ")o ", " !!" and " :*: " should not be
there (at least, I think they shouldn't); I guess this is done by spam
bots to confuse spamassassin (not sure, though).

This check in the code (line 842 in camel-mime-utils.c) makes any such
string come out as the base64 one. This is of course really not readable
for normal human beings (although, it depends on what you call a normal
one).

842: .. /* quick check to see if this could possibly be a real encoded word */
843: .. if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) {
844: .. d(printf("invalid\n"));
845: .. return NULL;
846: .. }

When just trying to decode the string, ignoring the check, it does work
quite well. At least for this case.

I'm attaching yet another patch that ignores this check.

On Thu, 2007-10-11 at 11:52 +0200, Philip Van Hoof wrote:
> In case iconv does not succeed in decoding for example the Subject
> header, it returns the base64 encoded one. That is obviously not
> readable at all. The decoded one after base64 decoding (which did
> succeed in my test case) or whatever iconv could recover from it, sounds
> like a better option.
>
> This changeset (patch) on camel-mime-utils.c deals with the error
> situation (in case iconv did not return -1) by returning what(ever)
> iconv could recover from the string:
>
> http://tinymail.org/trac/tinymail/changeset/2830#file2
>
> I attached:
>
> svn diff libtinymail-camel/camel-lite/camel/camel-mime-utils.c -r 2829 >
> /home/pvanhoof/diff.diff
>
> This is the Subject line of my test target:
>
> Subject: =?ISO-2022-JP?B?GyRCM048QiRLOkdEY0Z8NWsbKEI=?=
> =?ISO-2022-JP?B?GyRCIzJLfDFfIUFGfEonJCQkRyQqRU8kN0NXJDckXiQ5ISMlYSE8GyhC?=
> =?ISO-2022-JP?B?GyRCJWslXiUsJTglcxsoQg==?=
>
> I also opened a bug for this one:
> http://bugzilla.gnome.org/show_bug.cgi?id=485677

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://www.pvanhoof.be/blog

Index: libtinymail-camel/camel-lite/camel/camel-mime-utils.c
===================================================================
--- libtinymail-camel/camel-lite/camel/camel-mime-utils.c (revision 2830)
+++ libtinymail-camel/camel-lite/camel/camel-mime-utils.c (working copy)
@@ -842,7 +842,7 @@
     /* quick check to see if this could possibly be a real encoded word */
     if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) {
         d(printf("invalid\n"));
-        return NULL;
+        /* return NULL; */
     }
 
     /* skip past the charset to the encoding type */
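For readers who want to see what "the base64 one" hides, here is a minimal base64 decoding sketch. It is a plain standalone decoder written for illustration, not the decoder Camel uses, and the function name `base64_decode` is an assumption. Applied to the payload of `=?ISO-2022-JP?B?GyRCISYbKEI=?=` from the Subject header above, it yields the ISO-2022-JP escape sequence ESC $ B, one two-byte character, and ESC ( B, which is the byte stream that is then handed to iconv.

```c
#include <stddef.h>
#include <string.h>

/* Minimal base64 decoder for illustration. Decoding stops at the first
 * '=' padding character or at the end of the string. Returns the number
 * of bytes written to out, or -1 on an invalid character or when out is
 * too small. */
long base64_decode(const char *in, unsigned char *out, size_t outsize)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned long acc = 0;  /* bit accumulator; only the low bits are used */
    int bits = 0;           /* number of not yet emitted bits in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        const char *pos = strchr(alphabet, *in);
        if (pos == NULL)
            return -1;

        acc = (acc << 6) | (unsigned long) (pos - alphabet);
        bits += 6;

        if (bits >= 8) {
            bits -= 8;
            if (n >= outsize)
                return -1;
            out[n++] = (unsigned char) ((acc >> bits) & 0xFF);
        }
    }
    return (long) n;
}
```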
[Evolution-hackers] There's no need to be so hard on iconv
In case iconv does not succeed in decoding for example the Subject
header, it returns the base64 encoded one. That is obviously not
readable at all. The decoded one after base64 decoding (which did
succeed in my test case) or whatever iconv could recover from it, sounds
like a better option.

This changeset (patch) on camel-mime-utils.c deals with the error
situation (in case iconv did not return -1) by returning what(ever)
iconv could recover from the string:

http://tinymail.org/trac/tinymail/changeset/2830#file2

I attached:

    svn diff libtinymail-camel/camel-lite/camel/camel-mime-utils.c -r 2829 > /home/pvanhoof/diff.diff

This is the Subject line of my test target:

Subject: =?ISO-2022-JP?B?GyRCM048QiRLOkdEY0Z8NWsbKEI=?=
 =?ISO-2022-JP?B?GyRCIzJLfDFfIUFGfEonJCQkRyQqRU8kN0NXJDckXiQ5ISMlYSE8GyhC?=
 =?ISO-2022-JP?B?GyRCJWslXiUsJTglcxsoQg==?=

I also opened a bug for this one:
http://bugzilla.gnome.org/show_bug.cgi?id=485677

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://www.pvanhoof.be/blog

Index: libtinymail-camel/camel-lite/camel/camel-mime-utils.c
===================================================================
--- libtinymail-camel/camel-lite/camel/camel-mime-utils.c (revision 2829)
+++ libtinymail-camel/camel-lite/camel/camel-mime-utils.c (working copy)
@@ -904,7 +904,14 @@
                 e_iconv (ic, NULL, 0, &outbuf, &outlen);
                 *outbuf = 0;
                 decoded = g_strdup (outbase);
+            } else {
+                perror ("iconv");
+                e_iconv (ic, NULL, 0, &outbuf, &outlen);
+                *outbuf = 0;
+                decoded = g_strdup (outbase);
+                /* decoded = g_strdup (inbuf); */
             }
+
             e_iconv_close (ic);
         } else {
             w(g_warning ("Cannot decode charset, header display may be corrupt: %s: %s",
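The idea behind the patch, keeping whatever iconv managed to convert before it hit an error, can be tried outside Camel with the plain POSIX `iconv` API. The sketch below is an assumption-laden illustration rather than the patched Camel code: `convert_lenient` is a hypothetical name, it always converts to UTF-8, and it uses `iconv` directly instead of Camel's `e_iconv` wrapper.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not Camel API. Converts inlen bytes of `in`,
 * encoded in `charset`, to UTF-8. If iconv stops on an invalid or
 * incomplete sequence, the output converted so far is kept instead of
 * being thrown away. Returns a malloc'ed NUL-terminated string, or NULL
 * if the charset is unknown or memory is exhausted. *complete is set to
 * 1 only when the whole input converted cleanly. */
char *convert_lenient(const char *charset, const char *in, size_t inlen,
                      int *complete)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return NULL;

    size_t outsize = inlen * 4 + 16;        /* generous upper bound */
    char *outbase = malloc(outsize + 1);    /* +1 for the terminator */
    if (outbase == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inbuf = (char *) in;              /* iconv takes a non-const pointer */
    char *outbuf = outbase;
    size_t inleft = inlen;
    size_t outleft = outsize;

    size_t ret = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);

    /* Reset the shift state in both cases, as the patch does, then
     * terminate whatever was written, even after a failure. */
    iconv(cd, NULL, NULL, &outbuf, &outleft);
    *outbuf = '\0';

    if (complete != NULL)
        *complete = (ret != (size_t) -1);

    iconv_close(cd);
    return outbase;
}
```

A caller could show the returned string and use `*complete` to decide whether to flag the header as possibly truncated; whether a partial result is preferable to the raw encoded text is exactly the judgment call this thread is about.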