Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-12 Thread Philip Van Hoof
On Thu, 2007-10-11 at 12:14 +0200, Philip Van Hoof wrote:
> I also have this one, for example
> 
> Subject: =?ISO-2022-JP?B?GyRCJSIlaiUsLUUtRRsoQg==?= (
> =?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o 
> =?ISO-2022-JP?B?GyRCIlwbKEI=?=
> =?ISO-2022-JP?B?GyRCIiglUSE8JXMbKEI=?= !!"
> =?ISO-2022-JP?B?GyRCISMhJhsoQg==?= :*: =?ISO-2022-JP?B?GyRCISYbKEI=?=
> =?ISO-2022-JP?B?GyRCISwheRsoQg==?=
> 
> I know that the characters " (", ")o ", " !!" and " :*: " should not be
> there (at least, I think they shouldn't), I guess this is done by spam
> bots to confuse spamassassin (not sure, though).
> 
> This check in the code (line 842 in camel-mime-utils.c) makes any such
> string become the base64 one. This is of course really not readable for
> normal human beings (although, it depends on what you call a normal
> one).
> 
> 842: .. /* quick check to see if this could possibly be a real encoded word */
> 843: ..  if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && 
> in[len-2] == '?')) {
> 844: ..   d(printf("invalid\n"));
> 845: ..   return NULL;
> 846: ..  }
> 
> When just trying to decode the string, ignoring the check, it does work
> quite well. At least for this case.
> 
> I'm attaching yet another patch that ignores this check.


The entire check should not be ignored. The len<8 check, for example, is
important. So don't commit this patch :-)



> On Thu, 2007-10-11 at 11:52 +0200, Philip Van Hoof wrote:
> > In case iconv does not succeed in decoding for example the Subject
> > header, it returns the base64 encoded one. That is obviously not
> > readable at all. The decoded one after base64 decoding (which did
> > succeed in my test case) or whatever iconv could recover from it, sounds
> > like a better option.
> > 
> > This changeset (patch) on camel-mime-utils.c deals with the error
> > situation (in case iconv did not return -1) by returning what(ever)
> > iconv could recover from the string:
> > 
> > http://tinymail.org/trac/tinymail/changeset/2830#file2
> > 
> > I attached:
> > 
> > svn diff libtinymail-camel/camel-lite/camel/camel-mime-utils.c -r 2829 > 
> > /home/pvanhoof/diff.diff
> > 
> > This is the Subject line of my test target:
> > 
> > Subject: =?ISO-2022-JP?B?GyRCM048QiRLOkdEY0Z8NWsbKEI=?=
> > 
> > =?ISO-2022-JP?B?GyRCIzJLfDFfIUFGfEonJCQkRyQqRU8kN0NXJDckXiQ5ISMlYSE8GyhC?=
> > =?ISO-2022-JP?B?GyRCJWslXiUsJTglcxsoQg==?=
> > 
> > 
> > I also opened a bug for this one:
> > http://bugzilla.gnome.org/show_bug.cgi?id=485677
> > 
> > 
> > ___
> > Evolution-hackers mailing list
> > Evolution-hackers@gnome.org
> > http://mail.gnome.org/mailman/listinfo/evolution-hackers
-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://www.pvanhoof.be/blog




___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Jeffrey Stedfast
On Thu, 2007-10-11 at 19:49 +0200, Philip Van Hoof wrote:
> On Thu, 2007-10-11 at 13:10 -0400, Jeffrey Stedfast wrote:
> > On Thu, 2007-10-11 at 17:14 +0200, Philip Van Hoof wrote:
> > > On Thu, 2007-10-11 at 10:52 -0400, Jeffrey Stedfast wrote:
> > > > I have far better fixes in GMime that need to be ported to Camel.
> > > > 
> > > 
> > > Porting GMime to Camel would be an interesting effort indeed. Perhaps
> > > just replacing CamelMimePart with GMime.
> > 
> > Well... I hadn't exactly meant that it would be a good idea to
> > necessarily replace Camel's MIME parser/objects with GMime. I had simply
> > meant for Camel to lift GMime's rfc2047 decoder logic which does more
> > with charset fallback than Camel currently does... plus it also is a bit
> > more liberal in decoding malformed rfc2047 encoded-word tokens (well,
> > assuming ENABLE_RFC2047_WORKAROUNDS is defined...).
> 
> Aha, so we can just look at what we find in the #ifdef
> ENABLE_RFC2047_WORKAROUNDS #endif blocks and learn from your effort :)

Well, you probably want all the code from those functions, even without
the #ifdef (that #ifdef really only works around the problem where
encoded-word tokens are in the middle of other word tokens), in order
to get the charset fixes as well.

> 
> Cool, thanks. (note. isn't spruce GPL and Camel LGPL?)
> 

Well, GMime is separate from Spruce... but yea, I just realised GMime is
GPLv2+ - however, obviously I have an interest in improving Evolution
and so I give permission to Evo to use that code ;)

I need to poke the people who've contributed bits of code to GMime (the
code I mentioned in this thread was all implemented by me, so it's safe
to take) and make sure it's ok with them, but I'm likely going to
relicense GMime to be LGPLv2+

Jeff




Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Philip Van Hoof
On Thu, 2007-10-11 at 13:10 -0400, Jeffrey Stedfast wrote:
> On Thu, 2007-10-11 at 17:14 +0200, Philip Van Hoof wrote:
> > On Thu, 2007-10-11 at 10:52 -0400, Jeffrey Stedfast wrote:
> > > I have far better fixes in GMime that need to be ported to Camel.
> > > 
> > 
> > Porting GMime to Camel would be an interesting effort indeed. Perhaps
> > just replacing CamelMimePart with GMime.
> 
> Well... I hadn't exactly meant that it would be a good idea to
> necessarily replace Camel's MIME parser/objects with GMime. I had simply
> meant for Camel to lift GMime's rfc2047 decoder logic which does more
> with charset fallback than Camel currently does... plus it also is a bit
> more liberal in decoding malformed rfc2047 encoded-word tokens (well,
> assuming ENABLE_RFC2047_WORKAROUNDS is defined...).

Aha, so we can just look at what we find in the #ifdef
ENABLE_RFC2047_WORKAROUNDS #endif blocks and learn from your effort :)

Cool, thanks. (note. isn't spruce GPL and Camel LGPL?)

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://www.pvanhoof.be/blog






Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Jeffrey Stedfast
On Thu, 2007-10-11 at 17:14 +0200, Philip Van Hoof wrote:
> On Thu, 2007-10-11 at 10:52 -0400, Jeffrey Stedfast wrote:
> > I have far better fixes in GMime that need to be ported to Camel.
> > 
> 
> Porting GMime to Camel would be an interesting effort indeed. Perhaps
> just replacing CamelMimePart with GMime.

Well... I hadn't exactly meant that it would be a good idea to
necessarily replace Camel's MIME parser/objects with GMime. I had simply
meant for Camel to lift GMime's rfc2047 decoder logic which does more
with charset fallback than Camel currently does... plus it also is a bit
more liberal in decoding malformed rfc2047 encoded-word tokens (well,
assuming ENABLE_RFC2047_WORKAROUNDS is defined...).
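
As a rough illustration of what "charset fallback" can mean, here is a minimal sketch using plain POSIX iconv. It is not GMime's code (the thread does not show it): convert_with_fallback() and try_convert() are hypothetical names, and the policy of trying each candidate charset in order until one converts the whole input to UTF-8 is an assumption.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert `in` from `charset` to UTF-8, or return
 * NULL if the charset is unknown or the input is not valid in it. */
static char *try_convert(const char *charset, const char *in, size_t inlen)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    char *inbuf = (char *) in;            /* iconv() wants a non-const pointer */
    size_t inleft = inlen;
    size_t outleft = inlen * 4 + 16;      /* generous guess for UTF-8 output */
    char *outbase, *outbuf;

    if (cd == (iconv_t) -1)
        return NULL;
    outbase = outbuf = malloc(outleft + 1);
    if (outbase == NULL) {
        iconv_close(cd);
        return NULL;
    }
    /* convert, then flush any pending state; either step may fail */
    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1) {
        free(outbase);
        iconv_close(cd);
        return NULL;
    }
    *outbuf = '\0';
    iconv_close(cd);
    return outbase;
}

/* Hypothetical policy: the declared charset first, then the fallbacks.
 * Returns a malloc'd UTF-8 string, or NULL if no candidate worked.
 * If `used` is not NULL it receives the charset that succeeded. */
char *convert_with_fallback(const char *const *charsets, size_t n,
                            const char *in, size_t inlen, const char **used)
{
    size_t i;

    assert(charsets != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        char *out = try_convert(charsets[i], in, inlen);
        if (out != NULL) {
            if (used != NULL)
                *used = charsets[i];
            return out;
        }
    }
    return NULL;
}
```

A caller would pass the charset named in the encoded word first, followed by whatever fallbacks make sense for the user (for example the locale charset); which fallbacks GMime actually uses is not stated in this thread.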

Jeff




Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Philip Van Hoof
On Thu, 2007-10-11 at 10:52 -0400, Jeffrey Stedfast wrote:
> I have far better fixes in GMime that need to be ported to Camel.
> 

Porting GMime to Camel would be an interesting effort indeed. Perhaps
just replacing CamelMimePart with GMime.

I noticed that CamelMimePart is not 'very' glued to the providers of
Camel. The glue is mostly parsing either TOP, or something that you
receive from the IMAP server that looks like a TOP, into a
CamelMessageInfo, and getting a stream through the CamelDataCache (or
the one that is specific for IMAP) that will be passed to
CamelMimePart's infrastructure.

That's just two glue points to cut and replace with new ones.

However, my actual goal is to marry the mime parser a lot more with the
service:

o. Modern IMAP servers will have the CATENATE capability for composing a
   message at the IMAP server (Lemonade's forward without download
   feature). This is something very few actual IMAP servers in the
   field have correctly configured right now, and therefore it does
   not have a very high priority for me.

o. Retrieving the parts the first time they are streamed to a target
   (like a file-stream or the HTML component), while a decoding stream
   sits in the middle or not, rather than always when a message is
   requested, would be very interesting for networks where consuming
   bandwidth is expensive and devices where local storage is limited.

   Right now I call this 'partial message retrieval'. What I would
   really want would be retrieving any part of the message on-demand and
   initially only storing the BODYSTRUCTURE and the summary item
   locally (and as parts get requested, caching them individually).

   This is, afaik, a major rewrite and rethinking of CamelMimePart vs.
   CamelStore and CamelFolder, and it is why I think even spruce and
   the new IMAP4 provider are not yet what we'll need in the future,
   although I'm also fine with pragmatism and pragmatic goals. I also
   understand that right now only a limited number of IMAP servers
   would cope with this method and that for a lot of servers an
   old-style way of using the service would be needed.

o. The CONVERT capability (that is not even specified yet) which will
   make it possible for an E-mail client to ask the IMAP server for a
   converted version of a MIME part. For example if the image is too
   large for the bandwidth and display of the device, the client could
   ask for a converted version of it. 

   Another example would be converting a Word document to an
   antiword-like text version of the Word document, server-side, instead
   of having to deploy a bunch of Word-format reading code on your
   cellphone and retrieve the entire document.

These three things mean that what I would really like might not be
compatible with GMime's (current) purpose (I think GMime is more about
parsing a message than also having control over the retrieval of the
parts, dealing with BODYSTRUCTURE rather than parsing from the actual
content, etc.). GMime, right now, is 'very' interesting if you
already have the entire contents of the E-mail (like, server-side).

With some adaptations you can make these three work with what
CamelMimePart and/or GMime are, too (no need to convince me of that),
although it might be less ideal or harder this way.
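
To make the on-demand idea from the second point a bit more concrete, here is a minimal sketch. None of it is Camel or GMime API: struct lazy_part, lazy_part_get() and the fetch callback are hypothetical names, and fake_fetch() only stands in for a real network fetch of one part. The point is just that a part is fetched the first time somebody asks for it and served from the local copy afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: fetches one part by its section specifier
 * and returns a malloc'd copy, or NULL on failure. */
typedef char *(*part_fetch_fn)(const char *section, void *user_data);

/* Hypothetical per-part record: only the section is known up front
 * (for example from the BODYSTRUCTURE); the content arrives later. */
struct lazy_part {
    const char *section;   /* part specifier, e.g. "1.2" */
    char *data;            /* NULL until the part is first requested */
};

/* Return the part's content, fetching it only on the first request. */
const char *lazy_part_get(struct lazy_part *part, part_fetch_fn fetch,
                          void *user_data)
{
    assert(part != NULL && fetch != NULL);
    if (part->data == NULL)
        part->data = fetch(part->section, user_data);
    return part->data;
}

/* Stand-in for a network fetch: counts calls, returns dummy content. */
static char *fake_fetch(const char *section, void *user_data)
{
    unsigned *calls = user_data;
    char *copy = malloc(strlen(section) + 1);

    if (copy != NULL) {
        strcpy(copy, section);
        (*calls)++;
    }
    return copy;
}

/* Requests the same part twice and returns how many fetches happened. */
unsigned lazy_part_demo(void)
{
    unsigned calls = 0;
    struct lazy_part part = { "1.2", NULL };

    lazy_part_get(&part, fake_fetch, &calls);
    lazy_part_get(&part, fake_fetch, &calls);
    free(part.data);
    return calls;
}
```

In a real client the cached copy would presumably live in an on-disk cache rather than in memory, and the fetch would go through the provider; both are left out here.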

Anyway, just thoughts...


-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://www.pvanhoof.be/blog






Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Jeffrey Stedfast
I have far better fixes in GMime that need to be ported to Camel.

Jeff

On Thu, 2007-10-11 at 12:14 +0200, Philip Van Hoof wrote:
> I also have this one, for example
> 
> Subject: =?ISO-2022-JP?B?GyRCJSIlaiUsLUUtRRsoQg==?= (
> =?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o 
> =?ISO-2022-JP?B?GyRCIlwbKEI=?=
> =?ISO-2022-JP?B?GyRCIiglUSE8JXMbKEI=?= !!"
> =?ISO-2022-JP?B?GyRCISMhJhsoQg==?= :*: =?ISO-2022-JP?B?GyRCISYbKEI=?=
> =?ISO-2022-JP?B?GyRCISwheRsoQg==?=
> 
> I know that the characters " (", ")o ", " !!" and " :*: " should not be
> there (at least, I think they shouldn't), I guess this is done by spam
> bots to confuse spamassassin (not sure, though).
> 
> This check in the code (line 842 in camel-mime-utils.c) makes any such
> string become the base64 one. This is of course really not readable for
> normal human beings (although, it depends on what you call a normal
> one).
> 
> 842: .. /* quick check to see if this could possibly be a real encoded word */
> 843: ..  if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && 
> in[len-2] == '?')) {
> 844: ..   d(printf("invalid\n"));
> 845: ..   return NULL;
> 846: ..  }
> 
> When just trying to decode the string, ignoring the check, it does work
> quite well. At least for this case.
> 
> I'm attaching yet another patch that ignores this check.
> 
> 
> On Thu, 2007-10-11 at 11:52 +0200, Philip Van Hoof wrote:
> > In case iconv does not succeed in decoding for example the Subject
> > header, it returns the base64 encoded one. That is obviously not
> > readable at all. The decoded one after base64 decoding (which did
> > succeed in my test case) or whatever iconv could recover from it, sounds
> > like a better option.
> > 
> > This changeset (patch) on camel-mime-utils.c deals with the error
> > situation (in case iconv did not return -1) by returning what(ever)
> > iconv could recover from the string:
> > 
> > http://tinymail.org/trac/tinymail/changeset/2830#file2
> > 
> > I attached:
> > 
> > svn diff libtinymail-camel/camel-lite/camel/camel-mime-utils.c -r 2829 > 
> > /home/pvanhoof/diff.diff
> > 
> > This is the Subject line of my test target:
> > 
> > Subject: =?ISO-2022-JP?B?GyRCM048QiRLOkdEY0Z8NWsbKEI=?=
> > 
> > =?ISO-2022-JP?B?GyRCIzJLfDFfIUFGfEonJCQkRyQqRU8kN0NXJDckXiQ5ISMlYSE8GyhC?=
> > =?ISO-2022-JP?B?GyRCJWslXiUsJTglcxsoQg==?=
> > 
> > 
> > I also opened a bug for this one:
> > http://bugzilla.gnome.org/show_bug.cgi?id=485677
> > 
> > 



Re: [Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Philip Van Hoof

I also have this one, for example

Subject: =?ISO-2022-JP?B?GyRCJSIlaiUsLUUtRRsoQg==?= (
=?ISO-2022-JP?B?GyRCITchJiZYISYbKEI=?=)o =?ISO-2022-JP?B?GyRCIlwbKEI=?=
=?ISO-2022-JP?B?GyRCIiglUSE8JXMbKEI=?= !!"
=?ISO-2022-JP?B?GyRCISMhJhsoQg==?= :*: =?ISO-2022-JP?B?GyRCISYbKEI=?=
=?ISO-2022-JP?B?GyRCISwheRsoQg==?=

I know that the characters " (", ")o ", " !!" and " :*: " should not be
there (at least, I think they shouldn't); I guess this is done by spam
bots to confuse spamassassin (not sure, though).

This check in the code (line 842 in camel-mime-utils.c) makes any such
string show up as the raw base64 encoded one. This is of course really
not readable for normal human beings (although, it depends on what you
call a normal one).

842: .. /* quick check to see if this could possibly be a real encoded word */
843: ..  if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) {
844: ..   d(printf("invalid\n"));
845: ..   return NULL;
846: ..  }

When just trying to decode the string, ignoring the check, it does work
quite well. At least for this case.

I'm attaching yet another patch that ignores this check.
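
As a side note on the decoding step itself: the "B" in these tokens means the text is plain base64. The sketch below is a minimal, hypothetical decoder (base64_decode_alloc() is not Camel's function; Camel has its own), only meant to show what comes out. It skips padding and any byte outside the base64 alphabet, which is an assumption made for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not
 * part of the alphabet (this also covers the '=' padding). */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical helper: decode `inlen` bytes of base64 into a malloc'd,
 * NUL-terminated buffer. If `outlen` is not NULL it receives the
 * number of decoded bytes. Returns NULL if allocation fails. */
unsigned char *base64_decode_alloc(const char *in, size_t inlen, size_t *outlen)
{
    unsigned char *out;
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of not yet emitted bits in acc */
    size_t i, n = 0;

    assert(in != NULL);
    out = malloc(inlen / 4 * 3 + 4);
    if (out == NULL)
        return NULL;

    for (i = 0; i < inlen; i++) {
        int v = b64_value((unsigned char) in[i]);
        if (v < 0)
            continue;       /* skip padding and foreign bytes */
        acc = (acc << 6) | (unsigned int) v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (unsigned char) ((acc >> bits) & 0xff);
        }
    }
    out[n] = '\0';
    if (outlen != NULL)
        *outlen = n;
    return out;
}
```

Every payload in the Subject above starts with "GyRC", which decodes to the three bytes 0x1B '$' 'B'. That is the ISO-2022-JP escape sequence that switches to JIS X 0208, so the base64 step yields real ISO-2022-JP data that can then be handed to iconv.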


On Thu, 2007-10-11 at 11:52 +0200, Philip Van Hoof wrote:
> In case iconv does not succeed in decoding for example the Subject
> header, it returns the base64 encoded one. That is obviously not
> readable at all. The decoded one after base64 decoding (which did
> succeed in my test case) or whatever iconv could recover from it, sounds
> like a better option.
> 
> This changeset (patch) on camel-mime-utils.c deals with the error
> situation (in case iconv did not return -1) by returning what(ever)
> iconv could recover from the string:
> 
> http://tinymail.org/trac/tinymail/changeset/2830#file2
> 
> I attached:
> 
> svn diff libtinymail-camel/camel-lite/camel/camel-mime-utils.c -r 2829 > 
> /home/pvanhoof/diff.diff
> 
> This is the Subject line of my test target:
> 
> Subject: =?ISO-2022-JP?B?GyRCM048QiRLOkdEY0Z8NWsbKEI=?=
> 
> =?ISO-2022-JP?B?GyRCIzJLfDFfIUFGfEonJCQkRyQqRU8kN0NXJDckXiQ5ISMlYSE8GyhC?=
> =?ISO-2022-JP?B?GyRCJWslXiUsJTglcxsoQg==?=
> 
> 
> I also opened a bug for this one:
> http://bugzilla.gnome.org/show_bug.cgi?id=485677
> 
> 
-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://www.pvanhoof.be/blog



Index: libtinymail-camel/camel-lite/camel/camel-mime-utils.c
===
--- libtinymail-camel/camel-lite/camel/camel-mime-utils.c	(revision 2830)
+++ libtinymail-camel/camel-lite/camel/camel-mime-utils.c	(working copy)
@@ -842,7 +842,7 @@
 	/* quick check to see if this could possibly be a real encoded word */
 	if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) {
 		d(printf("invalid\n"));
-		return NULL;
+		/* return NULL; */
 	}
 	
 	/* skip past the charset to the encoding type */


[Evolution-hackers] There's no need to be so hard on iconv

2007-10-11 Thread Philip Van Hoof
In case iconv does not succeed in decoding, for example, the Subject
header, the decoder returns the base64 encoded one. That is obviously
not readable at all. The result of the base64 decoding (which did
succeed in my test case), or whatever iconv could recover from it,
sounds like a better option.

This changeset (patch) on camel-mime-utils.c deals with the error
situation (in case iconv returned -1) by returning what(ever)
iconv could recover from the string:

http://tinymail.org/trac/tinymail/changeset/2830#file2

I attached:

svn diff libtinymail-camel/camel-lite/camel/camel-mime-utils.c -r 2829 > /home/pvanhoof/diff.diff

This is the Subject line of my test target:

Subject: =?ISO-2022-JP?B?GyRCM048QiRLOkdEY0Z8NWsbKEI=?=

=?ISO-2022-JP?B?GyRCIzJLfDFfIUFGfEonJCQkRyQqRU8kN0NXJDckXiQ5ISMlYSE8GyhC?=
=?ISO-2022-JP?B?GyRCJWslXiUsJTglcxsoQg==?=


I also opened a bug for this one:
http://bugzilla.gnome.org/show_bug.cgi?id=485677
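
For readers without the Camel tree at hand, the idea can be sketched with plain POSIX iconv. This is not the patch itself: convert_keep_partial() is a hypothetical helper, it calls iconv() directly instead of e_iconv(), and it assumes that the interesting case is the one where the conversion call reports failure. iconv() advances the output pointer as it goes, so whatever was converted before the error is still sitting in the buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert `in` from `charset` to UTF-8 and keep
 * whatever was converted even if iconv() stops with an error.
 * Returns a malloc'd string, or NULL if the charset is unknown or
 * allocation fails. If `complete` is not NULL it is set to 1 when the
 * whole input was converted and to 0 otherwise. */
char *convert_keep_partial(const char *charset, const char *in, size_t inlen,
                           int *complete)
{
    iconv_t cd;
    char *inbuf = (char *) in;           /* iconv() wants a non-const pointer */
    size_t inleft = inlen;
    size_t outleft = inlen * 4 + 16;     /* generous guess for UTF-8 output */
    char *outbase, *outbuf;
    size_t ret;

    assert(charset != NULL && in != NULL);

    cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return NULL;
    outbase = outbuf = malloc(outleft + 1);
    if (outbase == NULL) {
        iconv_close(cd);
        return NULL;
    }

    ret = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (complete != NULL)
        *complete = (ret != (size_t) -1);

    /* Flush pending state in both cases (a no-op for stateless output
     * such as UTF-8), then terminate what we have so far. */
    iconv(cd, NULL, NULL, &outbuf, &outleft);
    *outbuf = '\0';

    iconv_close(cd);
    return outbase;
}
```

With glibc, feeding it the bytes "abc", 0xFF, "def" declared as ASCII should stop at the invalid byte and hand back "abc", which is still more readable than the raw encoded text.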


-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://www.pvanhoof.be/blog



Index: libtinymail-camel/camel-lite/camel/camel-mime-utils.c
===
--- libtinymail-camel/camel-lite/camel/camel-mime-utils.c	(revision 2829)
+++ libtinymail-camel/camel-lite/camel/camel-mime-utils.c	(working copy)
@@ -904,7 +904,14 @@
 	e_iconv (ic, NULL, 0, &outbuf, &outlen);
 	*outbuf = 0;
 	decoded = g_strdup (outbase);
+} else {
+	perror ("iconv");
+	e_iconv (ic, NULL, 0, &outbuf, &outlen);
+	*outbuf = 0;
+	decoded = g_strdup (outbase);
+	/* decoded = g_strdup (inbuf); */
 }
+
 e_iconv_close (ic);
 			} else {
 w(g_warning ("Cannot decode charset, header display may be corrupt: %s: %s",
___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers