Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-27 Thread Jeffrey Stedfast

On Thu, 2007-12-27 at 08:46 +0800, jacky wrote:
 --- Jeffrey Stedfast [EMAIL PROTECTED]wrote:
 
  
  On Thu, 2007-12-27 at 00:20 +0800, jacky wrote:
   It seems that your patch doesn't support this kind of
   encoded string:
   =?gb2312?b?any-encoded-text?==?gb2312?b?any-encoded-text?=
   The two encoded-words are not separated by any character.
  
  Are you sure? I wrote the code to be able to handle
  this case and I just
  tested it again (noticed that I didn't have a test
  case like this in my
  test suite so added one) and it works fine.
  
  Do you have an example subject/whatever header for
  me to test against?
  
 
 I drew my conclusion too hastily. Yes, your patch
 does support this kind of email,

ok ;-)

 but it doesn't support mail that breaks a single
 multi-byte character across multiple encoded-word tokens,
 and when it decodes a header that breaks an encoded-word
 token across two lines, nothing is displayed in Evolution;
 for example, the Subject is empty.

ok, just fixed this in svn. I had tested a broken UTF-8 header earlier
and so didn't see a slight bug in my code.
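
For anyone following the thread, here is a minimal sketch of why a character split across two encoded-word tokens has to be handled specially. It is not the code committed to svn. It assumes UTF-8 for illustration (the same applies to gb2312), and utf8_is_complete() and append_decoded() are hypothetical helper names. The idea is to collect the raw decoded bytes of neighbouring tokens that share a charset and only then hand the joined buffer to the charset converter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf[0..len) consists only of complete UTF-8 characters,
 * 0 otherwise. Simplified: checks lead/continuation byte structure only,
 * not overlong forms or surrogates. */
static int
utf8_is_complete (const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		size_t need, k;

		if (c < 0x80)
			need = 0;
		else if ((c & 0xe0) == 0xc0)
			need = 1;
		else if ((c & 0xf0) == 0xe0)
			need = 2;
		else if ((c & 0xf8) == 0xf0)
			need = 3;
		else
			return 0;	/* stray continuation byte or invalid lead byte */

		if (need > len - i - 1)
			return 0;	/* character is cut off at the end of the buffer */

		for (k = 1; k <= need; k++) {
			if ((buf[i + k] & 0xc0) != 0x80)
				return 0;
		}

		i += need + 1;
	}

	return 1;
}

/* Append the raw (already base64- or Q-decoded) bytes of one token to an
 * accumulator shared by neighbouring tokens with the same charset.
 * Returns 1 on success, 0 if the chunk does not fit. */
static int
append_decoded (unsigned char *acc, size_t *used, size_t cap,
                const unsigned char *chunk, size_t n)
{
	assert (acc != NULL && used != NULL && chunk != NULL);

	if (*used > cap || n > cap - *used)
		return 0;

	memcpy (acc + *used, chunk, n);
	*used += n;

	return 1;
}
```

With a three-byte character split as two bytes in one token and one byte in the next, neither chunk passes utf8_is_complete() on its own, but the accumulated buffer does, so converting per token loses the character while converting the joined buffer keeps it.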

 I'll use Camel with your patch to check all the mail in
 my mbox, and use GMime to decode all the mail headers, to
 find out its capability.


Ok, awesome.

Jeff

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-27 Thread Philip Van Hoof

These warnings might be unimportant. But fyi

==32055== 
==32055== Thread 3:
==32055== Conditional jump or move depends on uninitialised value(s)
==32055==at 0x4023BC7: memchr (mc_replace_strmem.c:354)
==32055==by 0x5054F8D: rfc2047_decode_word (camel-mime-utils.c:1060)
==32055==by 0x5057FA9: header_decode_mailbox (camel-mime-utils.c:2602)
==32055==by 0x50583AD: header_decode_address (camel-mime-utils.c:2727)
==32055==by 0x50589FB: camel_header_address_decode (camel-mime-utils.c:2997)
==32055==by 0x5043A26: internet_decode (camel-internet-address.c:91)
==32055==by 0x5039789: camel_address_decode (camel-address.c:129)
==32055==by 0x504E71A: process_header (camel-mime-message.c:708)
==32055==by 0x504E851: add_header (camel-mime-message.c:745)
==32055==by 0x504585C: camel_medium_add_header (camel-medium.c:145)
==32055==by 0x50533FB: construct_from_parser (camel-mime-part.c:963)
==32055==by 0x504E317: construct_from_parser (camel-mime-message.c:597)
==32055== 
==32055== Conditional jump or move depends on uninitialised value(s)
==32055==at 0x4023BD5: memchr (mc_replace_strmem.c:354)
==32055==by 0x5054F8D: rfc2047_decode_word (camel-mime-utils.c:1060)
==32055==by 0x5057FA9: header_decode_mailbox (camel-mime-utils.c:2602)
==32055==by 0x50583AD: header_decode_address (camel-mime-utils.c:2727)
==32055==by 0x50589FB: camel_header_address_decode (camel-mime-utils.c:2997)
==32055==by 0x5043A26: internet_decode (camel-internet-address.c:91)
==32055==by 0x5039789: camel_address_decode (camel-address.c:129)
==32055==by 0x504E71A: process_header (camel-mime-message.c:708)
==32055==by 0x504E851: add_header (camel-mime-message.c:745)
==32055==by 0x504585C: camel_medium_add_header (camel-medium.c:145)
==32055==by 0x50533FB: construct_from_parser (camel-mime-part.c:963)
==32055==by 0x504E317: construct_from_parser (camel-mime-message.c:597)
==32055== 
==32055== Invalid read of size 1
==32055==at 0x4023BD3: memchr (mc_replace_strmem.c:354)
==32055==by 0x5054F8D: rfc2047_decode_word (camel-mime-utils.c:1060)
==32055==by 0x5057FA9: header_decode_mailbox (camel-mime-utils.c:2602)
==32055==by 0x50583AD: header_decode_address (camel-mime-utils.c:2727)
==32055==by 0x50589FB: camel_header_address_decode (camel-mime-utils.c:2997)
==32055==by 0x5043A26: internet_decode (camel-internet-address.c:91)
==32055==by 0x5039789: camel_address_decode (camel-address.c:129)
==32055==by 0x504E71A: process_header (camel-mime-message.c:708)
==32055==by 0x504E851: add_header (camel-mime-message.c:745)
==32055==by 0x504585C: camel_medium_add_header (camel-medium.c:145)
==32055==by 0x50533FB: construct_from_parser (camel-mime-part.c:963)
==32055==by 0x504E317: construct_from_parser (camel-mime-message.c:597)
==32055==  Address 0x89ced8c is 0 bytes after a block of size 4 alloc'd
==32055==at 0x4022AB8: malloc (vg_replace_malloc.c:207)
==32055==by 0x4022BFC: realloc (vg_replace_malloc.c:429)
==32055==by 0x4A649BA: g_realloc (in /usr/lib/libglib-2.0.so.0.1400.1)
==32055==by 0x4A7DC7B: (within /usr/lib/libglib-2.0.so.0.1400.1)
==32055==by 0x4A7ECDC: g_string_sized_new (in /usr/lib/libglib-2.0.so.0.1400.1)
==32055==by 0x4A7ED24: g_string_new (in /usr/lib/libglib-2.0.so.0.1400.1)
==32055==by 0x5057BC7: header_decode_mailbox (camel-mime-utils.c:2480)
==32055==by 0x50583AD: header_decode_address (camel-mime-utils.c:2727)
==32055==by 0x50589FB: camel_header_address_decode (camel-mime-utils.c:2997)
==32055==by 0x5043A26: internet_decode (camel-internet-address.c:91)
==32055==by 0x5039789: camel_address_decode (camel-address.c:129)
==32055==by 0x504E71A: process_header (camel-mime-message.c:708)
==32055== 
==32055== Invalid read of size 1
==32055==at 0x5054F9D: rfc2047_decode_word (camel-mime-utils.c:1060)
==32055==by 0x5057FA9: header_decode_mailbox (camel-mime-utils.c:2602)
==32055==by 0x50583AD: header_decode_address (camel-mime-utils.c:2727)
==32055==by 0x50589FB: camel_header_address_decode (camel-mime-utils.c:2997)
==32055==by 0x5043A26: internet_decode (camel-internet-address.c:91)
==32055==by 0x5039789: camel_address_decode (camel-address.c:129)
==32055==by 0x504E71A: process_header (camel-mime-message.c:708)
==32055==by 0x504E851: add_header (camel-mime-message.c:745)
==32055==by 0x504585C: camel_medium_add_header (camel-medium.c:145)
==32055==by 0x50533FB: construct_from_parser (camel-mime-part.c:963)
==32055==by 0x504E317: construct_from_parser (camel-mime-message.c:597)
==32055==by 0x50534DA: camel_mime_part_construct_from_parser (camel-mime-part.c:995)




Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-27 Thread Jeffrey Stedfast
Thanks for the info, I'll have a closer look when I get back from
holiday. My guess is that Conditional jumps can safely be ignored as
they are probably due to some code optimization hack inside memchr() but
the others I'm not sure about.

I've committed a possible fix for them (only bug I could think of at
first glance) which checks that inbuf is >= 8 bytes since that is the
minimum size of an rfc2047 encoded-word token, less than that and it's
not an enc-word for sure.

Either way, my fix is a nice sanity check to have anyway.

Jeff



Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-26 Thread jacky
It seems that your patch doesn't support this kind of
encoded string:
=?gb2312?b?any-encoded-text?==?gb2312?b?any-encoded-text?=
The two encoded-words are not separated by any character.
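
To make the case concrete, here is a small sketch of how a scanner can find where one encoded-word ends so that a second one starting on the very next byte is still seen. It is only an illustration, not the patch's tokenizer; encoded_word_length() is a hypothetical name and the base64 payloads in the usage note are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if in[0..len) starts with an encoded-word of the
 * form "=?charset?e?payload?=", return the length of that token,
 * otherwise return 0. */
static size_t
encoded_word_length (const char *in, size_t len)
{
	size_t i;

	assert (in != NULL);

	if (len < 8 || in[0] != '=' || in[1] != '?')
		return 0;

	/* charset: runs up to the next '?' */
	for (i = 2; i < len && in[i] != '?'; i++)
		;

	/* reject an empty charset; need "?e?" plus at least "?=" after it */
	if (i == 2 || len - i < 5 || in[i + 2] != '?')
		return 0;

	/* payload: runs up to the first "?=" */
	for (i += 3; i + 1 < len; i++) {
		if (in[i] == '?' && in[i + 1] == '=')
			return i + 2;
	}

	return 0;
}
```

Called on "=?gb2312?b?1tA=?==?gb2312?b?zsQ=?=", the function returns the length of the first token only; calling it again at that offset yields the second token, so no separator is needed between them.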

--- Jeffrey Stedfast [EMAIL PROTECTED]wrote:

 This patch is a port of my GMime rfc2047 decoder
 which is even more
 liberal in what it accepts than Thunderbird and is
 what I will be
 committing to svn.
 
 closing bugs:
 
 #302991
 #315513
 #502178
 
 Jeff
 
  Index: camel-mime-utils.c

===
 --- camel-mime-utils.c(revision 8315)
 +++ camel-mime-utils.c(working copy)
 @@ -821,116 +821,321 @@
   *in = inptr;
  }
  
 -/* decode rfc 2047 encoded string segment */
  static char *
 -rfc2047_decode_word(const char *in, size_t len)
 +camel_iconv_strndup (iconv_t cd, const char *string, size_t n)
  {
 - const char *inptr = in+2;
 - const char *inend = in+len-2;
 + size_t inleft, outleft, converted = 0;
 + char *out, *outbuf;
   const char *inbuf;
 - const char *charset;
 - char *encname, *p;
 - int tmplen;
 - size_t ret;
 - char *decword = NULL;
 - char *decoded = NULL;
 - char *outbase = NULL;
 - char *outbuf;
 - size_t inlen, outlen;
 - gboolean retried = FALSE;
 - iconv_t ic;
 -
 - d(printf("rfc2047: decoding '%.*s'\n", len, in));
 -
 - /* quick check to see if this could possibly be a real encoded word */
 - if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) {
 - d(printf("invalid\n"));
 - return NULL;
 - }
 -
 - /* skip past the charset to the encoding type */
 - inptr = memchr (inptr, '?', inend-inptr);
 - if (inptr != NULL && inptr < inend + 2 && inptr[2] == '?') {
 - d(printf("found ?, encoding is '%c'\n", inptr[0]));
 - inptr++;
 - tmplen = inend-inptr-2;
 - decword = g_alloca (tmplen); /* this will always
 be more-than-enough room */
 - switch(toupper(inptr[0])) {
 - case 'Q':
 - inlen = quoted_decode((const unsigned char *)
 inptr+2, tmplen, (unsigned char *) decword);
 - break;
 - case 'B': {
 - int state = 0;
 - unsigned int save = 0;
 -
 - inlen = camel_base64_decode_step((unsigned char *) inptr+2, tmplen, (unsigned char *) decword, &state, &save);
 - /* if state != 0 then error? */
 - break;
 + size_t outlen;
 + int errnosav;
 + 
 + if (cd == (iconv_t) -1)
 + return g_strndup (string, n);
 + 
 + outlen = n * 2 + 16;
 + out = g_malloc (outlen + 4);
 + 
 + inbuf = string;
 + inleft = n;
 + 
 + do {
 + errno = 0;
 + outbuf = out + converted;
 + outleft = outlen - converted;
 + 
 + converted = iconv (cd, (char **) &inbuf, &inleft, &outbuf, &outleft);
 + if (converted == (size_t) -1) {
 + if (errno != E2BIG && errno != EINVAL)
 + goto fail;
   }
 - default:
 - /* uhhh, unknown encoding type - probably an
 invalid encoded word string */
 - return NULL;
 + 
 + /*
 +  * E2BIG   There is not sufficient room at
 *outbuf.
 +  *
 +  * We just need to grow our outbuffer and try
 again.
 +  */
 + 
 + converted = outbuf - out;
 + if (errno == E2BIG) {
 + outlen += inleft * 2 + 16;
 + out = g_realloc (out, outlen + 4);
 + outbuf = out + converted;
   }
 - d(printf("The encoded length = %d\n", inlen));
 - if (inlen > 0) {
 - /* yuck, all this snot is to setup iconv! */
 - tmplen = inptr - in - 3;
 - encname = g_alloca (tmplen + 1);
 - memcpy (encname, in + 2, tmplen);
 - encname[tmplen] = '\0';
 + } while (errno == E2BIG && inleft > 0);
 + 
 + /*
 +  * EINVAL  An  incomplete  multibyte sequence has been encountered
 +  * in the input.
 +  *
 +  * We'll just have to ignore it...
 +  */
 + 
 + /* flush the iconv conversion */
 + iconv (cd, NULL, NULL, &outbuf, &outleft);
 + 
 + /* Note: not all charsets can be nul-terminated with a single
 +   nul byte. UCS2, for example, needs 2 nul bytes and UCS4
 +   needs 4. I hope that 4 nul bytes is enough to terminate all
 +   multibyte charsets? */
 + 
 + /* nul-terminate the string */
 + memset (outbuf, 0, 4);
 + 
 + /* reset the cd */
 + iconv (cd, NULL, NULL, NULL, NULL);
 + 
 + return out;
 + 
 + fail:
 +
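
The quoted diff is cut off above. As a reading aid, the grow-and-retry technique it uses can be sketched standalone as follows. This is an illustration under assumptions, not the patch itself: it uses plain malloc()/realloc() instead of g_malloc(), the name convert_with_growing_buffer() is hypothetical, the initial buffer is deliberately tiny so the E2BIG path is exercised, and it assumes the glibc-style iconv() prototype that takes char ** for the input buffer.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert n bytes of str using cd and return a malloc'd result that is
 * terminated with 4 nul bytes, or NULL on failure. */
static char *
convert_with_growing_buffer (iconv_t cd, const char *str, size_t n)
{
	size_t outlen = 4;	/* tiny on purpose, to force at least one E2BIG */
	size_t inleft = n, outleft, used = 0;
	char *inbuf = (char *) str;
	char *out, *outbuf, *tmp;

	assert (cd != (iconv_t) -1 && str != NULL);

	if ((out = malloc (outlen + 4)) == NULL)
		return NULL;

	while (inleft > 0) {
		size_t rc;

		outbuf = out + used;
		outleft = outlen - used;

		rc = iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
		used = (size_t) (outbuf - out);

		if (rc != (size_t) -1 || errno == EINVAL)
			break;	/* done, or truncated multibyte input that we ignore */

		if (errno != E2BIG) {
			/* e.g. EILSEQ: give up and reset the descriptor */
			free (out);
			iconv (cd, NULL, NULL, NULL, NULL);
			return NULL;
		}

		/* E2BIG: output buffer is full, so grow it and try again */
		outlen += inleft * 2 + 16;
		if ((tmp = realloc (out, outlen + 4)) == NULL) {
			free (out);
			return NULL;
		}
		out = tmp;
	}

	/* flush any pending shift sequence, then nul-terminate */
	outbuf = out + used;
	outleft = outlen - used;
	iconv (cd, NULL, NULL, &outbuf, &outleft);
	memset (outbuf, 0, 4);

	/* reset the descriptor so it can be reused */
	iconv (cd, NULL, NULL, NULL, NULL);

	return out;
}
```

The 4 spare bytes past outlen are what make the final memset() safe even when the conversion fills the buffer exactly.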

Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-26 Thread Jeffrey Stedfast

On Thu, 2007-12-27 at 00:20 +0800, jacky wrote:
 It seems that your patch doesn't support this kind of
 encoded string:
 =?gb2312?b?any-encoded-text?==?gb2312?b?any-encoded-text?=
 The two encoded-words are not separated by any character.

Are you sure? I wrote the code to be able to handle this case and I just
tested it again (noticed that I didn't have a test case like this in my
test suite so added one) and it works fine.

Do you have an example subject/whatever header for me to test against?

Jeff

 
 --- Jeffrey Stedfast [EMAIL PROTECTED]wrote:
 
  This patch is a port of my GMime rfc2047 decoder
  which is even more
  liberal in what it accepts than Thunderbird and is
  what I will be
  committing to svn.
  
  closing bugs:
  
  #302991
  #315513
  #502178
  
  Jeff




Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-26 Thread jacky

--- Jeffrey Stedfast [EMAIL PROTECTED]wrote:

 
 On Thu, 2007-12-27 at 00:20 +0800, jacky wrote:
  It seems that your patch doesn't support this kind of
  encoded string:
  =?gb2312?b?any-encoded-text?==?gb2312?b?any-encoded-text?=
  The two encoded-words are not separated by any character.
 
 Are you sure? I wrote the code to be able to handle
 this case and I just
 tested it again (noticed that I didn't have a test
 case like this in my
 test suite so added one) and it works fine.
 
 Do you have an example subject/whatever header for
 me to test against?
 

I drew my conclusion too hastily. Yes, your patch
does support this kind of email, but it doesn't support
mail that breaks a single multi-byte character across
multiple encoded-word tokens, and when it decodes a
header that breaks an encoded-word token across two
lines, nothing is displayed in Evolution; for
example, the Subject is empty.
I'll use Camel with your patch to check all the mail in
my mbox, and use GMime to decode all the mail headers, to
find out its capability.

 Jeff
 
  
  --- Jeffrey Stedfast [EMAIL PROTECTED]wrote:
  
   This patch is a port of my GMime rfc2047 decoder
   which is even more
   liberal in what it accepts than Thunderbird and
 is
   what I will be
   committing to svn.
   
   closing bugs:
   
   #302991
   #315513
   #502178
   
   Jeff
 
 
 





Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-25 Thread Philip Van Hoof

Awesome! In the afternoon I started with the exact same port, but had to
pause because of family visiting, I'm back home and you have it
finished :). Thanks a lot!

Brought it to tny's camel. FYI:
http://tinymail.org/trac/tinymail/changeset/3203


On Tue, 2007-12-25 at 19:28 -0500, Jeffrey Stedfast wrote:
 This patch is a port of my GMime rfc2047 decoder which is even more
 liberal in what it accepts than Thunderbird and is what I will be
 committing to svn.
 
 closing bugs:
 
 #302991
 #315513
 #502178
 
 Jeff
 
-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Evolution-hackers] improved rfc2047 decode patch

2007-12-25 Thread Jeffrey Stedfast
I noticed that the

while (camel_mime_is_dtext(*inptr) && *inptr)

got reversed in your camel-lite patch, which must mean that you had
locally changed the code to do the *inptr check first...

I think your change was correct, we should be checking *inptr first
before passing it to camel_mime_is_dtext(), so I've made that change to
upstream camel.

It likely won't make a difference, but it saves a few instructions by
avoiding an unnecessary lookup (*inptr can't be valid dtext if *inptr is
nul).
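
For illustration, the reordered loop looks roughly like the sketch below. is_dtext() is a simplified, hypothetical stand-in for camel_mime_is_dtext() (printable ASCII except '[', ']' and '\'), and dtext_span() is a made-up wrapper so the loop can be shown on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table-driven classifier such as
 * camel_mime_is_dtext(): dtext here is any printable ASCII byte
 * other than '[', ']' and '\\'. */
static int
is_dtext (unsigned char c)
{
	return c > 32 && c < 127 && c != '[' && c != ']' && c != '\\';
}

/* Return the length of the leading run of dtext characters in inptr. */
static size_t
dtext_span (const char *inptr)
{
	const char *start = inptr;

	assert (inptr != NULL);

	/* Test for the terminating nul first; && short-circuits, so the
	 * classifier is never consulted for the nul byte. */
	while (*inptr && is_dtext ((unsigned char) *inptr))
		inptr++;

	return (size_t) (inptr - start);
}
```

Both orderings stop at the same byte; the only difference is whether the classifier is consulted for the final nul.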

Also, you'll want to update camel-charset-map-private.h in your
camel-lite branch or the update to camel-charset-map.c doesn't actually
get you anything (that change is really only for auto-generating
camel-charset-map-private.h)

Which reminds me... I need to commit jacky's e-iconv change as well.

Jeff


On Wed, 2007-12-26 at 01:41 +0100, Philip Van Hoof wrote:
 Awesome! In the afternoon I started with the exact same port, but had to
 pause because of family visiting, I'm back home and you have it
 finished :). Thanks a lot!
 
 Brought it to tny's camel. FYI:
 http://tinymail.org/trac/tinymail/changeset/3203
 
 
 On Tue, 2007-12-25 at 19:28 -0500, Jeffrey Stedfast wrote:
  This patch is a port of my GMime rfc2047 decoder which is even more
  liberal in what it accepts than Thunderbird and is what I will be
  committing to svn.
  
  closing bugs:
  
  #302991
  #315513
  #502178
  
  Jeff
  