Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJK header

2007-12-25 Thread Jeffrey Stedfast
FWIW, even Thunderbird doesn't handle multi-byte characters split across
multiple encoded-word tokens. I was just checking out their
implementation in

 mozilla/netwerk/src/nsMIMEHeaderParamImpl.cpp:DecodeRFC2047Str()

Jeff

On Sun, 2007-12-23 at 23:09 +0800, jacky wrote:
 Hi, all.
 
 The rfc2047 decoder in libcamel can not decode some
 CJK header correctly. Although some of them are not
 correspond to RFC, but I need to decode it correctly
 and I thought if evolution can display there email
 correctly more people like it.
 
 So I write a new rfc2047 decoder, and it's in the
 patch. With the patch, libcamel can decode CJK header
 correctly and evolution can display CJK header
 correctly now. I had test it in my mailbox. My mailbox
 has 2000 emails which were sent by evolution,
 thunderbird, outlook, outlook express, foxmail, open
 webmail, yahoo, gmail, lotus notes, etc. Without this
 patch, almost 20% of there emails can't be decoded and
 displayed correctly, with this patch, 99% of there
 emails can be decoded and displayed correctly.
 
 And I found that the attachment with CJK name can't be
 recognised and displayed by outlook / outlook express
 / foxmail. This is because there email clients do not
 support RFC2184. Evolution always use RFC2184 encode
 mothod to encode attachment name, so the email with
 CJK named attachment can't display in outlook /
 outlook express / foxmail. In thunderbird, you can set
 the option mail.strictly_mime.parm_folding to 0 or 1
 for using RFC2047 encode mothod to encode attachment
 name. Can we add a similar option?
 
 Best regards.
 
 
   ___ 
 雅虎邮箱传递新年祝福,个性贺卡送亲朋! 
 http://cn.mail.yahoo.com/gc/index.html?entry=5souce=mail_mailletter_tagline
 ___ Evolution-hackers mailing 
 list Evolution-hackers@gnome.org 
 http://mail.gnome.org/mailman/listinfo/evolution-hackers

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJK header

2007-12-24 Thread Jeffrey Stedfast
Hi Jacky,

I've looked over your patch, but unfortunately it is unusable. The patch
is riddled with buffer overflows and incorrect logic.

What types of bugs are you actually trying to fix? What is it about CJK
messages in particular that are not getting decoded properly? Your email
was overly vague.

Your changes to e-iconv can probably be taken if I understand correctly
that GBK is a superset of gb2312 ( http://en.wikipedia.org/wiki/GBK ),
altho it would have been nice to have gotten some sort of link
explaining that with your original email (or via a ChangeLog entry) :)

Thanks,

Jeff

On Sun, 2007-12-23 at 23:09 +0800, jacky wrote:
 Hi, all.
 
 The rfc2047 decoder in libcamel can not decode some
 CJK header correctly. Although some of them are not
 correspond to RFC, but I need to decode it correctly
 and I thought if evolution can display there email
 correctly more people like it.
 
 So I write a new rfc2047 decoder, and it's in the
 patch. With the patch, libcamel can decode CJK header
 correctly and evolution can display CJK header
 correctly now. I had test it in my mailbox. My mailbox
 has 2000 emails which were sent by evolution,
 thunderbird, outlook, outlook express, foxmail, open
 webmail, yahoo, gmail, lotus notes, etc. Without this
 patch, almost 20% of there emails can't be decoded and
 displayed correctly, with this patch, 99% of there
 emails can be decoded and displayed correctly.
 
 And I found that the attachment with CJK name can't be
 recognised and displayed by outlook / outlook express
 / foxmail. This is because there email clients do not
 support RFC2184. Evolution always use RFC2184 encode
 mothod to encode attachment name, so the email with
 CJK named attachment can't display in outlook /
 outlook express / foxmail. In thunderbird, you can set
 the option mail.strictly_mime.parm_folding to 0 or 1
 for using RFC2047 encode mothod to encode attachment
 name. Can we add a similar option?
 
 Best regards.
 
 
   ___ 
 雅虎邮箱传递新年祝福,个性贺卡送亲朋! 
 http://cn.mail.yahoo.com/gc/index.html?entry=5souce=mail_mailletter_tagline
 ___ Evolution-hackers mailing 
 list Evolution-hackers@gnome.org 
 http://mail.gnome.org/mailman/listinfo/evolution-hackers

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJK header

2007-12-23 Thread Philip Van Hoof
Hey Jacky,

This is a port of your patch to Tinymail's camel-lite

On Sun, 2007-12-23 at 23:09 +0800, jacky wrote:
 Hi, all.
 
 The rfc2047 decoder in libcamel can not decode some
 CJK header correctly. Although some of them are not
 correspond to RFC, but I need to decode it correctly
 and I thought if evolution can display there email
 correctly more people like it.
 
 So I write a new rfc2047 decoder, and it's in the
 patch. With the patch, libcamel can decode CJK header
 correctly and evolution can display CJK header
 correctly now. I had test it in my mailbox. My mailbox
 has 2000 emails which were sent by evolution,
 thunderbird, outlook, outlook express, foxmail, open
 webmail, yahoo, gmail, lotus notes, etc. Without this
 patch, almost 20% of there emails can't be decoded and
 displayed correctly, with this patch, 99% of there
 emails can be decoded and displayed correctly.
 
 And I found that the attachment with CJK name can't be
 recognised and displayed by outlook / outlook express
 / foxmail. This is because there email clients do not
 support RFC2184. Evolution always use RFC2184 encode
 mothod to encode attachment name, so the email with
 CJK named attachment can't display in outlook /
 outlook express / foxmail. In thunderbird, you can set
 the option mail.strictly_mime.parm_folding to 0 or 1
 for using RFC2047 encode mothod to encode attachment
 name. Can we add a similar option?
 
 Best regards.
 
 
   ___ 
 雅虎邮箱传递新年祝福,个性贺卡送亲朋! 
 http://cn.mail.yahoo.com/gc/index.html?entry=5souce=mail_mailletter_tagline
 ___ Evolution-hackers mailing 
 list Evolution-hackers@gnome.org 
 http://mail.gnome.org/mailman/listinfo/evolution-hackers
-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be



Index: libtinymail-camel/camel-lite/camel/camel-mime-utils.c
===
--- libtinymail-camel/camel-lite/camel/camel-mime-utils.c	(revision 3190)
+++ libtinymail-camel/camel-lite/camel/camel-mime-utils.c	(working copy)
@@ -821,125 +821,207 @@
 	*in = inptr;
 }
 
+static void
+print_hex (unsigned char *data, size_t len)
+{
+	size_t i, x;
+	unsigned char *p = data;
+	char high, low;
+
+	x = 0;
+	printf (%04u, x);
+	for (i = 0; i  len; i++) {
+		high = *p  4;
+		high = (high10) ? high + '0' : high + 'a' - 10;
+
+		low = *p  0x0f;
+		low = (low10) ? low + '0' : low + 'a' - 10;
+
+		printf (0x%c%c  , high, low);
+
+		p++;
+		x++;
+		if (i % 8 == 7) {
+			printf (\n%04u, x);
+		}
+	}
+	printf (\n);
+}
+
+static size_t
+conv_to_utf8 (const char *encname, char *in, size_t inlen, char *out, size_t outlen)
+{
+	char *charset, *inbuf, *outbuf;
+	iconv_t ic;
+	size_t inbuf_len, outbuf_len, ret;
+
+	charset = (char *) e_iconv_charset_name (encname);
+
+	ic = e_iconv_open (UTF-8, charset);
+	if (ic == (iconv_t) -1) {
+		printf (e_iconv_open() error\n);
+		return (size_t)-1;
+	}
+
+	inbuf = in;
+	inbuf_len = inlen;
+
+	outbuf = out;
+	outbuf_len = outlen;
+
+	ret = e_iconv (ic, (const char **) inbuf, inbuf_len, outbuf, outbuf_len);
+	if (ret == (size_t)-1) {
+		printf (e_iconv() error! source charset is %s, target charset is %s\n, charset, UTF-8);
+		printf (converted %u bytes, but last %u bytes can't convert!!\n, inlen - inbuf_len, inbuf_len);
+		printf (source data:\n);
+		print_hex (in, inlen);
+
+		*outbuf = '\0';
+		printf (target string is \%s\\n, out);
+
+		return (size_t)-1;
+	}
+
+	ret = outlen - outbuf_len;
+	out[ret] = '\0';
+
+	e_iconv_close (ic);
+
+	return ret;
+}
+
 /* decode rfc 2047 encoded string segment */
+#define DECWORD_LEN 1024
+#define UTF8_DECWORD_LEN 2048
+
 static char *
 rfc2047_decode_word(const char *in, size_t len)
 {
-	const char *inptr = in+2;
-	const char *inend = in+len-2;
-	const char *inbuf;
-	const char *charset;
-	char *encname, *p;
-	int tmplen;
-	size_t ret;
-	char *decword = NULL;
-	char *decoded = NULL;
-	char *outbase = NULL;
-	char *outbuf;
-	size_t inlen, outlen;
-	gboolean retried = FALSE;
-	iconv_t ic;
-	int idx = 0;
+	char prev_charset[32], curr_charset[32];
+	char encode;
+	char *start, *inptr, *inend;
+	char decword[DECWORD_LEN], utf8_decword[UTF8_DECWORD_LEN];
+	char *decword_ptr, *utf8_decword_ptr;
+	size_t inlen, outlen, ret;
 
 	d(printf(rfc2047: decoding '%.*s'\n, len, in));
 
+	prev_charset[0] = curr_charset[0] = '\0';
+
+	decword_ptr = decword;
+	utf8_decword_ptr = utf8_decword;
+
 	/* quick check to see if this could possibly be a real encoded word */
-
-	if (len  8 || !(in[0] == '='  in[1] == '?')) {
+	if (len  8
+	|| !(in[0] == '='  in[1] == '?'
+		  in[len-1] == '='  in[len-2] == '?')) {
 		d(printf(invalid\n));
 		return NULL;
 	}
 
-	/* skip past the charset to the encoding type */
-	inptr = memchr (inptr, '?', inend-inptr);
-	if (inptr != NULL  inptr  inend + 2  inptr[2] == '?') {
-		

Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJK header

2007-12-23 Thread jacky

--- Philip Van Hoof [EMAIL PROTECTED]wrote:

 Hey Jacky,
 
 This is a port of your patch to Tinymail's
 camel-lite
 

Thank you.


 On Sun, 2007-12-23 at 23:09 +0800, jacky wrote:
  Hi, all.
  
  The rfc2047 decoder in libcamel can not decode
 some
  CJK header correctly. Although some of them are
 not
  correspond to RFC, but I need to decode it
 correctly
  and I thought if evolution can display there email
  correctly more people like it.
  
  So I write a new rfc2047 decoder, and it's in the
  patch. With the patch, libcamel can decode CJK
 header
  correctly and evolution can display CJK header
  correctly now. I had test it in my mailbox. My
 mailbox
  has 2000 emails which were sent by evolution,
  thunderbird, outlook, outlook express, foxmail,
 open
  webmail, yahoo, gmail, lotus notes, etc. Without
 this
  patch, almost 20% of there emails can't be decoded
 and
  displayed correctly, with this patch, 99% of there
  emails can be decoded and displayed correctly.
  
  And I found that the attachment with CJK name
 can't be
  recognised and displayed by outlook / outlook
 express
  / foxmail. This is because there email clients do
 not
  support RFC2184. Evolution always use RFC2184
 encode
  mothod to encode attachment name, so the email
 with
  CJK named attachment can't display in outlook /
  outlook express / foxmail. In thunderbird, you can
 set
  the option mail.strictly_mime.parm_folding to 0
 or 1
  for using RFC2047 encode mothod to encode
 attachment
  name. Can we add a similar option?
  
  Best regards.
  
  
   

___
 
  雅虎邮箱传递新年祝福,个性贺卡送亲朋! 
 

http://cn.mail.yahoo.com/gc/index.html?entry=5souce=mail_mailletter_tagline
  ___
 Evolution-hackers mailing list
 Evolution-hackers@gnome.org

http://mail.gnome.org/mailman/listinfo/evolution-hackers
 -- 
 Philip Van Hoof, freelance software developer
 home: me at pvanhoof dot be 
 gnome: pvanhoof at gnome dot org 
 http://pvanhoof.be/blog
 http://codeminded.be
 
 
 
  Index:

libtinymail-camel/camel-lite/camel/camel-mime-utils.c

===
 ---

libtinymail-camel/camel-lite/camel/camel-mime-utils.c
 (revision 3190)
 +++

libtinymail-camel/camel-lite/camel/camel-mime-utils.c
 (working copy)
 @@ -821,125 +821,207 @@
   *in = inptr;
  }
  
 +static void
 +print_hex (unsigned char *data, size_t len)
 +{
 + size_t i, x;
 + unsigned char *p = data;
 + char high, low;
 +
 + x = 0;
 + printf (%04u, x);
 + for (i = 0; i  len; i++) {
 + high = *p  4;
 + high = (high10) ? high + '0' : high + 'a' - 10;
 +
 + low = *p  0x0f;
 + low = (low10) ? low + '0' : low + 'a' - 10;
 +
 + printf (0x%c%c  , high, low);
 +
 + p++;
 + x++;
 + if (i % 8 == 7) {
 + printf (\n%04u, x);
 + }
 + }
 + printf (\n);
 +}
 +
 +static size_t
 +conv_to_utf8 (const char *encname, char *in, size_t
 inlen, char *out, size_t outlen)
 +{
 + char *charset, *inbuf, *outbuf;
 + iconv_t ic;
 + size_t inbuf_len, outbuf_len, ret;
 +
 + charset = (char *) e_iconv_charset_name (encname);
 +
 + ic = e_iconv_open (UTF-8, charset);
 + if (ic == (iconv_t) -1) {
 + printf (e_iconv_open() error\n);
 + return (size_t)-1;
 + }
 +
 + inbuf = in;
 + inbuf_len = inlen;
 +
 + outbuf = out;
 + outbuf_len = outlen;
 +
 + ret = e_iconv (ic, (const char **) inbuf,
 inbuf_len, outbuf, outbuf_len);
 + if (ret == (size_t)-1) {
 + printf (e_iconv() error! source charset is %s,
 target charset is %s\n, charset, UTF-8);
 + printf (converted %u bytes, but last %u bytes
 can't convert!!\n, inlen - inbuf_len, inbuf_len);
 + printf (source data:\n);
 + print_hex (in, inlen);
 +
 + *outbuf = '\0';
 + printf (target string is \%s\\n, out);
 +
 + return (size_t)-1;
 + }
 +
 + ret = outlen - outbuf_len;
 + out[ret] = '\0';
 +
 + e_iconv_close (ic);
 +
 + return ret;
 +}
 +
  /* decode rfc 2047 encoded string segment */
 +#define DECWORD_LEN 1024
 +#define UTF8_DECWORD_LEN 2048
 +
  static char *
  rfc2047_decode_word(const char *in, size_t len)
  {
 - const char *inptr = in+2;
 - const char *inend = in+len-2;
 - const char *inbuf;
 - const char *charset;
 - char *encname, *p;
 - int tmplen;
 - size_t ret;
 - char *decword = NULL;
 - char *decoded = NULL;
 - char *outbase = NULL;
 - char *outbuf;
 - size_t inlen, outlen;
 - gboolean retried = FALSE;
 - iconv_t ic;
 - int idx = 0;
 + char prev_charset[32], curr_charset[32];
 + char encode;
 + char *start, *inptr, *inend;
 + char decword[DECWORD_LEN],
 utf8_decword[UTF8_DECWORD_LEN];
 + char