Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
On Tue, 2007-12-25 at 15:56 +0800, jacky wrote:
> But the problem described below has not been solved.
> > 1) An encoded-word was divided into two lines. This was
> > sent by dotProject v2.0.1.
>
> As far as I have seen, this kind of email uses quoted encoding only, and
> header_decode_text() can get all encoded-words which are separated by
> SPACE, so a simple solution is to replace SPACE with '_'. In fact
> OpenWebmail does it like this.
> But the problem is that I must change the prototype of
> header_decode_text() to
> char *header_decode_text (char *in, size_t inlen, int ctext, const char *default_charset)
> Originally, it is
> char *header_decode_text (const char *in, size_t inlen, int ctext, const char *default_charset)
> Functions which call header_decode_text() must be changed too.
> Does anyone have a better proposal?

It would be a bad idea to have header_decode_text() modify the original
string.

I highly suggest starting with GMime's g_mime_utils_header_decode_text()
implementation because it handles all the problems you've described.

Jeff
___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers
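One way to keep the const prototype while still applying the SPACE to '_' substitution is to work on a private copy of the encoded-word. The following is only a minimal sketch under that assumption; copy_space_to_underscore() is a hypothetical helper name, not a function in libcamel or GMime, and GMime's decoder may well handle this case differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libcamel): return a heap-allocated,
 * NUL-terminated copy of in[0..inlen) in which every SPACE is replaced
 * by '_'. The caller's buffer is never written to, so `in` can stay
 * const. Returns NULL on allocation failure; the caller frees the copy. */
static char *
copy_space_to_underscore (const char *in, size_t inlen)
{
	char *out;
	size_t i;

	assert (in != NULL);

	out = malloc (inlen + 1);
	if (out == NULL)
		return NULL;

	memcpy (out, in, inlen);
	out[inlen] = '\0';

	/* In the Q encoding, '_' stands for a space, so the substitution
	 * keeps the decoded text the same. */
	for (i = 0; i < inlen; i++) {
		if (out[i] == ' ')
			out[i] = '_';
	}

	return out;
}
```

The decoder would then run on the copy and free it afterwards, so none of the callers of header_decode_text() would need to change.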
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
--- jacky <[EMAIL PROTECTED]> wrote:
> --- Peter Volkov <[EMAIL PROTECTED]> wrote:
> > On Mon, 24/12/2007 at 13:21 +0800, jacky wrote:
> > > --- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> > > There are two kinds of email that need to be supported:
> > > 1) An encoded-word was divided into two lines. This was
> > > sent by dotProject v2.0.1.
> >
> > And there are even more users affected by this. I've already reported
> > a similar problem in bug 315513. Thus this affects not only CJK people:
> >
> > http://bugzilla.gnome.org/show_bug.cgi?id=315513
>
> In fact, the parser and decoder in my patch support these encoded-words.
> I already mentioned this in my email:
> > 2) A CJK character's encoded string must be in one encoded-word, but
> > some email clients divide it into two encoded-words.
>
> But the problem described below has not been solved.
> > 1) An encoded-word was divided into two lines. This was
> > sent by dotProject v2.0.1.
>
> As far as I have seen, this kind of email uses quoted encoding only, and
> header_decode_text() can get all encoded-words which are separated by
> SPACE, so a simple solution is to

CORRECTION: header_decode_text() can get a whole encoded-word which is
separated by SPACE. Sorry for my poor English. An example:

=?utf-8?Q?=E7=B3=BB=E7=BB=9F=E7=A0=94=E7=A9=B6 =E5=B7=B2=E6=9B=B4=E6=96=B0?=

> replace SPACE with '_'. In fact OpenWebmail does it like this.
> But the problem is that I must change the prototype of
> header_decode_text() to
> char *header_decode_text (char *in, size_t inlen, int ctext, const char *default_charset)
> Originally, it is
> char *header_decode_text (const char *in, size_t inlen, int ctext, const char *default_charset)
> Functions which call header_decode_text() must be changed too.
> Does anyone have a better proposal?
>
> > --
> > Peter.
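To illustrate the example above, here is a rough sketch of how a tokenizer could measure an encoded-word by looking for its closing "?=" instead of stopping at the first SPACE. It is an assumption about one possible approach, not how header_decode_text() or GMime actually tokenize; encoded_word_span() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if in[0..inlen) starts with an encoded-word of the
 * form =?charset?X?payload?= , return its length in bytes, allowing
 * whitespace inside the payload. Returns 0 if no complete encoded-word
 * starts at `in`. */
static size_t
encoded_word_span (const char *in, size_t inlen)
{
	const char *end, *p, *q;

	assert (in != NULL);

	/* "=?c?Q??=" is the shortest possible encoded-word (8 bytes) */
	if (inlen < 8 || in[0] != '=' || in[1] != '?')
		return 0;

	end = in + inlen;

	/* the '?' that terminates the charset name */
	p = memchr (in + 2, '?', inlen - 2);
	if (p == NULL || p == in + 2)
		return 0;

	/* exactly one encoding character, then another '?' */
	if (end - p < 3 || p[2] != '?')
		return 0;
	p += 3;

	/* The payload may not contain a raw '?', so the first '?' found from
	 * here on has to be the start of the closing "?=". Searching from the
	 * payload start avoids mistaking "?=E7" after the encoding character
	 * for the terminator. */
	q = memchr (p, '?', (size_t) (end - p));
	if (q == NULL || q + 1 >= end || q[1] != '=')
		return 0;

	return (size_t) ((q + 2) - in);
}
```

With a span like this, the whole word including its embedded SPACE could be handed to the Q decoder in one piece.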
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
--- Peter Volkov <[EMAIL PROTECTED]> wrote:
> On Mon, 24/12/2007 at 13:21 +0800, jacky wrote:
> > --- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> > There are two kinds of email that need to be supported:
> > 1) An encoded-word was divided into two lines. This was
> > sent by dotProject v2.0.1.
>
> And there are even more users affected by this. I've already reported
> a similar problem in bug 315513. Thus this affects not only CJK people:
>
> http://bugzilla.gnome.org/show_bug.cgi?id=315513

In fact, the parser and decoder in my patch support these encoded-words.
I already mentioned this in my email:

> 2) A CJK character's encoded string must be in one encoded-word, but
> some email clients divide it into two encoded-words.

But the problem described below has not been solved.

> 1) An encoded-word was divided into two lines. This was
> sent by dotProject v2.0.1.

As far as I have seen, this kind of email uses quoted encoding only, and
header_decode_text() can get all encoded-words which are separated by
SPACE, so a simple solution is to replace SPACE with '_'. In fact
OpenWebmail does it like this.

But the problem is that I must change the prototype of
header_decode_text() to
char *header_decode_text (char *in, size_t inlen, int ctext, const char *default_charset)
Originally, it is
char *header_decode_text (const char *in, size_t inlen, int ctext, const char *default_charset)
Functions which call header_decode_text() must be changed too.
Does anyone have a better proposal?

> --
> Peter.
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
On Mon, 2007-12-24 at 13:21 +0800, jacky wrote:
> Yes, I use a fixed-length string to store some values, so it may
> overflow. I wrote another version that uses the heap instead of the
> stack. I think the stack version is simple and sufficient, so I
> sent only that one. Two versions of rfc2047_decode_word() are in the
> attachment. Can you explain the incorrect logic in my patch?

Just doing decword = malloc (SIZE) instead of decword[SIZE] (heap vs.
stack) only moves the location where you overflow (stack or heap); it
doesn't change the fact that you overflow.

This one still overflows if strlen(prev_charset) > 1024

ret = conv_to_utf8 (prev_charset, decword, inlen, utf8_decword_ptr, outlen);

This one still overflows if (inptr-start) > 1024

quoted_decode(start, inptr-start, decword_ptr);

I don't know if there are encodings with more than two bytes per
character, but if so this one overflows too (1024 vs. 2048).

ret = conv_to_utf8 (curr_charset, decword, inlen, utf8_decword_ptr, outlen);

In your conv_to_utf8 you do printf() and print_hex(); the end user is
really not that interested in what happens at the console. Perhaps use
g_warning or g_critical if it's a really severe problem in the code, but
a problem in the user's data is never a problem of the application.

In my opinion, problems with the data the application shows (the model)
must be reported to the user in the UI layer, not on the console.

On Mon, 2007-12-24 at 10:07 +0300, Peter Volkov wrote:
> And there are even more users affected by this. I've already reported
> a similar problem in bug 315513. Thus this affects not only CJK people:
>
> http://bugzilla.gnome.org/show_bug.cgi?id=315513

Okay, but this doesn't mean code with buffer overflows must be
immediately accepted :), I guess we don't want ILY E-mails for the Linux
desktop soon, right?

--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
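The point about the overflow moving rather than disappearing can be made concrete with a decoder that is told how much room the output buffer has. This is a minimal sketch only; q_decode_bounded() and hex_value() are hypothetical names and this is not the quoted_decode() from the patch, whose real signature takes no output size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int
hex_value (unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (unsigned char) toupper (c);
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical bounded Q decoder: decodes in[0..inlen) into out, writing
 * at most outsize bytes. Returns the number of bytes written, or
 * (size_t) -1 if the output would not fit or the input is malformed.
 * No NUL terminator is written. */
static size_t
q_decode_bounded (const char *in, size_t inlen,
                  unsigned char *out, size_t outsize)
{
	size_t i = 0, n = 0;

	assert (in != NULL && out != NULL);

	while (i < inlen) {
		unsigned char c = (unsigned char) in[i];

		/* refuse to write past the end, whether out is on the stack
		 * or on the heap */
		if (n >= outsize)
			return (size_t) -1;

		if (c == '_') {
			out[n++] = ' ';
			i++;
		} else if (c == '=') {
			int hi, lo;

			if (inlen - i < 3)
				return (size_t) -1;
			hi = hex_value ((unsigned char) in[i + 1]);
			lo = hex_value ((unsigned char) in[i + 2]);
			if (hi < 0 || lo < 0)
				return (size_t) -1;
			out[n++] = (unsigned char) ((hi << 4) | lo);
			i += 3;
		} else {
			out[n++] = c;
			i++;
		}
	}

	return n;
}
```

A caller that gets (size_t) -1 back can grow its buffer or reject the header, instead of silently corrupting memory.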
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
On Mon, 24/12/2007 at 13:21 +0800, jacky wrote:
> --- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> There are two kinds of email that need to be supported:
> 1) An encoded-word was divided into two lines. This was
> sent by dotProject v2.0.1.

And there are even more users affected by this. I've already reported a
similar problem in bug 315513. Thus this affects not only CJK people:

http://bugzilla.gnome.org/show_bug.cgi?id=315513

--
Peter.
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
--- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> Hi Jacky,
>
> I've looked over your patch, but unfortunately it is unusable. The patch
> is riddled with buffer overflows and incorrect logic.

Yes, I use a fixed-length string to store some values, so it may
overflow. I wrote another version that uses the heap instead of the
stack. I think the stack version is simple and sufficient, so I sent only
that one. Two versions of rfc2047_decode_word() are in the attachment.
Can you explain the incorrect logic in my patch?

> What types of bugs are you actually trying to fix? What is it about CJK
> messages in particular that are not getting decoded properly? Your email
> was overly vague.

Maybe I used the wrong word. I think I just enhance the CJK header
support. The patch enhances three points:
1) As you know, encoded-words must be separated by CRLF SPACE, but some
email clients do not do that.
2) A CJK character's encoded string must be in one encoded-word, but some
email clients divide it into two encoded-words.
3) Some CJK characters need to be encoded in the GBK charset, but the
charset name in the encoded-word is GB2312.

There are two kinds of email that still need to be supported:
1) An encoded-word was divided into two lines. This was sent by
dotProject v2.0.1.
2) GB2312 is used to encode CJK characters directly.
Some of these are supported by evolution, but some of them are not.

> Your changes to e-iconv can probably be taken if I understand correctly
> that GBK is a superset of gb2312 ( http://en.wikipedia.org/wiki/GBK ),
> altho it would have been nice to have gotten some sort of link
> explaining that with your original email (or via a ChangeLog entry) :)
>
> Thanks,
>
> Jeff
>
> >>> jacky <[EMAIL PROTECTED]> 12/23/07 10:09 AM >>>
> Hi, all.
>
> The rfc2047 decoder in libcamel cannot decode some CJK headers
> correctly. Although some of them do not conform to the RFC, I need to
> decode them correctly, and I thought that if evolution can display
> these emails correctly, more people will like it.
>
> So I wrote a new rfc2047 decoder, and it's in the patch. With the patch,
> libcamel can decode CJK headers correctly and evolution can display CJK
> headers correctly now. I have tested it on my mailbox. My mailbox has
> 2000 emails which were sent by evolution, thunderbird, outlook, outlook
> express, foxmail, open webmail, yahoo, gmail, lotus notes, etc. Without
> this patch, almost 20% of these emails can't be decoded and displayed
> correctly; with this patch, 99% of these emails can be decoded and
> displayed correctly.
>
> And I found that attachments with CJK names can't be recognised and
> displayed by outlook / outlook express / foxmail. This is because these
> email clients do not support RFC2184. Evolution always uses the RFC2184
> encoding method to encode attachment names, so an email with a CJK-named
> attachment can't be displayed in outlook / outlook express / foxmail. In
> thunderbird, you can set the option "mail.strictly_mime.parm_folding" to
> 0 or 1 to use the RFC2047 encoding method to encode attachment names.
> Can we add a similar option?
>
> Best regards.

/* decode rfc 2047 encoded string segment */
#define DECWORD_LEN 1024
#define UTF8_DECWORD_LEN 2048

#if 1 //USE_STACK
static char *
rfc2047_decode_word(const char *in, size_t len)
{
	char prev_charset[32], curr_charset[32];
	char encode;
	char *start, *inptr, *inend;
	char decword[DECWORD_LEN], utf8_decword[UTF8_DECWORD_LEN];
	char *decword_ptr, *utf8_decword_ptr;
	size_t inlen, outlen, ret;

	prev_charset[0] = curr_charset[0] = '\0';
	decword_ptr = decword;
	utf8_decword_ptr = utf8_decword;

	/* quick check to see if this could possibly be a real encoded word */
	if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) {
		return NULL;
	}

	inptr = in;
	inend = in + len;
	outlen = sizeof(utf8_decword);

	while (inptr < inend) {
		/* begin */
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || *(inptr-1) != '=') {
			return NULL;
		}
		inptr++;

		/* charset */
		start = inptr;
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr) {
			return NULL;
		}
		strncpy (curr_charset, start, inptr-start); /* maybe overflow */
		curr_charset[inptr-start] = '\0';
		if (prev_charset[0] == '\0') {
			/* first charset in multi encode words */
			strcpy (prev_charset, curr_charset);
		}
		d(printf ("curr_charset = %s\n", curr_charset));

		/* if (charset.prev != charset.curr) iconv prev to utf8 */
		if (prev_charset[0] != '\0' && strcmp(prev_charset, curr_charset)) {
			inlen = decword_ptr - decword;
			ret = conv_to_utf8 (prev_charset, decword, inlen, utf8_decword_ptr, outlen);
			if (ret == (size_t)-1) {
				printf ("conv_to_utf8() error!\n");
				return NULL;
			}
			utf8_decword_ptr += ret;
			outlen = outlen - ret;
			decword_ptr = decword; /* reset decword_ptr */
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
On Sun, 2007-12-23 at 14:51 -0700, Jeff Stedfast wrote:
> What types of bugs are you actually trying to fix? What is it about CJK
> messages in particular that are not getting decoded properly? Your email
> was overly vague.

Looks like he wants to support both 'B' and 'b' and 'Q' and 'q' instead
of just 'B' and 'Q' for the first characters for Base64 or Quoted
strings, for example.

The rfc2047_decode_word implementation indeed has a few serious potential
buffer overflows. For example, the charset buffer will be overflowed if
the charset name is larger than 32 bytes.

> >>> jacky <[EMAIL PROTECTED]> 12/23/07 10:09 AM >>>
> Hi, all.
>
> The rfc2047 decoder in libcamel cannot decode some CJK headers
> correctly. Although some of them do not conform to the RFC, I need to
> decode them correctly, and I thought that if evolution can display
> these emails correctly, more people will like it.
>
> So I wrote a new rfc2047 decoder, and it's in the patch. With the patch,
> libcamel can decode CJK headers correctly and evolution can display CJK
> headers correctly now. I have tested it on my mailbox. My mailbox has
> 2000 emails which were sent by evolution, thunderbird, outlook, outlook
> express, foxmail, open webmail, yahoo, gmail, lotus notes, etc. Without
> this patch, almost 20% of these emails can't be decoded and displayed
> correctly; with this patch, 99% of these emails can be decoded and
> displayed correctly.
>
> And I found that attachments with CJK names can't be recognised and
> displayed by outlook / outlook express / foxmail. This is because these
> email clients do not support RFC2184. Evolution always uses the RFC2184
> encoding method to encode attachment names, so an email with a CJK-named
> attachment can't be displayed in outlook / outlook express / foxmail. In
> thunderbird, you can set the option "mail.strictly_mime.parm_folding" to
> 0 or 1 to use the RFC2047 encoding method to encode attachment names.
> Can we add a similar option?
>
> Best regards.

--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
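Both remarks above (accepting 'b'/'q' as well as 'B'/'Q', and the 32-byte charset buffer) can be sketched in a few lines. The helper names copy_charset() and normalize_encoding() are hypothetical and are not taken from the patch or from libcamel; this is one possible shape under those assumptions, not the fix that was eventually applied.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the len-byte charset name starting at `start`
 * into dst and NUL-terminate it. Returns 0 on success, or -1 if the name
 * plus its terminator does not fit in dstsize bytes (nothing is written
 * in that case). */
static int
copy_charset (char *dst, size_t dstsize, const char *start, size_t len)
{
	assert (dst != NULL && start != NULL);

	if (len >= dstsize)
		return -1;

	memcpy (dst, start, len);
	dst[len] = '\0';
	return 0;
}

/* Hypothetical helper: map the encoding character of an encoded-word to
 * 'B' or 'Q' regardless of case. Returns 0 for anything else. */
static char
normalize_encoding (char c)
{
	c = (char) toupper ((unsigned char) c);
	return (c == 'B' || c == 'Q') ? c : 0;
}
```

An over-long charset name then makes the decoder give up on that word (for example by returning NULL, as the patch already does for other parse errors) rather than writing past curr_charset.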
Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
Hi Jacky,

I've looked over your patch, but unfortunately it is unusable. The patch
is riddled with buffer overflows and incorrect logic.

What types of bugs are you actually trying to fix? What is it about CJK
messages in particular that are not getting decoded properly? Your email
was overly vague.

Your changes to e-iconv can probably be taken if I understand correctly
that GBK is a superset of gb2312 ( http://en.wikipedia.org/wiki/GBK ),
although it would have been nice to have gotten some sort of link
explaining that with your original email (or via a ChangeLog entry) :)

Thanks,

Jeff

>>> jacky <[EMAIL PROTECTED]> 12/23/07 10:09 AM >>>
Hi, all.

The rfc2047 decoder in libcamel cannot decode some CJK headers correctly.
Although some of them do not conform to the RFC, I need to decode them
correctly, and I thought that if evolution can display these emails
correctly, more people will like it.

So I wrote a new rfc2047 decoder, and it's in the patch. With the patch,
libcamel can decode CJK headers correctly and evolution can display CJK
headers correctly now. I have tested it on my mailbox. My mailbox has
2000 emails which were sent by evolution, thunderbird, outlook, outlook
express, foxmail, open webmail, yahoo, gmail, lotus notes, etc. Without
this patch, almost 20% of these emails can't be decoded and displayed
correctly; with this patch, 99% of these emails can be decoded and
displayed correctly.

And I found that attachments with CJK names can't be recognised and
displayed by outlook / outlook express / foxmail. This is because these
email clients do not support RFC2184. Evolution always uses the RFC2184
encoding method to encode attachment names, so an email with a CJK-named
attachment can't be displayed in outlook / outlook express / foxmail. In
thunderbird, you can set the option "mail.strictly_mime.parm_folding" to
0 or 1 to use the RFC2047 encoding method to encode attachment names. Can
we add a similar option?

Best regards.
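The GBK remark amounts to a charset alias applied before conversion: if a header declares gb2312, decode it as GBK, which is harmless if GBK really is a superset. The sketch below only illustrates that idea; charset_for_decoding() and ascii_equal_nocase() are hypothetical names, and this is not the actual e-iconv change from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive string equality; returns 1 if equal, else 0. */
static int
ascii_equal_nocase (const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0') {
		if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	/* equal only if both strings ended at the same time */
	return *a == *b;
}

/* Hypothetical helper: given the charset name declared in an
 * encoded-word, return the name to hand to the converter. "gb2312" (in
 * any letter case) is widened to "GBK"; every other name is returned
 * unchanged. */
static const char *
charset_for_decoding (const char *declared)
{
	assert (declared != NULL);

	if (ascii_equal_nocase (declared, "gb2312"))
		return "GBK";

	return declared;
}
```

The returned name would then be passed to whatever converter the decoder already uses, so mail that is labelled GB2312 but contains GBK-only characters still converts.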