Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-25 Thread Jeffrey Stedfast

On Tue, 2007-12-25 at 15:56 +0800, jacky wrote:
> But the problem described below has not been solved.
> > 1) An encoded-word was divided into two lines. This
> > was sent by dotProject v2.0.1.
> 
> As far as I have seen, this kind of email uses the "Q" encoding only,
> and header_decode_text() can get all encoded-words
> which are separated by SPACE, so a simple solution is to
> replace SPACE with '_'. In fact, OpenWebmail does it
> this way.
> But the problem is that I must change the prototype of
> header_decode_text() to
> char *header_decode_text (char *in, size_t inlen, int
> ctext, const char *default_charset)
> Originally, it is
> char *header_decode_text (const char *in, size_t
> inlen, int ctext, const char *default_charset)
> Functions which call header_decode_text() must be
> changed too.
> Does anyone have a better proposal?

It would be a bad idea to have header_decode_text() modify the original
string.

I highly suggest starting with GMime's g_mime_utils_header_decode_text()
implementation because it handles all the problems you've described.

Jeff
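
To illustrate the point above about leaving the caller's string alone: a minimal sketch, assuming the span of a folded encoded-word has already been located, that works on a private copy so the `const char *in` prototype can stay as it is. copy_without_folding() is a hypothetical helper, not a Camel or GMime function. It drops the folding whitespace instead of replacing it with '_', because '_' decodes to a space in the "Q" encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Camel or GMime): returns a newly
 * allocated, NUL-terminated copy of in[0..inlen) with folding
 * whitespace (SPACE, TAB, CR, LF) removed. The caller's buffer is
 * never written to. Release the result with free(). */
char *
copy_without_folding (const char *in, size_t inlen)
{
	char *out, *outptr;
	size_t i;

	assert (in != NULL);

	out = malloc (inlen + 1); /* the copy can only get shorter */
	if (out == NULL)
		return NULL;

	outptr = out;
	for (i = 0; i < inlen; i++) {
		char c = in[i];

		if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
			continue; /* skip the fold, keep everything else */
		*outptr++ = c;
	}
	*outptr = '\0';

	return out;
}
```

The copy costs one allocation per folded word, but no caller of header_decode_text() has to change.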


___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-25 Thread jacky

--- jacky <[EMAIL PROTECTED]> wrote:

> 
> --- Peter Volkov <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Mon, 24/12/2007 at 13:21 +0800, jacky wrote:
> > > --- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> > > There are two kinds of email that need to be supported:
> > > 1) An encoded-word was divided into two lines.
> > > This was sent by dotProject v2.0.1.
> > 
> > And there are even more users affected by this.
> > I've already reported a
> > similar problem in bug 315513. Thus this affects
> > not only CJK people:
> > 
> > http://bugzilla.gnome.org/show_bug.cgi?id=315513
> > 
> 
> In fact, the parser and decoder in my patch support
> these encoded-words. I already mentioned this in my email:
> > 2) The encoded string of one CJK character must be in a single
> > encoded-word, but some email clients divide it into
> > two encoded-words.
> 
> But the problem described below has not been solved.
> > 1) An encoded-word was divided into two lines. This
> > was sent by dotProject v2.0.1.
> 
> As far as I have seen, this kind of email uses the "Q" encoding only,
> and header_decode_text() can get all encoded-words
> which are separated by SPACE, so a simple solution is to

CORRECTION:
header_decode_text() can get a whole encoded-word
which is separated by SPACE.
Sorry for my poor English.
An example:
=?utf-8?Q?=E7=B3=BB=E7=BB=9F=E7=A0=94=E7=A9=B6
=E5=B7=B2=E6=9B=B4=E6=96=B0?=

> replace SPACE with '_'. In fact, OpenWebmail does it
> this way.
> But the problem is that I must change the prototype of
> header_decode_text() to
> char *header_decode_text (char *in, size_t inlen,
> int ctext, const char *default_charset)
> Originally, it is
> char *header_decode_text (const char *in, size_t
> inlen, int ctext, const char *default_charset)
> Functions which call header_decode_text() must be
> changed too.
> Does anyone have a better proposal?
> 
> > -- 
> > Peter.
> > 
> 




Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-24 Thread jacky

--- Peter Volkov <[EMAIL PROTECTED]> wrote:

> 
> On Mon, 24/12/2007 at 13:21 +0800, jacky wrote:
> > --- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> > There are two kinds of email that need to be supported:
> > 1) An encoded-word was divided into two lines. This
> > was sent by dotProject v2.0.1.
> 
> And there are even more users affected by this. I've
> already reported a
> similar problem in bug 315513. Thus this affects not
> only CJK people:
> 
> http://bugzilla.gnome.org/show_bug.cgi?id=315513
> 

In fact, the parser and decoder in my patch support
these encoded-words. I already mentioned this in my email:
> 2) The encoded string of one CJK character must be in a single
> encoded-word, but some email clients divide it into
> two encoded-words.

But the problem described below has not been solved.
> 1) An encoded-word was divided into two lines. This
> was sent by dotProject v2.0.1.

As far as I have seen, this kind of email uses the "Q" encoding only,
and header_decode_text() can get all encoded-words
which are separated by SPACE, so a simple solution is to
replace SPACE with '_'. In fact, OpenWebmail does it
this way.
But the problem is that I must change the prototype of
header_decode_text() to
char *header_decode_text (char *in, size_t inlen, int
ctext, const char *default_charset)
Originally, it is
char *header_decode_text (const char *in, size_t
inlen, int ctext, const char *default_charset)
Functions which call header_decode_text() must be
changed too.
Does anyone have a better proposal?

> -- 
> Peter.
> 





Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-24 Thread Philip Van Hoof

On Mon, 2007-12-24 at 13:21 +0800, jacky wrote:

> Yes, I use a fixed-length string to store some values, so it may
> overflow. I wrote another version using the heap instead of the
> stack. I think the stack version is simple and sufficient, so I
> sent only that one. The two versions of rfc2047_decode_word() are in
> the attachment. Can you explain the incorrect logic in my patch?


Just doing decword = malloc (SIZE) instead of decword[SIZE] (heap vs.
stack) just moves the location where you overflow (stack or heap); it
doesn't change the fact that you overflow.

This one still overflows if strlen(prev_charset) > 1024

ret = conv_to_utf8 (prev_charset, decword, inlen, utf8_decword_ptr, outlen);


This one still overflows if (inptr-start) > 1024

quoted_decode(start, inptr-start, decword_ptr);

I don't know if there are encodings with more than two bytes per
character, but if so this one overflows too (1024 vs. 2048).

ret = conv_to_utf8 (curr_charset, decword, inlen, utf8_decword_ptr, outlen);
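
One way to avoid this class of overflow, whether the buffer lives on the stack or on the heap, is to pass the output capacity to the decoder and check it before every write. The following is only a sketch under assumptions: quoted_decode_bounded() and hex_value() are hypothetical names for illustration, not the quoted_decode() used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int
hex_value (char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Hypothetical bounded "Q" decoder: decodes in[0..inlen) into out,
 * writing at most outsize bytes. Returns the number of bytes written,
 * or (size_t) -1 if the output does not fit or an '=' escape is
 * malformed. The output is not NUL-terminated. */
size_t
quoted_decode_bounded (const char *in, size_t inlen, char *out, size_t outsize)
{
	size_t i = 0, n = 0;

	assert (in != NULL && out != NULL);

	while (i < inlen) {
		char c = in[i];

		if (c == '=') {
			int hi, lo;

			if (inlen - i < 3)
				return (size_t) -1; /* truncated escape */
			hi = hex_value (in[i + 1]);
			lo = hex_value (in[i + 2]);
			if (hi < 0 || lo < 0)
				return (size_t) -1; /* not two hex digits */
			c = (char) (unsigned char) ((hi << 4) | lo);
			i += 3;
		} else {
			if (c == '_')
				c = ' '; /* '_' stands for SPACE in "Q" */
			i++;
		}

		if (n >= outsize)
			return (size_t) -1; /* refuse to write past the end */
		out[n++] = c;
	}

	return n;
}
```

With a signature like this the caller decides what to do when the input is too long (grow the buffer, or reject the header) instead of corrupting memory.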

In your conv_to_utf8 you call printf() and print_hex(); the end user is
really not that interested in what happens at the console. Perhaps use
g_warning or g_critical if it's a really severe problem in the code, but
a problem in the user's data is never a problem of the application.
In my opinion, problems with the data the application shows (the
model) must be reported to the user in the UI layer, not on the console.


On Mon, 2007-12-24 at 10:07 +0300, Peter Volkov wrote:
> 
> And there are even more users affected by this. I've already reported
> similar problem in bug 315513. Thus this affects not only CJK people:
> 
> http://bugzilla.gnome.org/show_bug.cgi?id=315513


Okay, but this doesn't mean code that can overflow its buffers must be
immediately accepted :). I guess we don't want ILY E-mails for the Linux
desktop soon, right?



-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-23 Thread Peter Volkov

On Mon, 24/12/2007 at 13:21 +0800, jacky wrote:
> --- Jeff Stedfast <[EMAIL PROTECTED]> wrote:
> There are two kinds of email that need to be supported:
> 1) An encoded-word was divided into two lines. This was
> sent by dotProject v2.0.1.

And there are even more users affected by this. I've already reported a
similar problem in bug 315513. Thus this affects not only CJK people:

http://bugzilla.gnome.org/show_bug.cgi?id=315513

-- 
Peter.




Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-23 Thread jacky

--- Jeff Stedfast <[EMAIL PROTECTED]> wrote:

> Hi Jacky,
> 
> I've looked over your patch, but unfortunately it is
> unusable. The patch is riddled with buffer overflows
> and incorrect logic.
> 

Yes, I use a fixed-length string to store some values,
so it may overflow. I wrote another version using the
heap instead of the stack. I think the stack version is
simple and sufficient, so I sent only that one. The two
versions of rfc2047_decode_word() are in the attachment.
Can you explain the incorrect logic in my patch?

> What types of bugs are you actually trying to fix?
> What is it about CJK messages in particular that are
> not getting decoded properly? Your email was overly vague.
> 

Maybe I used the wrong word. I think I am just enhancing
the CJK header support. The patch improves three points:
1) As you know, encoded-words must be separated by CRLF
SPACE, but some email clients do not do that.
2) The encoded string of one CJK character must be in a single
encoded-word, but some email clients divide it into two
encoded-words.
3) Some CJK characters need to be encoded in the GBK charset,
but the charset name in the encoded-word is GB2312.

There are two kinds of email that need to be supported:
1) An encoded-word was divided into two lines. This was
sent by dotProject v2.0.1.
2) GB2312 is used to encode CJK characters directly. Some
of these emails are supported by Evolution, but some of them
are not.

> Your changes to e-iconv can probably be taken if I
> understand correctly that GBK is a superset of gb2312
> ( http://en.wikipedia.org/wiki/GBK ), although it would
> have been nice to have gotten some sort of link
> explaining that with your original email (or via a
> ChangeLog entry) :)
> 
> Thanks,
> 
> Jeff
> 
> >>> jacky <[EMAIL PROTECTED]> 12/23/07 10:09 AM >>>
> Hi, all.
> 
> The rfc2047 decoder in libcamel cannot decode some
> CJK headers correctly. Although some of them do not
> conform to the RFC, I need to decode them correctly,
> and I thought that if Evolution can display these emails
> correctly, more people will like it.
> 
> So I wrote a new rfc2047 decoder, and it's in the
> patch. With the patch, libcamel can decode CJK
> headers correctly and Evolution can display CJK headers
> correctly now. I have tested it on my mailbox. My
> mailbox has 2000 emails which were sent by Evolution,
> Thunderbird, Outlook, Outlook Express, Foxmail,
> OpenWebmail, Yahoo, Gmail, Lotus Notes, etc. Without
> this patch, almost 20% of these emails can't be decoded
> and displayed correctly; with this patch, 99% of these
> emails can be decoded and displayed correctly.
> 
> And I found that attachments with CJK names can't
> be recognised and displayed by Outlook / Outlook
> Express / Foxmail. This is because these email clients
> do not support RFC2184. Evolution always uses the RFC2184
> encoding method to encode attachment names, so emails with
> CJK-named attachments can't be displayed in Outlook /
> Outlook Express / Foxmail. In Thunderbird, you can
> set the option "mail.strictly_mime.parm_folding" to 0 or
> 1 to use the RFC2047 encoding method for attachment
> names. Can we add a similar option?
> 
> Best regards.
> 



/* decode rfc 2047 encoded string segment */
#define DECWORD_LEN 1024
#define UTF8_DECWORD_LEN 2048

#if 1 //USE_STACK
static char *
rfc2047_decode_word(const char *in, size_t len)
{
	char prev_charset[32], curr_charset[32];
	char encode;
	char *start, *inptr, *inend;
	char decword[DECWORD_LEN], utf8_decword[UTF8_DECWORD_LEN];
	char *decword_ptr, *utf8_decword_ptr;
	size_t inlen, outlen, ret;

	prev_charset[0] = curr_charset[0] = '\0';

	decword_ptr = decword;
	utf8_decword_ptr = utf8_decword;

	/* quick check to see if this could possibly be a real encoded word */
	if (len < 8
	|| !(in[0] == '=' && in[1] == '?'
		 && in[len-1] == '=' && in[len-2] == '?')) {
		return NULL;
	}

	inptr = in;
	inend = in + len;
	outlen = sizeof(utf8_decword);

	while (inptr < inend) {
		/* begin */
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || *(inptr-1) != '=') {
			return NULL;
		}
		inptr++;

		/* charset */
		start = inptr;
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr) {
			return NULL;
		}
		strncpy (curr_charset, start, inptr-start); /* maybe overflow */
		curr_charset[inptr-start] = '\0';
		if (prev_charset[0] == '\0') { /* first charset in multi encode words */
			strcpy (prev_charset, curr_charset);
		}
		d(printf ("curr_charset = %s\n", curr_charset));

		/* if (charset.prev != charset.curr) iconv prev to utf8 */
		if (prev_charset[0] != '\0' && strcmp(prev_charset, curr_charset)) {
			inlen = decword_ptr - decword;
			ret = conv_to_utf8 (prev_charset, decword, inlen, utf8_decword_ptr, outlen);
			if (ret == (size_t)-1) {
				printf ("conv_to_utf8() error!\n");
				return NULL;
			}

			utf8_decword_ptr += ret;
			outlen = outlen - ret;

			decword_ptr = decword; /* reset decword_ptr */

Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-23 Thread Philip Van Hoof

On Sun, 2007-12-23 at 14:51 -0700, Jeff Stedfast wrote:

> What types of bugs are you actually trying to fix? What is it about CJK
> messages in particular that are not getting decoded properly? Your email
> was overly vague.

Looks like he wants to support both 'B' and 'b' and 'Q' and 'q' instead
of just 'B' and 'Q' as the encoding character for Base64 or Quoted
strings, for example.
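
RFC 2047 treats the encoding name as case-independent, so accepting the lowercase letters is within the standard. A minimal sketch of such a check, with hypothetical names (enum rfc2047_encoding and classify_encoding() are not taken from libcamel or from the patch):

```c
#include <assert.h>
#include <stddef.h>

enum rfc2047_encoding {
	ENC_INVALID,
	ENC_BASE64,
	ENC_QUOTED
};

/* Hypothetical helper: classifies the encoding field of an
 * encoded-word, i.e. the text between the second and third '?'.
 * Only a single letter is valid, in either case. */
enum rfc2047_encoding
classify_encoding (const char *token, size_t len)
{
	assert (token != NULL);

	if (len != 1)
		return ENC_INVALID;

	switch (token[0]) {
	case 'B':
	case 'b':
		return ENC_BASE64;
	case 'Q':
	case 'q':
		return ENC_QUOTED;
	default:
		return ENC_INVALID;
	}
}
```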

The rfc2047_decode_word implementation indeed has a few serious
potential buffer overflows. For example the charset buffer will be
overflowed if larger than 32 bytes.
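
For reference, a sketch of how the charset copy could be bounded. With a 32-byte buffer, the strncpy() plus terminator in the patch writes out of bounds as soon as the name is 32 bytes or longer. CHARSET_LEN and copy_charset() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHARSET_LEN 32

/* Hypothetical helper: copies the charset name found in [start, end)
 * into dest and NUL-terminates it. Returns 0 on success, or -1 if the
 * name (plus terminator) does not fit, in which case dest is left
 * untouched and the caller can reject the encoded-word. */
int
copy_charset (char dest[CHARSET_LEN], const char *start, const char *end)
{
	size_t len;

	assert (dest != NULL && start != NULL && end >= start);

	len = (size_t) (end - start);
	if (len >= CHARSET_LEN)
		return -1; /* too long: do not truncate silently */

	memcpy (dest, start, len);
	dest[len] = '\0';

	return 0;
}
```

Rejecting an oversized name, rather than truncating it, avoids passing a made-up charset name on to iconv.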



> >>> jacky <[EMAIL PROTECTED]> 12/23/07 10:09 AM >>>
> Hi, all.
> 
> The rfc2047 decoder in libcamel cannot decode some
> CJK headers correctly. Although some of them do not
> conform to the RFC, I need to decode them correctly,
> and I thought that if Evolution can display these emails
> correctly, more people will like it.
> 
> So I wrote a new rfc2047 decoder, and it's in the
> patch. With the patch, libcamel can decode CJK headers
> correctly and Evolution can display CJK headers
> correctly now. I have tested it on my mailbox. My mailbox
> has 2000 emails which were sent by Evolution,
> Thunderbird, Outlook, Outlook Express, Foxmail,
> OpenWebmail, Yahoo, Gmail, Lotus Notes, etc. Without this
> patch, almost 20% of these emails can't be decoded and
> displayed correctly; with this patch, 99% of these
> emails can be decoded and displayed correctly.
> 
> And I found that attachments with CJK names can't be
> recognised and displayed by Outlook / Outlook Express
> / Foxmail. This is because these email clients do not
> support RFC2184. Evolution always uses the RFC2184 encoding
> method to encode attachment names, so emails with
> CJK-named attachments can't be displayed in Outlook /
> Outlook Express / Foxmail. In Thunderbird, you can set
> the option "mail.strictly_mime.parm_folding" to 0 or 1
> to use the RFC2047 encoding method for attachment
> names. Can we add a similar option?
> 
> Best regards.
> 
> 
-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader

2007-12-23 Thread Jeff Stedfast
Hi Jacky,

I've looked over your patch, but unfortunately it is unusable. The patch
is riddled with buffer overflows and incorrect logic.

What types of bugs are you actually trying to fix? What is it about CJK
messages in particular that are not getting decoded properly? Your email
was overly vague.

Your changes to e-iconv can probably be taken if I understand correctly
that GBK is a superset of gb2312 ( http://en.wikipedia.org/wiki/GBK ),
although it would have been nice to have gotten some sort of link
explaining that with your original email (or via a ChangeLog entry) :)

Thanks,

Jeff

>>> jacky <[EMAIL PROTECTED]> 12/23/07 10:09 AM >>>
Hi, all.

The rfc2047 decoder in libcamel cannot decode some
CJK headers correctly. Although some of them do not
conform to the RFC, I need to decode them correctly,
and I thought that if Evolution can display these emails
correctly, more people will like it.

So I wrote a new rfc2047 decoder, and it's in the
patch. With the patch, libcamel can decode CJK headers
correctly and Evolution can display CJK headers
correctly now. I have tested it on my mailbox. My mailbox
has 2000 emails which were sent by Evolution,
Thunderbird, Outlook, Outlook Express, Foxmail,
OpenWebmail, Yahoo, Gmail, Lotus Notes, etc. Without this
patch, almost 20% of these emails can't be decoded and
displayed correctly; with this patch, 99% of these
emails can be decoded and displayed correctly.

And I found that attachments with CJK names can't be
recognised and displayed by Outlook / Outlook Express
/ Foxmail. This is because these email clients do not
support RFC2184. Evolution always uses the RFC2184 encoding
method to encode attachment names, so emails with
CJK-named attachments can't be displayed in Outlook /
Outlook Express / Foxmail. In Thunderbird, you can set
the option "mail.strictly_mime.parm_folding" to 0 or 1
to use the RFC2047 encoding method for attachment
names. Can we add a similar option?

Best regards.

