Bug: problem decoding some non-ascii characters in subjects

2013-02-10 Thread Jani Nikula
On Sun, 10 Feb 2013, Albin Stjerna wrote:
> Hm. So I should report this to Thunderbird? I tried searching through
> their bug reports but didn't find anything.

I tried that too; there were other RFC 2047 related header bugs, but I
could not find this one. The other bugs were also old and already fixed.
Judging by the User-Agent header, this is a fairly up-to-date Thunderbird.

It seems to be list mail. It would not surprise me that some ill-advised
mailing list manager would decode and re-encode the subject. One could
try sending the same message directly and through the list, and see if
there's a difference.
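
A minimal sketch of such a comparison, assuming both copies are at hand as
raw message text. The helper names and the shortened sample messages are
hypothetical; only the standard-library email parser is used.

```python
import re
from email import message_from_string
from email.policy import compat32


def raw_subject(message_text):
    """Return the Subject header as stored in the file, with folding undone."""
    # compat32 leaves the header value undecoded, which is what we want here.
    msg = message_from_string(message_text, policy=compat32)
    value = msg.get("Subject", "")
    # Unfold: drop a line break that is followed by whitespace.
    return re.sub(r"\r?\n(?=[ \t])", "", value)


def subjects_differ(direct_copy, list_copy):
    """True if the two copies carry different raw Subject headers."""
    return raw_subject(direct_copy) != raw_subject(list_copy)


# Hypothetical, shortened sample copies for illustration only.
direct = "Subject: skadliga =?ISO-8859-1?Q?f=F6rk=E4rlek?=\n\nbody\n"
via_list = "Subject: skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek?=\n\nbody\n"
print(subjects_differ(direct, via_list))
```

If the two raw headers differ, the list software rewrote the subject on the
way through; if they are identical, the sending client produced it.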

> I didn't think it was a bug, since Gmail rendered it just fine.

It's possible they interpret the RFC in a more relaxed way. AFAICS
notmuch relies on gmime to handle this, so I think we would have to go
out of our way to work around this in notmuch. A quick search did not
bring up anything gmime-related about this, so I don't know if this has
been discussed in the gmime context.
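
As an illustration of what a more relaxed interpretation could look like,
here is a minimal sketch that decodes an encoded-word wherever it appears,
even when it is glued to neighbouring text. This is an assumption about the
general approach, not a description of what Gmail or gmime actually do; the
function names are hypothetical.

```python
import base64
import re

# Matches an encoded-word anywhere, without requiring surrounding whitespace.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def _decode(match):
    charset, encoding, text = match.groups()
    if encoding.lower() == "q":
        # Q encoding: "_" stands for a space, "=XX" for the byte 0xXX.
        data = text.replace("_", " ").encode("ascii")
        raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), data)
    else:
        raw = base64.b64decode(text)
    return raw.decode(charset)


def decode_relaxed(header_value):
    """Decode every encoded-word found, regardless of what surrounds it."""
    return ENCODED_WORD.sub(_decode, header_value)


subject = ("[BIBLIST] Digitaliseringensprojektens skadliga "
           "f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet")
print(decode_relaxed(subject))
```

Note that the trailing "_" inside the encoded-word decodes to a space, so the
result has two spaces before "PDF-formatet" unless whitespace is collapsed
afterwards.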


BR,
Jani.


Bug: problem decoding some non-ascii characters in subjects

2013-02-10 Thread Albin Stjerna
Jani Nikula wrote:

> It seems to be list mail. It would not surprise me that some ill-advised
> mailing list manager would decode and re-encode the subject. One could
> try sending the same message directly and through the list, and see if
> there's a difference.

Possibly, though I don't have the original message since I got it from a
list. It's actually from Biblist (a Swedish mailing list for librarians
and such), and it seems to run LISTSERV 15.5 (see
http://segate.sunet.se/cgi-bin/wa?A0=BIBLIST). According to itself, it's
»industry-standard«, which would indeed support your thesis.


Bug: problem decoding some non-ascii characters in subjects

2013-02-10 Thread Albin Stjerna
Jani Nikula wrote:

> Is that entirely on one line in the original message file? If not, where
> exactly is it split?

It's in one line.

> Either way, at a glance, it seems like the encoding is malformed. I
> think the encoded-word ("=?" charset "?" encoding "?" encoded-text "?=")
> should be separated by space to make it an atom. [RFC 2047, RFC 2822].

> If you manually move the leading 'f' after the "?Q?" bit, it works as
> expected. It looks like the bug is in the sender's user agent.

Hm. So I should report this to Thunderbird? I tried searching through
their bug reports but didn't find anything.

I didn't think it was a bug, since Gmail rendered it just fine.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Bug: problem decoding some non-ascii characters in subjects

2013-02-09 Thread Jani Nikula
On Sat, 09 Feb 2013, Albin Stjerna wrote:
> Jani Nikula wrote:
>
>> On Fri, 08 Feb 2013, Albin Stjerna wrote:
>> > I've been noticing that notmuch has some problems decoding certain
>> > strangely-encoded non-ascii characters in certain emails. For example,
>> > today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
>> > f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
>> > rendered: »Digitaliseringsprojektens skadliga förkärlek för
>> > PDF-formatet«).
>> >
>> > Apparently, some metadata is passed on to help the MUA decode the
>> > string, but notmuch doesn't seem to handle it. Entire emails can of
>> > course be supplied as needed.
>
>> Please copy paste the Subject: header directly from the message file.
>
> The exact Subject: header (from the file, not notmuch) is:
> Subject: [BIBLIST] Digitaliseringensprojektens skadliga 
> f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet

Is that entirely on one line in the original message file? If not, where
exactly is it split?

Either way, at a glance, it seems like the encoding is malformed. I
think the encoded-word ("=?" charset "?" encoding "?" encoded-text "?=")
should be separated by space to make it an atom. [RFC 2047, RFC 2822].

If you manually move the leading 'f' after the "?Q?" bit, it works as
expected. It looks like the bug is in the sender's user agent.
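
A minimal sketch of the strict reading described above, assuming a decoder
that only accepts an encoded-word when it forms a whole whitespace-delimited
token. The function names are hypothetical, and the sketch leaves out other
RFC 2047 rules (such as dropping whitespace between adjacent encoded-words).

```python
import base64
import re

# "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_word(charset, encoding, text):
    if encoding.lower() == "q":
        # Q encoding: "_" stands for a space, "=XX" for the byte 0xXX.
        data = text.replace("_", " ").encode("ascii")
        raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), data)
    else:
        raw = base64.b64decode(text)
    return raw.decode(charset)


def decode_strict(header_value):
    """Decode only tokens that are an encoded-word in their entirety."""
    words = []
    for token in header_value.split(" "):
        m = ENCODED_WORD.fullmatch(token)
        words.append(decode_word(*m.groups()) if m else token)
    return " ".join(words)


as_received = "f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?="
f_moved = "=?ISO-8859-1?Q?f=F6rk=E4rlek_f=F6r_?="
print(repr(decode_strict(as_received)))  # left undecoded: not a whole token
print(repr(decode_strict(f_moved)))      # decoded
```

Under this reading the token as received is passed through untouched because
of the leading 'f', while the variant with the 'f' moved inside decodes.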


BR,
Jani.


Bug: problem decoding some non-ascii characters in subjects

2013-02-09 Thread Albin Stjerna
Jani Nikula wrote:

> On Fri, 08 Feb 2013, Albin Stjerna wrote:
> > I've been noticing that notmuch has some problems decoding certain
> > strangely-encoded non-ascii characters in certain emails. For example,
> > today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
> > f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
> > rendered: »Digitaliseringsprojektens skadliga förkärlek för
> > PDF-formatet«).
> >
> > Apparently, some metadata is passed on to help the MUA decode the
> > string, but notmuch doesn't seem to handle it. Entire emails can of
> > course be supplied as needed.

> Please copy paste the Subject: header directly from the message file.

The exact Subject: header (from the file, not notmuch) is:
Subject: [BIBLIST] Digitaliseringensprojektens skadliga 
f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet

Other potentially interesting headers are:
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106
Thunderbird/17.0.2
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

(formatted as they appeared in the mail file)


Bug: problem decoding some non-ascii characters in subjects

2013-02-08 Thread Jani Nikula
On Fri, 08 Feb 2013, Albin Stjerna wrote:
> I've been noticing that notmuch has some problems decoding certain
> strangely-encoded non-ascii characters in certain emails. For example,
> today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
> f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
> rendered: »Digitaliseringsprojektens skadliga förkärlek för
> PDF-formatet«).
>
> Apparently, some metadata is passed on to help the MUA decode the
> string, but notmuch doesn't seem to handle it. Entire emails can of
> course be supplied as needed.

Please copy paste the Subject: header directly from the message file.


Bug: problem decoding some non-ascii characters in subjects

2013-02-08 Thread Albin Stjerna
Hi,

I've been noticing that notmuch has some problems decoding certain 
strangely-encoded non-ascii characters in certain emails. For example, today I 
got this: [BIBLIST] Digitaliseringensprojektens skadliga 
f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be rendered: 
»Digitaliseringsprojektens skadliga förkärlek för PDF-formatet«).

Apparently, some metadata is passed on to help the MUA decode the string, but 
notmuch doesn't seem to handle it. Entire emails can of course be supplied as 
needed.