How to debug 'ignoring non-mail file' issues
On 2014-09-01 09:41:06, Perttu Luukko wrote:
> Yes, that indeed works. I'll probably move these ignored files to a
> separate folder for inspection.

I looked at the mails that are still ignored after upgrading GMime to the latest version, and I think I have found what they have in common. All of my ignored emails are from 2010-2011, and for some reason these mails contain a line like this:

    >From username Wed Sep 28 16:43:49 2011

somewhere among the headers. Note the '>' at the beginning of the line. The mails that are still ignored after upgrading GMime are those where this line happens to be the first line. Also, all of them have attachments for some reason.

That line certainly doesn't look right, and I don't know where it came from. It might be some byproduct of mail redirection, since it shows my username, but the mails are not sent by me.

I moved these problematic lines to the second line of each message, and now they are imported without problems. I probably won't file a bug for GMime because I have no idea whether this is just some oddity caused by my mail setup. Let this information reside here in case someone else has a similar problem.

--
Perttu
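Editor's sketch of the workaround described in this message, for anyone who wants to script it. It is a minimal illustration, not something posted in the thread: the function names are hypothetical, and it simply swaps the first two lines when the first one begins with '>From ', which mirrors the manual fix above. Whether that is safe for a given mailbox (for example when the second line is a folded continuation of a header) is an assumption you should check on copies first.

```python
from pathlib import Path


def starts_with_escaped_from(raw: bytes) -> bool:
    """Return True if the very first line of the message begins with '>From '."""
    return raw.startswith(b">From ")


def demote_escaped_from(raw: bytes) -> bytes:
    """Swap the first two lines when the first one begins with '>From '.

    Messages that do not match, or that consist of a single line,
    are returned unchanged.
    """
    if not starts_with_escaped_from(raw):
        return raw
    lines = raw.splitlines(keepends=True)
    if len(lines) < 2:
        return raw
    first, second = lines[0], lines[1]
    # Make sure the promoted line is terminated before '>From ' follows it.
    if not second.endswith((b"\n", b"\r")):
        second += b"\n"
    return second + first + b"".join(lines[2:])


def fix_file(path: Path) -> bool:
    """Rewrite the file in place if needed; return True if it was changed."""
    raw = path.read_bytes()
    fixed = demote_escaped_from(raw)
    if fixed == raw:
        return False
    path.write_bytes(fixed)
    return True
```

Running `fix_file` over copies of the ignored files and then rescanning with 'notmuch new' would show whether the same oddity is the cause on another setup.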
How to debug 'ignoring non-mail file' issues
On 2014-09-03 19:03:40, Jani Nikula wrote:
> On Wed, 03 Sep 2014, Perttu Luukko wrote:
> > What I mean is that there would be a separate error for cases "Does not
> > resemble an email message at all", i.e., some control file your mail
> > server happens to store in the mailbox, and "Looks like mail but we
> > can't parse it", i.e., better find out why it can't be parsed to avoid
> > potentially important messages going missing from the database.
>
> As I said, GMime does not tell us the difference between the two.

There could be a separate parsing step that reads the first kilobyte or so and checks whether it is text, and whether there is a line starting with "From: " and possibly other headers. This could be run only if GMime thinks the file is not mail, so there would be negligible overhead.

This is just a suggestion. Notmuch users are probably quite experienced, so they can always investigate on their own why their emails are being ignored. But there could be more warning about ignored messages. Something like, at the end of each 'notmuch new' output: "Note: some files were ignored as non-mail. Check the list at ~/mail/.notmuch/ignored-files and adjust your ~/.notmuch-config".

--
Perttu
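Editor's sketch of the pre-check suggested in this message. It is only an illustration of the idea, not notmuch or GMime code: the function names, the exact definition of "text" (no control bytes other than tab, LF, FF, CR), and the two message strings are assumptions made for the example. It is meant to run only on files the real parser has already rejected.

```python
import re
from pathlib import Path

SAMPLE_SIZE = 1024  # "the first kilobyte or so"
FROM_HEADER = re.compile(rb"^From:[ \t]", re.IGNORECASE | re.MULTILINE)


def is_probably_text(sample: bytes) -> bool:
    """Treat a sample as text if it is non-empty and contains no control
    bytes other than tab, line feed, form feed and carriage return."""
    if not sample:
        return False
    allowed = {9, 10, 12, 13}
    return all(b >= 32 or b in allowed for b in sample)


def looks_like_mail(sample: bytes) -> bool:
    """Heuristic: the sample is text and has a line starting with 'From: '."""
    sample = sample[:SAMPLE_SIZE]
    return is_probably_text(sample) and FROM_HEADER.search(sample) is not None


def describe_rejected_file(path: Path) -> str:
    """Pick one of two warnings for a file the parser has already rejected."""
    with path.open("rb") as fh:
        sample = fh.read(SAMPLE_SIZE)
    if looks_like_mail(sample):
        return f"Looks like mail but could not be parsed: {path}"
    return f"Does not resemble an email message: {path}"
```

Because the check reads at most one kilobyte and runs only after a parse failure, its cost stays small, which is the point made above.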
How to debug 'ignoring non-mail file' issues
On Wed, 03 Sep 2014, Perttu Luukko wrote:
> What I mean is that there would be a separate error for cases "Does not
> resemble an email message at all", i.e., some control file your mail
> server happens to store in the mailbox, and "Looks like mail but we
> can't parse it", i.e., better find out why it can't be parsed to avoid
> potentially important messages going missing from the database.

As I said, GMime does not tell us the difference between the two.

BR,
Jani.
How to debug 'ignoring non-mail file' issues
On 2014-09-02 23:37:12, Jani Nikula wrote:
> On Mon, 01 Sep 2014, Perttu Luukko wrote:
> > Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> > to be classified as mail.
>
> What was the old version? If it was 2.4 we should probably consider
> dropping support for that in future notmuch.

It was 2.4.33. It might still work for other people, I don't know. I still have some ignored mails. If I can nail down why they are ignored, we might know more about why GMime 2.4 ignored even more mail. They were from around the same time period, so it might have something to do with the email setup I had at that time.

> > Even more reason to give a separate warning for GMime parse errors.
>
> Not sure. We only get a binary success/fail from GMime, and that gets
> printed for all non-email files. I'm not sure it's helpful.

What I mean is that there would be a separate error for cases "Does not resemble an email message at all", i.e., some control file your mail server happens to store in the mailbox, and "Looks like mail but we can't parse it", i.e., better find out why it can't be parsed to avoid potentially important messages going missing from the database.

--
Perttu
How to debug 'ignoring non-mail file' issues
On Mon, 01 Sep 2014, Perttu Luukko wrote:
> Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> to be classified as mail.

What was the old version? If it was 2.4 we should probably consider dropping support for that in future notmuch.

> Even more reason to give a separate warning for GMime parse errors.

Not sure. We only get a binary success/fail from GMime, and that gets printed for all non-email files. I'm not sure it's helpful.

BR,
Jani.
How to debug 'ignoring non-mail file' issues
On Mon, Sep 01 2014, Perttu Luukko wrote:

> On 2014-08-31 07:41:42, David Bremner wrote:
>> Perttu Luukko writes:
>> > The vast majority of these ignored mails are not ignored after I
>> > transfer them with offlineimap to another computer. I can non-ignore
>> > these files probably by copying the renamed file back to the mail
>> > server, so this is fixable. Offlineimap shouldn't mess with the file's
>> > contents, so is there something that can cause notmuch to ignore a file
>> > based on its name?
>>
>> The most likely cause is that the files are mboxes, whether intentional
>> or not. In particular if they start with a "From " (note the lack of :)
>> and contain a second "From " at the beginning of a line later in the
>> file. In this case something like sed can replace the initial
>> "From " with "X-Envelope-From: ".
>>
>> I agree that the error message could be more informative in this case.
>
> No, the mails do contain "From: " with the appropriate colon. If I
> understood correctly notmuch returns the same "not mail" return code

The question here is whether the very first line of the mail file begins with 'From ', not whether *any* of the actual header lines starts with 'From: '. IIRC the mails get accepted even if the 'From:' header were missing...

> both when the essential headers are missing (so the file probably really
> isn't mail) and when GMime fails to parse the message. I think it would
> be a good idea to give a different warning in the latter case.

Sure... :D

>
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.
>
> --
> Perttu

Tomi
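Editor's sketch of the distinction drawn in this reply: what matters is how the very first line starts, not whether a 'From: ' header exists somewhere. The function name and the four labels are made up for illustration, and the header test is only a rough approximation of an RFC 5322 field name followed by a colon.

```python
import re

# Printable ASCII except ':' (a header field name), followed by a colon.
HEADER_LINE = re.compile(rb"^[!-9;-~]+:")


def classify_first_line(raw: bytes) -> str:
    """Label the first line of a file as 'envelope', 'escaped', 'header' or 'other'."""
    first = raw.splitlines()[0] if raw else b""
    if first.startswith(b"From "):
        return "envelope"  # mbox-style 'From ' line, no colon
    if first.startswith(b">From "):
        return "escaped"   # quoted envelope line, as seen elsewhere in this thread
    if HEADER_LINE.match(first):
        return "header"    # ordinary header such as 'From: ' or 'Return-Path: '
    return "other"
```

Files labelled 'envelope' or 'escaped' are the ones worth a closer look; 'header' is what a normal maildir message is expected to start with.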
How to debug 'ignoring non-mail file' issues
On 2014-09-01 09:52:20, Perttu Luukko wrote:
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.

Yes, upgrading to GMime 2.6.20 caused all the messages on my server to be classified as mail. Even more reason to give a separate warning for GMime parse errors.

I'll see if my archive of older emails still contains some ignored files.

--
Perttu
How to debug 'ignoring non-mail file' issues
On 2014-08-31 07:41:42, David Bremner wrote:
> Perttu Luukko writes:
> > The vast majority of these ignored mails are not ignored after I
> > transfer them with offlineimap to another computer. I can non-ignore
> > these files probably by copying the renamed file back to the mail
> > server, so this is fixable. Offlineimap shouldn't mess with the file's
> > contents, so is there something that can cause notmuch to ignore a file
> > based on its name?
>
> The most likely cause is that the files are mboxes, whether intentional
> or not. In particular if they start with a "From " (note the lack of :)
> and contain a second "From " at the beginning of a line later in the
> file. In this case something like sed can replace the initial
> "From " with "X-Envelope-From: ".
>
> I agree that the error message could be more informative in this case.

No, the mails do contain "From: " with the appropriate colon. If I understood correctly, notmuch returns the same "not mail" return code both when the essential headers are missing (so the file probably really isn't mail) and when GMime fails to parse the message. I think it would be a good idea to give a different warning in the latter case.

If the files really are ignored because of GMime, it also explains why so many more files are ignored on my mail provider's server than on my laptop. The server probably has an older version of GMime. I'll upgrade and see if that makes a difference.

--
Perttu
How to debug 'ignoring non-mail file' issues
On 2014-08-31 09:46:12, David Bremner wrote:
> Perttu Luukko writes:
>
> > I understand that the list of non-mail files is stored in the
> > notmuch database and the files are completely ignored from there on.
> > This actually makes it harder to debug this kind of issue since
> > the list of ignored mails is only visible on the first invocation of
> > 'notmuch new', unless the files are moved around. Is there some way
> > to extract the list of ignored files from the database for
> > inspection? Maybe 'notmuch new' could have some kind of
> > --unignore-non-mail switch that would reconsider previously ignored
> > files.
>
> I _think_ it should suffice to do something like
>
>     find Maildir -type d -exec touch {} \;
>
> to force a rescan

Yes, that indeed works. I'll probably move these ignored files to a separate folder for inspection.

--
Perttu
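Editor's sketch of one way to collect the ignored files for inspection after a forced rescan. It is an assumption-laden illustration: the regular expression assumes the warning contains the words "ignoring non-mail file:" followed by the path, which may differ between notmuch versions, and the helper names are hypothetical. It copies rather than moves, so the mail store itself is left untouched.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumed wording of the warning; adjust to match your notmuch's output.
IGNORED_RE = re.compile(r"ignoring non-mail file:\s*(?P<path>.+)$", re.IGNORECASE)


def ignored_paths(output: str) -> list:
    """Extract the paths named in 'ignoring non-mail file' lines."""
    paths = []
    for line in output.splitlines():
        match = IGNORED_RE.search(line)
        if match:
            paths.append(Path(match.group("path").strip()))
    return paths


def run_notmuch_new() -> str:
    """Run 'notmuch new' and return stdout and stderr combined."""
    proc = subprocess.run(["notmuch", "new"], capture_output=True, text=True, check=False)
    return proc.stdout + proc.stderr


def copy_for_inspection(paths, dest: Path) -> None:
    """Copy each ignored file into dest, keeping its file name."""
    dest.mkdir(parents=True, exist_ok=True)
    for path in paths:
        shutil.copy2(path, dest / path.name)
```

A typical use would be `copy_for_inspection(ignored_paths(run_notmuch_new()), Path("ignored-for-inspection"))`, run right after touching the maildir directories so that the warnings are printed again.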
How to debug 'ignoring non-mail file' issues
Hi,

I indexed my archive of emails from recent years with notmuch (about 10k messages so not much). I have quite a lot of messages 'notmuch new' ignores as non-mail files, about 1000 of them. They are not obviously malformed, meaning that the files certainly look like emails when opened in a text editor. I'd like to find out why these files are ignored, and if there is something I can do to fix them. Of course I'd like to have a complete database of my old emails, with nothing falling through the cracks like this.

The vast majority of these ignored mails are not ignored after I transfer them with offlineimap to another computer. I can non-ignore these files probably by copying the renamed file back to the mail server, so this is fixable. Offlineimap shouldn't mess with the file's contents, so is there something that can cause notmuch to ignore a file based on its name?

Looking at the rest of the ignored messages, most of them seem to have very large attachments, but there are possibly others. There are only maybe 20 of these kinds of emails, so I can try to fix them manually. Still, it would help if I knew what exactly caused notmuch to ignore the file. I understand most of the message parsing is done with GMime. Does GMime give any diagnostics on parse errors that could be used to give a reason for thinking a file is not mail?

I understand that the list of non-mail files is stored in the notmuch database and the files are completely ignored from there on. This actually makes it harder to debug this kind of issue since the list of ignored mails is only visible on the first invocation of 'notmuch new', unless the files are moved around. Is there some way to extract the list of ignored files from the database for inspection? Maybe 'notmuch new' could have some kind of --unignore-non-mail switch that would reconsider previously ignored files.

--
Perttu Luukko
Re: How to debug 'ignoring non-mail file' issues
Perttu Luukko writes:

> I understand that the list of non-mail files is stored in the notmuch
> database and the files are completely ignored from there on. This
> actually makes it harder to debug this kind of issue since the list of
> ignored mails is only visible on the first invocation of 'notmuch new',
> unless the files are moved around. Is there some way to extract the list
> of ignored files from the database for inspection? Maybe 'notmuch new'
> could have some kind of --unignore-non-mail switch that would reconsider
> previously ignored files.

I _think_ it should suffice to do something like

    find Maildir -type d -exec touch {} \;

to force a rescan

d

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch
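Editor's sketch of the same rescan trick for systems without find(1). It is a plain translation of the command above, with a hypothetical function name: it updates the modification time of the mail root and every directory below it. That this forces 'notmuch new' to look at the directories again is taken from the suggestion above, not verified independently.

```python
import os
from pathlib import Path


def touch_directories(root: Path) -> int:
    """Set the mtime of root and of every directory below it to now.

    Returns the number of directories touched.
    """
    count = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        os.utime(dirpath, None)  # same effect as `touch` on an existing directory
        count += 1
    return count
```

Calling `touch_directories(Path("Maildir"))` and then running 'notmuch new' should print the ignored-file warnings again, assuming the behaviour reported in this thread.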
Re: How to debug 'ignoring non-mail file' issues
Perttu Luukko writes:

> The vast majority of these ignored mails are not ignored after I
> transfer them with offlineimap to another computer. I can non-ignore
> these files probably by copying the renamed file back to the mail
> server, so this is fixable. Offlineimap shouldn't mess with the file's
> contents, so is there something that can cause notmuch to ignore a file
> based on its name?

The most likely cause is that the files are mboxes, whether intentional or not. In particular if they start with a "From " (note the lack of :) and contain a second "From " at the beginning of a line later in the file. In this case something like sed can replace the initial "From " with "X-Envelope-From: ".

I agree that the error message could be more informative in this case.

d
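Editor's sketch of the check and the rewrite described in this reply, written in Python instead of sed. The function names are hypothetical, and the detection follows the description above literally (a leading "From " plus a second "From " at the start of a later line); it is not the test notmuch or GMime actually performs.

```python
def looks_like_mbox(raw: bytes) -> bool:
    """True if the file starts with 'From ' and a later line also starts with 'From '."""
    lines = raw.splitlines()
    return (
        len(lines) > 1
        and lines[0].startswith(b"From ")
        and any(line.startswith(b"From ") for line in lines[1:])
    )


def rewrite_envelope_from(raw: bytes) -> bytes:
    """Replace an initial 'From ' with 'X-Envelope-From: '; leave other files alone."""
    prefix = b"From "
    if raw.startswith(prefix):
        return b"X-Envelope-From: " + raw[len(prefix):]
    return raw
```

Only the first line is changed, so a body line that happens to begin with "From " is preserved as written; after the rewrite the file no longer starts with an mbox separator.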