How to debug 'ignoring non-mail file' issues

2014-09-04 Thread Perttu Luukko
On 2014-09-01 09:41:06, Perttu Luukko wrote:
> Yes, that indeed works. I'll probably move these ignored files to a
> separate folder for inspection.

I looked at the mails that are still ignored after upgrading GMime to
the latest version, and I think I have found what they have in common. All
of my ignored emails are from 2010-2011, and for some reason these mails
contain a line like this:

>From username  Wed Sep 28 16:43:49 2011

somewhere among the headers. Note the '>' at the beginning of the line.
The mails that are still ignored after upgrading GMime are those where
this line happens to be the first line. Also, all of them have
attachments for some reason. That line certainly doesn't look right, and
I don't know where it came from. It might be some byproduct of mail
redirection, since it shows my username, but the mails are not sent by
me.

I moved these problematic lines to the second line of each message, and
now they are imported without problems. I probably won't file a bug for
GMime because I have no idea whether this is just some oddity caused by
my mail setup. Let this information reside here in case someone else has
a similar problem.
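
The manual fix described above (moving a leading '>From ' line down to
the second line) could be scripted roughly as follows. This is only a
sketch: the function name demote_leading_from_line is made up for
illustration, and it assumes POSIX sh, awk and mktemp are available. It
rewrites the file in place, so try it on a copy of the mail first.

```shell
# Hypothetical helper: if the first line of FILE starts with ">From ",
# swap it with the second line so the message begins with a real header.
demote_leading_from_line() {
    file=$1
    # Leave the file alone unless the first line has the odd prefix.
    head -n 1 "$file" | grep -q '^>From ' || return 0
    tmp=$(mktemp) || return 1
    # Print line 2 first, then the saved line 1, then the rest unchanged.
    # A one-line file is written back as it was.
    awk 'NR == 1 { first = $0; next }
         NR == 2 { print; print first; next }
         { print }
         END { if (NR == 1) print first }' "$file" > "$tmp" &&
        cat "$tmp" > "$file" &&
        rm -f "$tmp"
}
```

After fixing files this way, notmuch may still need to be made to look
at them again, for example by touching the containing directories
before the next 'notmuch new'.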

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-04 Thread Perttu Luukko
On 2014-09-03 19:03:40, Jani Nikula wrote:
> On Wed, 03 Sep 2014, Perttu Luukko wrote:
> > What I mean is that there would be a separate error for cases "Does not
> > resemble an email message at all", i.e., some control file your mail
> > server happens to store in the mailbox, and "Looks like mail but we
> > can't parse it", i.e., better find out why it can't be parsed to avoid
> > potentially important messages going missing from the database.
> 
> As I said, GMime does not tell us the difference between the two.

There could be a separate parsing step that reads the first kilobyte or
so and checks whether it is text, and whether there is a line starting
with "From: " and possibly other headers. This could be run if GMime
thinks the file is not mail so there would be negligible overhead.
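
A rough sketch of such a pre-check, written as a standalone shell
function rather than as notmuch code. Everything here is an assumption
made for illustration: the name classify_ignored_file, the choice of
"no NUL bytes" as the test for text, and the particular set of headers
looked for. It also assumes a head that supports -c.

```shell
# Hypothetical pre-check: print "looks-like-mail" if the first kilobyte
# of FILE has no NUL bytes and contains a common header line, otherwise
# print "not-mail".
classify_ignored_file() {
    # Size of the sample, and its size once NUL bytes are removed.
    total=$(head -c 1024 "$1" | wc -c | tr -d ' ')
    text=$(head -c 1024 "$1" | LC_ALL=C tr -d '\000' | wc -c | tr -d ' ')
    if [ "$total" -gt 0 ] && [ "$total" -eq "$text" ] &&
       head -c 1024 "$1" | grep -Eiq '^(From|To|Subject|Date|Message-ID):'; then
        echo looks-like-mail
    else
        echo not-mail
    fi
}
```

Files reported as looks-like-mail would be the ones worth a louder
warning, since they are the ones likely to be real messages that failed
to parse.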

This is just a suggestion. Notmuch users are probably quite experienced
so they can always investigate on their own why their emails are being
ignored. But there could be more warning about ignored messages.
Something like, at the end of each 'notmuch new' output: "Note: some
files were ignored as non-mail. Check the list at
~/mail/.notmuch/ignored-files and adjust your ~/.notmuch-config".

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-03 Thread Jani Nikula
On Wed, 03 Sep 2014, Perttu Luukko wrote:
> What I mean is that there would be a separate error for cases "Does not
> resemble an email message at all", i.e., some control file your mail
> server happens to store in the mailbox, and "Looks like mail but we
> can't parse it", i.e., better find out why it can't be parsed to avoid
> potentially important messages going missing from the database.

As I said, GMime does not tell us the difference between the two.

BR,
Jani.


How to debug 'ignoring non-mail file' issues

2014-09-03 Thread Perttu Luukko
On 2014-09-02 23:37:12, Jani Nikula wrote:
> On Mon, 01 Sep 2014, Perttu Luukko wrote:
> > Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> > to be classified as mail.
> 
> What was the old version? If it was 2.4 we should probably consider
> dropping support for that in future notmuch.

It was 2.4.33. It might still work for other people, I don't know. I
still have some ignored mails. If I can nail down why they are ignored
we might know more about why GMime 2.4 ignored even more mail. They were
from around the same time period, so it might have something to do with
the email setup I had at that time.

> > Even more reason to give a separate warning for GMime parse errors.
> 
> Not sure. We only get a binary success/fail from GMime, and that gets
> printed for all non-email files. I'm not sure it's helpful.

What I mean is that there would be a separate error for cases "Does not
resemble an email message at all", i.e., some control file your mail
server happens to store in the mailbox, and "Looks like mail but we
can't parse it", i.e., better find out why it can't be parsed to avoid
potentially important messages going missing from the database.

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-03 Thread Jani Nikula
On Mon, 01 Sep 2014, Perttu Luukko wrote:
> Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> to be classified as mail.

What was the old version? If it was 2.4 we should probably consider
dropping support for that in future notmuch.

> Even more reason to give a separate warning for GMime parse errors.

Not sure. We only get a binary success/fail from GMime, and that gets
printed for all non-email files. I'm not sure it's helpful.

BR,
Jani.


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Tomi Ollila
On Mon, Sep 01 2014, Perttu Luukko wrote:

> On 2014-08-31 07:41:42, David Bremner wrote:
>> Perttu Luukko writes:
>> > The vast majority of these ignored mails are not ignored after I
>> > transfer them with offlineimap to another computer. I can non-ignore
>> > these files probably by copying the renamed file back to the mail
>> > server, so this is fixable. Offlineimap shouldn't mess with the file's
>> > contents, so is there something that can cause notmuch to ignore a file
>> > based on its name?
>> 
>> The most likely cause is that the files are mboxes, whether intentional
>> or not.  In particular if they start with a "From " (note the lack of :)
>> and contain a second "From " at the beginning of a line later in the
>> file. In this case something like sed can replace the initial 
>> "From " with "X-Envelope-From: ".
>> 
>> I agree that the error message could be more informative in this case.
>
> No, the mails do contain "From: " with the appropriate colon. If I
> understood correctly notmuch returns the same "not mail" return code

The question here is whether the very first line of the mail file begins
with 'From ', not whether *any* of the actual header lines starts with
'From: '. IIRC the mails get accepted even if the 'From:' header is missing...

> both when the essential headers are missing (so the file probably really
> isn't mail) and when GMime fails to parse the message. I think it would
> be a good idea to give a different warning in the latter case.

Sure... :D

>
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.
>
> -- 
> Perttu


Tomi


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Perttu Luukko
On 2014-09-01 09:52:20, Perttu Luukko wrote:
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.

Yes, upgrading to GMime 2.6.20 caused all the messages on my server to
be classified as mail. Even more reason to give a separate warning for
GMime parse errors. I'll see if my archive of older emails still
contains some ignored files.

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Perttu Luukko
On 2014-08-31 07:41:42, David Bremner wrote:
> Perttu Luukko writes:
> > The vast majority of these ignored mails are not ignored after I
> > transfer them with offlineimap to another computer. I can non-ignore
> > these files probably by copying the renamed file back to the mail
> > server, so this is fixable. Offlineimap shouldn't mess with the file's
> > contents, so is there something that can cause notmuch to ignore a file
> > based on its name?
> 
> The most likely cause is that the files are mboxes, whether intentional
> or not.  In particular if they start with a "From " (note the lack of :)
> and contain a second "From " at the beginning of a line later in the
> file. In this case something like sed can replace the initial 
> "From " with "X-Envelope-From: ".
> 
> I agree that the error message could be more informative in this case.

No, the mails do contain "From: " with the appropriate colon. If I
understood correctly notmuch returns the same "not mail" return code
both when the essential headers are missing (so the file probably really
isn't mail) and when GMime fails to parse the message. I think it would
be a good idea to give a different warning in the latter case.

If the files really are ignored because of GMime it also explains why so
many more files are ignored on my mail provider's server than on my
laptop. The server probably has an older version of GMime. I'll upgrade
and see if that makes a difference.

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Perttu Luukko
On 2014-08-31 09:46:12, David Bremner wrote:
> Perttu Luukko writes:
> 
> > I understand that the list of non-mail files is stored in the
> > notmuch database and the files are completely ignored from there on.
> > This actually makes it harder to debug these kinds of issues since
> > the list of ignored mails is only visible on the first invocation of
> > 'notmuch new', unless the files are moved around. Is there some way
> > to extract the list of ignored files from the database for
> > inspection? Maybe 'notmuch new' could have some kind of
> > --unignore-non-mail switch that would reconsider previously ignored
> > files.
> 
> I _think_ it should suffice to do something like
> 
>    find Maildir -type d -exec touch {} \;
> 
> to force a rescan

Yes, that indeed works. I'll probably move these ignored files to a
separate folder for inspection.

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-08-31 Thread Perttu Luukko
Hi,

I indexed my archive of emails from recent years with notmuch (about 10k
messages so not much). I have quite a lot of messages 'notmuch new'
ignores as non-mail files, about 1000 of them. They are not obviously
malformed, meaning that the files certainly look like emails when opened
in a text editor. I'd like to find out why these files are ignored, and
if there is something I can do to fix them. Of course I'd like to have a
complete database of my old emails, with nothing falling through the
cracks like this.

The vast majority of these ignored mails are not ignored after I
transfer them with offlineimap to another computer. I can non-ignore
these files probably by copying the renamed file back to the mail
server, so this is fixable. Offlineimap shouldn't mess with the file's
contents, so is there something that can cause notmuch to ignore a file
based on its name?

Looking at the rest of the ignored messages most of them seem to have
very large attachments, but there are possibly others. There are only
maybe 20 of these kinds of emails so I can try to fix them manually.
Still, it would help if I knew what exactly caused notmuch to ignore the
file. I understand most of the message parsing is done with gmime. Does
gmime give any diagnostics on parse errors that could be used to give a
reason for thinking a file is not mail?

I understand that the list of non-mail files is stored in the notmuch
database and the files are completely ignored from there on. This
actually makes it harder to debug these kinds of issues since the list of
ignored mails is only visible on the first invocation of 'notmuch new',
unless the files are moved around. Is there some way to extract the list
of ignored files from the database for inspection? Maybe 'notmuch new'
could have some kind of --unignore-non-mail switch that would reconsider
previously ignored files.

-- 
Perttu Luukko


How to debug 'ignoring non-mail file' issues

2014-08-31 Thread David Bremner
Perttu Luukko writes:


> I understand that the list of non-mail files is stored in the notmuch
> database and the files are completely ignored from there on. This
> actually makes it harder to debug these kinds of issues since the list of
> ignored mails is only visible on the first invocation of 'notmuch new',
> unless the files are moved around. Is there some way to extract the list
> of ignored files from the database for inspection? Maybe 'notmuch new'
> could have some kind of --unignore-non-mail switch that would reconsider
> previously ignored files.

I _think_ it should suffice to do something like

   find Maildir -type d -exec touch {} \;

to force a rescan
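
Once a rescan is forced, the notes about ignored files are printed
again and can be captured for inspection. A small sketch of a filter
for that purpose follows. The function name ignored_paths is made up,
and the filter ASSUMES that each note ends with "non-mail file: "
followed by the path; check this against the actual output of your
notmuch version before relying on it.

```shell
# Read 'notmuch new' output on stdin and print only the ignored paths.
# ASSUMPTION: each note ends with "non-mail file: <path>".
ignored_paths() {
    # Strip everything up to and including the marker; print only the
    # lines where a substitution actually happened.
    sed -n 's/^.*[Nn]on-mail file: //p'
}
```

It could be used after the touch command above as
"notmuch new 2>&1 | ignored_paths > ignored-files.txt", which collects
the paths regardless of whether the notes go to stdout or stderr.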

d



How to debug 'ignoring non-mail file' issues

2014-08-31 Thread David Bremner
Perttu Luukko writes:


> The vast majority of these ignored mails are not ignored after I
> transfer them with offlineimap to another computer. I can non-ignore
> these files probably by copying the renamed file back to the mail
> server, so this is fixable. Offlineimap shouldn't mess with the file's
> contents, so is there something that can cause notmuch to ignore a file
> based on its name?

The most likely cause is that the files are mboxes, whether intentional
or not.  In particular if they start with a "From " (note the lack of :)
and contain a second "From " at the beginning of a line later in the
file. In this case something like sed can replace the initial 
"From " with "X-Envelope-From: ".
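
For illustration, the sed replacement could look something like the
sketch below. The function name mbox_from_to_header is made up, and the
result is written to stdout instead of editing the file in place, since
the in-place option of sed differs between implementations.

```shell
# Print FILE with a leading mbox "From " line turned into a header.
# Only line 1 is touched; "From: " headers and any later "From " lines
# are left alone.
mbox_from_to_header() {
    sed '1s/^From /X-Envelope-From: /' "$1"
}
```

Redirect the output to a new file and compare it with the original
before replacing anything in the mail store.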

I agree that the error message could be more informative in this case.

d

