How to debug 'ignoring non-mail file' issues

2014-09-04 Thread Perttu Luukko
On 2014-09-01 09:41:06, Perttu Luukko wrote:
> Yes, that indeed works. I'll probably move these ignored files to a
> separate folder for inspection.

I looked at the mails that are still ignored after upgrading GMime to
the latest version, and I think I have found what they have in common. All
of my ignored emails are from 2010-2011, and for some reason these mails
contain a line like this:

>From username  Wed Sep 28 16:43:49 2011

somewhere among the headers. Note the '>' at the beginning of the line.
The mails that are still ignored after upgrading GMime are those where
this line happens to be the first line. Also, all of them have
attachments for some reason. That line certainly doesn't look right, and
I don't know where it came from. It might be some byproduct of mail
redirection, since it shows my username, but the mails are not sent by
me.

I moved these problematic lines to the second line of each message, and
now they are imported without problems. I probably won't file a bug for
GMime because I have no idea whether this is just some oddity caused by
my mail setup. Let this information reside here in case someone else has
a similar problem.
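
The check and the manual fix described above could be scripted roughly
as follows. This is only a sketch: the function names
list_escaped_from_first and move_first_line_down are made up for
illustration, and it assumes one message per file and no newlines in
file names. Try it on a copy of the mail first.

```shell
# Print every file below directory $1 whose first line starts with ">From ".
list_escaped_from_first() {
    find "$1" -type f | while IFS= read -r f; do
        if head -n 1 "$f" | grep -q '^>From '; then
            printf '%s\n' "$f"
        fi
    done
}

# Swap the first two lines of file $1, so a leading ">From " line ends up
# on line 2. The result is built in a temporary file and only then copied
# back over the original.
move_first_line_down() {
    f=$1
    tmp=$(mktemp) || return 1
    { sed -n '2p' "$f"; sed -n '1p' "$f"; sed '1,2d' "$f"; } > "$tmp" &&
        cat "$tmp" > "$f"
    rm -f "$tmp"
}
```

Something like `list_escaped_from_first ~/Maildir` lists the candidates;
each one can then be passed to move_first_line_down.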

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-04 Thread Perttu Luukko
On 2014-09-03 19:03:40, Jani Nikula wrote:
> On Wed, 03 Sep 2014, Perttu Luukko  wrote:
> > What I mean is that there would be a separate error for cases "Does not
> > resemble an email message at all", i.e., some control file your mail
> > server happens to store in the mailbox, and "Looks like mail but we
> > can't parse it", i.e., better find out why it can't be parsed to avoid
> > potentially important messages going missing from the database.
> 
> As I said, GMime does not tell us the difference between the two.

There could be a separate parsing step that reads the first kilobyte or
so and checks whether it is text, and whether there is a line starting
with "From: " and possibly other headers. This could be run if GMime
thinks the file is not mail so there would be negligible overhead.
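
Such a pre-check could look roughly like the sketch below. It is not
notmuch or GMime code: the function name looks_like_mail, the 1024-byte
window and the "no NUL bytes" test for "is it text" are all assumptions
chosen for illustration.

```shell
# Return 0 if the first kilobyte of file $1 looks like an email message:
# it is non-empty, contains no NUL bytes (a rough "is it text?" test) and
# has at least one line starting with "From: ".
looks_like_mail() {
    total=$(head -c 1024 "$1" | wc -c | tr -d ' ')
    no_nul=$(head -c 1024 "$1" | tr -d '\000' | wc -c | tr -d ' ')
    [ "$total" -gt 0 ] && [ "$total" -eq "$no_nul" ] &&
        head -c 1024 "$1" | LC_ALL=C grep -q '^From: '
}
```

A file that fails GMime parsing but passes this check would be a
candidate for the "looks like mail but we can't parse it" warning.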

This is just a suggestion. Notmuch users are probably quite experienced
so they can always investigate on their own why their emails are being
ignored. But there could be more warning about ignored messages.
Something like, at the end of each 'notmuch new' output: "Note: some
files were ignored as non-mail. Check the list at
~/mail/.notmuch/ignored-files and adjust your ~/.notmuch-config".

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-03 Thread Perttu Luukko
On 2014-09-02 23:37:12, Jani Nikula wrote:
> On Mon, 01 Sep 2014, Perttu Luukko  wrote:
> > Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> > to be classified as mail.
> 
> What was the old version? If it was 2.4 we should probably consider
> dropping support for that in future notmuch.

It was 2.4.33. It might still work for other people, I don't know. I
still have some ignored mails. If I can nail down why they are ignored
we might know more about why GMime 2.4 ignored even more mail. They were
from around the same time period, so it might have something to do with
the email setup I had at that time.

> > Even more reason to give a separate warning for GMime parse errors.
> 
> Not sure. We only get a binary success/fail from GMime, and that gets
> printed for all non-email files. I'm not sure it's helpful.

What I mean is that there would be a separate error for cases "Does not
resemble an email message at all", i.e., some control file your mail
server happens to store in the mailbox, and "Looks like mail but we
can't parse it", i.e., better find out why it can't be parsed to avoid
potentially important messages going missing from the database.

-- 
Perttu


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Perttu Luukko
On 2014-09-01 09:52:20, Perttu Luukko wrote:
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.

Yes, upgrading to GMime 2.6.20 caused all the messages on my server to
be classified as mail. Even more reason to give a separate warning for
GMime parse errors. I'll see if my archive of older emails still
contains some ignored files.

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Perttu Luukko
On 2014-08-31 07:41:42, David Bremner wrote:
> Perttu Luukko  writes:
> > The vast majority of these ignored mails are not ignored after I
> > transfer them with offlineimap to another computer. I can non-ignore
> > these files probably by copying the renamed file back to the mail
> > server, so this is fixable. Offlineimap shouldn't mess with the file's
> > contents, so is there something that can cause notmuch to ignore a file
> > based on its name?
> 
> The most likely cause is that the files are mboxes, whether intentional
> or not.  In particular if they start with a "From " (note the lack of :)
> and contain a second "From " at the beginning of a line later in the
> file. In this case something like sed can replace the initial 
> "From " with "X-Envelope-From: ".
> 
> I agree that the error message could be more informative in this case.

No, the mails do contain "From: " with the appropriate colon. If I
understood correctly notmuch returns the same "not mail" return code
both when the essential headers are missing (so the file probably really
isn't mail) and when GMime fails to parse the message. I think it would
be a good idea to give a different warning in the latter case.

If the files really are ignored because of GMime it also explains why so
many more files are ignored on my mail provider's server than on my
laptop. The server probably has an older version of GMime. I'll upgrade
and see if that makes a difference.
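
For the mbox case David describes above, the sed replacement could be
sketched as follows. The function name fix_mbox_from_line is
hypothetical, and only the first line of the file is rewritten; later
"From " lines are left alone.

```shell
# Turn a leading mbox "From " separator on line 1 of file $1 into an
# ordinary "X-Envelope-From: " header. The output goes to a temporary
# file first, which avoids relying on the non-portable "sed -i".
fix_mbox_from_line() {
    tmp=$(mktemp) || return 1
    sed '1s/^From /X-Envelope-From: /' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

Files that already start with a real header such as "From: " are left
unchanged, because the pattern requires a space directly after "From".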

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-09-01 Thread Perttu Luukko
On 2014-08-31 09:46:12, David Bremner wrote:
> Perttu Luukko  writes:
> 
> > I understand that the list of non-mail files is stored in the
> > notmuch database and the files are completely ignored from there on.
> > This actually makes it harder to debug these kinds of issues since
> > the list of ignored mails is only visible on the first invocation of
> > 'notmuch new', unless the files are moved around. Is there some way
> > to extract the list of ignored files from the database for
> > inspection? Maybe 'notmuch new' could have some kind of
> > --unignore-non-mail switch that would reconsider previously ignored
> > files.
> 
> I _think_ it should suffice to do something like
> 
>     find Maildir -type d -exec touch {} \;
> 
> to force a rescan

Yes, that indeed works. I'll probably move these ignored files to a
separate folder for inspection.
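
A rough sketch of how the rescan and the collection of ignored paths
could be combined. The helper names force_rescan and ignored_paths are
made up, and the filter assumes that 'notmuch new' reports each ignored
file on a line of the form "Note: Ignoring non-mail file: <path>"; check
the exact wording printed by your version before relying on it.

```shell
# Touch every directory below $1 so that "notmuch new" rescans it.
force_rescan() {
    find "$1" -type d -exec touch {} \;
}

# Read "notmuch new" output on stdin and print only the ignored paths.
# The message prefix is an assumption, see the note above.
ignored_paths() {
    sed -n 's/^Note: Ignoring non-mail file: //p'
}

# Typical use (not run here):
#   force_rescan ~/Maildir
#   notmuch new 2>&1 | ignored_paths > ignored-files.txt
```

The resulting list can then be used to copy or move the files to a
separate folder for inspection.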

-- 
Perttu


How to debug 'ignoring non-mail file' issues

2014-08-31 Thread Perttu Luukko
Hi,

I indexed my archive of emails from recent years with notmuch (about 10k
messages so not much). I have quite a lot of messages 'notmuch new'
ignores as non-mail files, about 1000 of them. They are not obviously
malformed, meaning that the files certainly look like emails when opened
in a text editor. I'd like to find out why these files are ignored, and
if there is something I can do to fix them. Of course I'd like to have a
complete database of my old emails, with nothing falling through the
cracks like this.

The vast majority of these ignored mails are not ignored after I
transfer them with offlineimap to another computer. I can non-ignore
these files probably by copying the renamed file back to the mail
server, so this is fixable. Offlineimap shouldn't mess with the file's
contents, so is there something that can cause notmuch to ignore a file
based on its name?

Looking at the rest of the ignored messages most of them seem to have
very large attachments, but there are possibly others. There are only
maybe 20 of these kinds of emails so I can try to fix them manually.
Still, it would help if I knew what exactly caused notmuch to ignore the
file. I understand most of the message parsing is done with GMime. Does
GMime give any diagnostics on parse errors that could be used to give a
reason for thinking a file is not mail?

I understand that the list of non-mail files is stored in the notmuch
database and the files are completely ignored from there on. This
actually makes it harder to debug these kinds of issues since the list of
ignored mails is only visible on the first invocation of 'notmuch new',
unless the files are moved around. Is there some way to extract the list
of ignored files from the database for inspection? Maybe 'notmuch new'
could have some kind of --unignore-non-mail switch that would reconsider
previously ignored files.

-- 
Perttu Luukko


'notmuch new' trying to read non-existing files

2014-08-27 Thread Perttu Luukko
On 2014-08-26 13:01:22, David Bremner wrote:
> Perttu Luukko  writes:
> > When I run 'notmuch new' I get:
> >
> > Found 9903 total files (that's not much mail).
> > Error reading file /home/users/(username)/Maildir/.act/.act: No such file or directory
> 
> I'm grasping at straws a bit, but do you by chance have some fancy
> symlinks in your Maildir?  

Well this is embarrassing, but this was indeed the case. I had cleaned
the Maildir from what I thought were leftover symlinks from some time
long ago. Actually, I had a script that creates, with symlinks, a copy
of my mailbox without the dots in the directory names for use with Mutt.
I had left out parameter -T from 'ln' so my script also created weird
symlinks at ~/Maildir, and thus resurrected the links I thought I
cleaned up. And the links were indeed the problem.
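
To illustrate the 'ln' pitfall, here is a minimal sketch. The function
names list_symlinks and link_folder are hypothetical, and -T is the GNU
coreutils option, so this assumes GNU ln.

```shell
# List every symbolic link below directory $1 so stray ones can be reviewed.
list_symlinks() {
    find "$1" -type l
}

# Create the symlink $2 pointing at $1. With -T the link name is always
# treated as a plain name: if $2 already exists, ln fails instead of
# quietly creating a link named after $1 inside the directory $2 refers to.
link_folder() {
    ln -s -T "$1" "$2"
}
```

Running the same link command twice without -T is enough to produce a
nested link such as .act/.act, because the second run follows the
existing link into the real directory and creates the new link there.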

Everything is working now. Sorry and thanks!

-- 
Perttu


'notmuch new' trying to read non-existing files

2014-08-26 Thread Perttu Luukko
Hi,

I decided to give notmuch a spin and installed version 0.18.1 on my mail
provider's shell server. The layout offered by my mail provider's
Dovecot is such that INBOX is stored in Maildir format at ~/Maildir and
other folders are stored as subfolders of ~/Maildir, filename of each
directory beginning with a period. In addition, ~/Maildir contains files
'dovecot-uidlist' and 'dovecot-uidvalidity', and each subdirectory
contains an empty file 'maildirfolder' in addition to the usual cur, new
and tmp. I don't know if this is an unusual layout or not.

When I run 'notmuch new' I get:

Found 9903 total files (that's not much mail).
Error reading file /home/users/(username)/Maildir/.act/.act: No such file or directory
Processed 1 file in almost no time.
Added 1 new message to the database.
Note: A fatal error was encountered: Something went wrong trying to read or write a file

The subdirectory .act is really the first (alphabetically) subdirectory
of ~/Maildir, but .act/.act does not exist and I don't know why notmuch
tries to read it. In a following run the .act subdirectory gets replaced
by .Drafts, but the error is the same. So for some reason 'notmuch new'
tries to read for each subdirectory a deeper subdirectory which does not
exist. Only emails in the top-level INBOX folder are added to the
database. The same collection of email offlineimap'd to my local
computer and with a more plain layout (each folder as a subdirectory of
~/mail, no dots) is read without problems.

What could be going wrong here? Is this a layout that should be indexed
by notmuch?

Please note that I'm not subscribed to this mailing list -- I'm not
using notmuch yet so I can't handle the volume :)

-- 
Perttu Luukko