[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-22 Thread Jani Nikula
On Sat, 21 Jan 2012 18:49:19 -0500, Austin Clements wrote:
> Quoth Jani Nikula on Jan 22 at  1:00 am:
> > On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> > > Later runs of "notmuch new" won't scan these files again and won't
> > > print warnings.
> > > 
> > > Various programs (Dovecot, in my case) store indexes and caches and
> > > such in the maildir.  Without this, notmuch persistently complains
> > > about such files.
> > 
> > Overall, sounds good and doing this automagically is nice. Superficially
> > the code looks sensible, but I didn't really dig into it. A few nasty
> > questions instead:
> > 
> > What happens if you delete a non-email file? Does the entry stay in the
> > database?
> 
> Phooey.  I thought this worked, but you're right that it doesn't (I
> even wrote a test for this, but the test was based on a false
> assumption).  Non-email files do get returned by the directory
> iterator, so without any changes, notmuch new will notice that they're
> gone.  What I missed is that it then uses
> notmuch_database_find_message_by_filename to find the "message" and
> remove the filename, which won't work since there's no message to
> find.
> 
> I'll have to think about this more.

Sorry about that...

This feature has considerable overlap with file/subdirectory exclusion,
most recently referred to in id:"20120122113212.GA7084@X200". I like the
way your approach is automatic, but doing it manually with configurable
exclusions has a certain explicitness to it, and altogether avoids the
problems here, don't you think? Apparently there are also people who
wouldn't want notmuch to index some valid email files for one reason or
another.

I haven't thought this through, but what if the exclude/ignore feature
had both the option to specify explicit files/subdirs (patterns like
.gitignore?) that are ignored, and some sort of "auto" option you could
enable to ignore all non-email files without warnings? This would
obviously all happen in the cli.
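
A rough sketch of how such a CLI-side decision could look, assuming
explicit patterns plus an "auto" switch. Every name here (glob_match,
IgnoreConfig, classify) is hypothetical, and the matcher only handles
'*' and '?', which is far less than real .gitignore semantics:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal glob matcher: '*' matches any run of characters, '?' matches
// exactly one. Hypothetical helper, not part of notmuch.
bool
glob_match (const std::string &pattern, const std::string &name)
{
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos, mark = 0;

    while (n < name.size ()) {
        if (p < pattern.size () && pattern[p] == '*') {
            star = p++;     // remember the star and try matching nothing
            mark = n;
        } else if (p < pattern.size () &&
                   (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;   // backtrack: let the last star eat one more char
            n = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size () && pattern[p] == '*')
        ++p;
    return p == pattern.size ();
}

enum class Decision { Index, SkipSilently, SkipWithWarning };

// Hypothetical configuration: explicit patterns plus the "auto" option.
struct IgnoreConfig {
    std::vector<std::string> patterns;
    bool auto_ignore_nonemail;
};

// Explicit patterns win (even for valid email files); otherwise emails
// are indexed and non-emails are skipped, quietly only in "auto" mode.
Decision
classify (const IgnoreConfig &config, const std::string &basename,
          bool looks_like_email)
{
    for (const std::string &pattern : config.patterns)
        if (glob_match (pattern, basename))
            return Decision::SkipSilently;
    if (looks_like_email)
        return Decision::Index;
    return config.auto_ignore_nonemail ? Decision::SkipSilently
                                       : Decision::SkipWithWarning;
}
```

Because nothing is stored in the database in this sketch, the stale-entry
questions above do not come up; the price is that every non-email file is
looked at again on each run.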

That probably does not make your thinking any easier, I'm afraid... but
perhaps it provides another angle.


BR,
Jani.


> 
> > What happens if you replace a non-email file with an email file?
> 
> It will not notice because notmuch new only inspects directory mtimes.
> This would require checking the mtimes of every non-email in the
> database on every notmuch new.
> 
> > Does it matter what happens above?
> > 
> > These are corner cases, but what remains in TODO suggests that it would
> > be difficult to debug and figure out if the above ever did happen to
> > someone.
> 
> Yes.  It's possible this needs to get a search syntax before it is
> acceptable for general use.
> 
> > BR,
> > Jani.


[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-22 Thread Jani Nikula
On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> Later runs of "notmuch new" won't scan these files again and won't
> print warnings.
> 
> Various programs (Dovecot, in my case) store indexes and caches and
> such in the maildir.  Without this, notmuch persistently complains
> about such files.

Overall, sounds good and doing this automagically is nice. Superficially
the code looks sensible, but I didn't really dig into it. A few nasty
questions instead:

What happens if you delete a non-email file? Does the entry stay in the
database?

What happens if you replace a non-email file with an email file?

Does it matter what happens above?

These are corner cases, but what remains in TODO suggests that it would
be difficult to debug and figure out if the above ever did happen to
someone.


BR,
Jani.


> ---
> Every time I run notmuch new I get a slew of these warnings.  It was
> starting to get on my nerves, so I implemented the solution suggested
> by the TODO file.
> 
>  devel/TODO      |    9 +++------
>  lib/database.cc |   41 +++++++++++++++++++++++++++++++++++++++++
>  test/new        |   23 +++++++++++++++++++++++
>  3 files changed, 67 insertions(+), 6 deletions(-)
> 
> diff --git a/devel/TODO b/devel/TODO
> index 4dda6f4..b64a26e 100644
> --- a/devel/TODO
> +++ b/devel/TODO
> @@ -260,12 +260,9 @@ existing messages at the next database upgrade).
>  Add support for the user to specify custom headers to be indexed (and
>  re-index these for existing messages at the next database upgrade).
>  
> -Save filenames for files detected as "not an email file" in the
> -database. This would allow for two things: 1. Optimizing "notmuch new"
> -to not have to look at these files again (since they are potentially
> -large so the detection could be potentially slow). 2. A "notmuch
> -search" syntax could be added to allow the user to find these files,
> -(and perhaps delete them or move them away as appropriate).
> +Add a "notmuch search" syntax to allow users to find files recorded as
> +non-emails in the database (and perhaps delete them or move them away
> +as appropriate).
>  
>  Fix filesystem/notmuch-new race condition by not updating database
>  mtime for a directory if it is the same as the current mtime.
> diff --git a/lib/database.cc b/lib/database.cc
> index 8103bd9..fd1ec6e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -1618,6 +1618,43 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
>      return NOTMUCH_STATUS_SUCCESS;
>  }
>  
> +static notmuch_status_t
> +_notmuch_database_add_nonemail (notmuch_database_t *notmuch,
> +                                const char *filename)
> +{
> +    notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
> +    void *local = talloc_new (notmuch);
> +    char *term, *direntry;
> +    Xapian::docid id;
> +
> +    if (notmuch->mode == NOTMUCH_DATABASE_MODE_READ_ONLY)
> +        INTERNAL_ERROR ("Failure to ensure database is writable");
> +
> +    Xapian::WritableDatabase *db =
> +        static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
> +
> +    /* Create a document to record the non-email */
> +    Xapian::Document nonemail;
> +    term = talloc_asprintf (local, "%s%s", _find_prefix ("type"), "nonemail");
> +    nonemail.add_term (term, 0);
> +
> +    status = _notmuch_database_filename_to_direntry (local, notmuch,
> +                                                     filename, &direntry);
> +    if (status)
> +        goto DONE;
> +    term = talloc_asprintf (local, "%s%s", _find_prefix ("file-direntry"),
> +                            direntry);
> +    nonemail.add_term (term, 0);
> +
> +    /* Add it to the database */
> +    id = _notmuch_database_generate_doc_id (notmuch);
> +    db->replace_document (id, nonemail);
> +
> +  DONE:
> +    talloc_free (local);
> +    return status;
> +}
> +
>  notmuch_status_t
>  notmuch_database_add_message (notmuch_database_t *notmuch,
>                                const char *filename,
> @@ -1673,6 +1710,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>              (subject == NULL || *subject == '\0') &&
>              (to == NULL || *to == '\0'))
>          {
> +            /* The file is not an email.  Record it so we don't
> +             * reconsider this file in the future, which prevents
> +             * potentially expensive scans and annoying warnings. */
> +            _notmuch_database_add_nonemail (notmuch, filename);
>              ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;
>              goto DONE;
>          }
> diff --git a/test/new b/test/new
> index 49f390d..346d453 100755
> --- a/test/new
> +++ b/test/new
> @@ -153,4 +153,27 @@ rm -rf "${MAIL_DIR}"/two
>  output=$(NOTMUCH_NEW)
>  test_expect_equal "$output" "No new mail. Removed 3 messages."
>  
> +
> +test_begin_subtest "Skips non-email"
> +PRE_COUNT=$(notmuch search '*' | wc -l)
> +echo "I am not an email" > "${MAIL_DIR}"/nonemail
> +output=$(NOTMUCH_NEW 2>&1 | sed -n '/^Note:/p;$p' | sed 's/\(file:\) .*/\1 XXX/')
> +test_expect_equal "$output" "Note: Ignoring 


[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-21 Thread Tomi Ollila
On Sat, 21 Jan 2012 13:13:07 -0500, Austin Clements wrote:
> Quoth Tomi Ollila on Jan 21 at 11:48 am:
> > On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> > 
> > > -large so the detection could be potentially slow). 2. A "notmuch
> > > -search" syntax could be added to allow the user to find these files,
> > > -(and perhaps delete them or move them away as appropriate).
> > > +Add a "notmuch search" syntax to allow users to find files recorded as
> > > +non-emails in the database (and perhaps delete them or move them away
> > > +as appropriate).
> > 
> > Could these messages be tagged with some fixed tag? We already have the
> > 'signed' and 'attachment' tags; maybe 'nonemail' (or something) could
> > be used for these messages?
> 
> They aren't actually messages.  Messages have a lot of basic metadata
> that non-email files don't have, so I went with distinct types of
> documents, figuring that would be much less disruptive than having to
> deal with message objects that don't support most message methods.
> For example, if there were a tag (or any general way to query this),
> it's unclear what the output of
>  notmuch search --output=summary tag:nonemail
> would be.

Yes, I started to think about all these issues *after* sending that email:
a bunch of extra ifs in the code and so on. Better to think of something
else...

> This isn't necessarily the right approach, but if non-emails *are*
> represented as messages, I'm not sure what to do with things like
> notmuch_message_get_message_id and notmuch_message_get_thread_id or
> how to maintain backwards compatibility for callers that don't expect
> queries to return non-emails.

so true.

Tomi


[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-21 Thread Austin Clements
Quoth Jani Nikula on Jan 22 at  1:00 am:
> On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> > Later runs of "notmuch new" won't scan these files again and won't
> > print warnings.
> > 
> > Various programs (Dovecot, in my case) store indexes and caches and
> > such in the maildir.  Without this, notmuch persistently complains
> > about such files.
> 
> Overall, sounds good and doing this automagically is nice. Superficially
> the code looks sensible, but I didn't really dig into it. A few nasty
> questions instead:
> 
> What happens if you delete a non-email file? Does the entry stay in the
> database?

Phooey.  I thought this worked, but you're right that it doesn't (I
even wrote a test for this, but the test was based on a false
assumption).  Non-email files do get returned by the directory
iterator, so without any changes, notmuch new will notice that they're
gone.  What I missed is that it then uses
notmuch_database_find_message_by_filename to find the "message" and
remove the filename, which won't work since there's no message to
find.

I'll have to think about this more.
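
The failure mode can be sketched with a toy in-memory model. Nothing
below is notmuch code: ToyDatabase and the toy_* functions are
hypothetical stand-ins, and the fallback in toy_remove_file is only one
conceivable fix, not a decision:

```cpp
#include <map>
#include <set>
#include <string>

// Toy stand-in for the database (hypothetical; the real store is Xapian).
// Messages are keyed by filename; non-email records live in a separate set.
struct ToyDatabase {
    std::map<std::string, std::string> message_id_by_filename;
    std::set<std::string> nonemail_filenames;
};

// Mimics a lookup that only knows about message documents.
// Returns a pointer to the message ID, or nullptr if no message has the file.
const std::string *
toy_find_message_by_filename (const ToyDatabase &db,
                              const std::string &filename)
{
    std::map<std::string, std::string>::const_iterator it =
        db.message_id_by_filename.find (filename);
    return it == db.message_id_by_filename.end () ? nullptr : &it->second;
}

// The removal path described above: it goes through the message lookup,
// so a vanished non-email file is never cleaned up.
bool
toy_remove_via_message_lookup (ToyDatabase &db, const std::string &filename)
{
    if (toy_find_message_by_filename (db, filename) == nullptr)
        return false; // no message found; any non-email record stays behind
    db.message_id_by_filename.erase (filename);
    return true;
}

// One conceivable fix: fall back to the non-email records.
bool
toy_remove_file (ToyDatabase &db, const std::string &filename)
{
    if (toy_remove_via_message_lookup (db, filename))
        return true;
    return db.nonemail_filenames.erase (filename) > 0;
}
```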

> What happens if you replace a non-email file with an email file?

It will not notice because notmuch new only inspects directory mtimes.
This would require checking the mtimes of every non-email in the
database on every notmuch new.
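
To make the cost concrete, here is a sketch of what such a per-file
check might look like, assuming the mtime seen at classification time
were stored alongside each non-email record (the patch does not store
it). The names are hypothetical, and current mtimes are passed in rather
than read with stat() so the example stays filesystem-free:

```cpp
#include <ctime>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping: filename -> mtime observed when the file
// was classified as a non-email.
typedef std::map<std::string, std::time_t> MtimeMap;

// Return the recorded non-email files whose mtime has changed and which
// therefore need to be re-examined. Files missing from current_mtimes
// are skipped here; deletion would have to be handled separately.
std::vector<std::string>
nonemail_files_to_rescan (const MtimeMap &recorded,
                          const MtimeMap &current_mtimes)
{
    std::vector<std::string> result;
    for (MtimeMap::const_iterator rec = recorded.begin ();
         rec != recorded.end (); ++rec) {
        MtimeMap::const_iterator cur = current_mtimes.find (rec->first);
        if (cur != current_mtimes.end () && cur->second != rec->second)
            result.push_back (rec->first);
    }
    return result;
}
```

The loop visits every recorded non-email on every run, which is exactly
the per-file work that directory mtimes are meant to avoid.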

> Does it matter what happens above?
> 
> These are corner cases, but what remains in TODO suggests that it would
> be difficult to debug and figure out if the above ever did happen to
> someone.

Yes.  It's possible this needs to get a search syntax before it is
acceptable for general use.

> BR,
> Jani.


[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-21 Thread Jameson Graef Rollins
On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> Later runs of "notmuch new" won't scan these files again and won't
> print warnings.

Nice.  +1.  LGTM.

jamie.



[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-21 Thread Austin Clements
Quoth Tomi Ollila on Jan 21 at 11:48 am:
> On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> > Later runs of "notmuch new" won't scan these files again and won't
> > print warnings.
> > 
> > Various programs (Dovecot, in my case) store indexes and caches and
> > such in the maildir.  Without this, notmuch persistently complains
> > about such files.
> > ---
> 
> LGTM...
> 
> > Every time I run notmuch new I get a slew of these warnings.  It was
> > starting to get on my nerves, so I implemented the solution suggested
> > by the TODO file.
> 
> [ ... ]
> 
> > -large so the detection could be potentially slow). 2. A "notmuch
> > -search" syntax could be added to allow the user to find these files,
> > -(and perhaps delete them or move them away as appropriate).
> > +Add a "notmuch search" syntax to allow users to find files recorded as
> > +non-emails in the database (and perhaps delete them or move them away
> > +as appropriate).
> 
> Could these messages be tagged with some fixed tag? We already have the
> 'signed' and 'attachment' tags; maybe 'nonemail' (or something) could
> be used for these messages?

They aren't actually messages.  Messages have a lot of basic metadata
that non-email files don't have, so I went with distinct types of
documents, figuring that would be much less disruptive than having to
deal with message objects that don't support most message methods.
For example, if there were a tag (or any general way to query this),
it's unclear what the output of
 notmuch search --output=summary tag:nonemail
would be.

This isn't necessarily the right approach, but if non-emails *are*
represented as messages, I'm not sure what to do with things like
notmuch_message_get_message_id and notmuch_message_get_thread_id or
how to maintain backwards compatibility for callers that don't expect
queries to return non-emails.
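
The distinct-document-type approach can be illustrated with a toy model.
This is a sketch under assumptions, not the library's API: Document,
DocType and the toy_* functions are hypothetical, with DocType loosely
standing in for the "type" term the patch attaches to each record:

```cpp
#include <string>
#include <vector>

// Hypothetical document model: every record carries a type.
enum class DocType { Mail, NonEmail };

struct Document {
    DocType type;
    std::string filename;
    std::string message_id; // empty for NonEmail: there is nothing to put here
};

// Message queries filter on the type, so callers that expect messages
// never receive a non-email record.
std::vector<Document>
toy_search_messages (const std::vector<Document> &docs)
{
    std::vector<Document> result;
    for (const Document &doc : docs)
        if (doc.type == DocType::Mail)
            result.push_back (doc);
    return result;
}

// A separate entry point is then needed to list the recorded non-emails.
std::vector<std::string>
toy_list_nonemail (const std::vector<Document> &docs)
{
    std::vector<std::string> result;
    for (const Document &doc : docs)
        if (doc.type == DocType::NonEmail)
            result.push_back (doc.filename);
    return result;
}
```

Existing callers keep seeing only messages, at the price of needing a
dedicated way to query the non-email records.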


[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-21 Thread Tomi Ollila
On Fri, 20 Jan 2012 17:00:27 -0500, Austin Clements wrote:
> Later runs of "notmuch new" won't scan these files again and won't
> print warnings.
> 
> Various programs (Dovecot, in my case) store indexes and caches and
> such in the maildir.  Without this, notmuch persistently complains
> about such files.
> ---

LGTM...

> Every time I run notmuch new I get a slew of these warnings.  It was
> starting to get on my nerves, so I implemented the solution suggested
> by the TODO file.

[ ... ]

> -large so the detection could be potentially slow). 2. A "notmuch
> -search" syntax could be added to allow the user to find these files,
> -(and perhaps delete them or move them away as appropriate).
> +Add a "notmuch search" syntax to allow users to find files recorded as
> +non-emails in the database (and perhaps delete them or move them away
> +as appropriate).

Could these messages be tagged with some fixed tag? We already have the
'signed' and 'attachment' tags; maybe 'nonemail' (or something) could
be used for these messages?

Tomi


[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-20 Thread Austin Clements
Later runs of "notmuch new" won't scan these files again and won't
print warnings.

Various programs (Dovecot, in my case) store indexes and caches and
such in the maildir.  Without this, notmuch persistently complains
about such files.
---
Every time I run notmuch new I get a slew of these warnings.  It was
starting to get on my nerves, so I implemented the solution suggested
by the TODO file.

 devel/TODO      |    9 +++------
 lib/database.cc |   41 +++++++++++++++++++++++++++++++++++++++++
 test/new        |   23 +++++++++++++++++++++++
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/devel/TODO b/devel/TODO
index 4dda6f4..b64a26e 100644
--- a/devel/TODO
+++ b/devel/TODO
@@ -260,12 +260,9 @@ existing messages at the next database upgrade).
 Add support for the user to specify custom headers to be indexed (and
 re-index these for existing messages at the next database upgrade).

-Save filenames for files detected as "not an email file" in the
-database. This would allow for two things: 1. Optimizing "notmuch new"
-to not have to look at these files again (since they are potentially
-large so the detection could be potentially slow). 2. A "notmuch
-search" syntax could be added to allow the user to find these files,
-(and perhaps delete them or move them away as appropriate).
+Add a "notmuch search" syntax to allow users to find files recorded as
+non-emails in the database (and perhaps delete them or move them away
+as appropriate).

 Fix filesystem/notmuch-new race condition by not updating database
 mtime for a directory if it is the same as the current mtime.
diff --git a/lib/database.cc b/lib/database.cc
index 8103bd9..fd1ec6e 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1618,6 +1618,43 @@ _notmuch_database_link_message (notmuch_database_t 
*notmuch,
 return NOTMUCH_STATUS_SUCCESS;
 }

+static notmuch_status_t
+_notmuch_database_add_nonemail (notmuch_database_t *notmuch,
+                                const char *filename)
+{
+    notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
+    void *local = talloc_new (notmuch);
+    char *term, *direntry;
+    Xapian::docid id;
+
+    if (notmuch->mode == NOTMUCH_DATABASE_MODE_READ_ONLY)
+        INTERNAL_ERROR ("Failure to ensure database is writable");
+
+    Xapian::WritableDatabase *db =
+        static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
+
+    /* Create a document to record the non-email */
+    Xapian::Document nonemail;
+    term = talloc_asprintf (local, "%s%s", _find_prefix ("type"), "nonemail");
+    nonemail.add_term (term, 0);
+
+    status = _notmuch_database_filename_to_direntry (local, notmuch,
+                                                     filename, &direntry);
+    if (status)
+        goto DONE;
+    term = talloc_asprintf (local, "%s%s", _find_prefix ("file-direntry"),
+                            direntry);
+    nonemail.add_term (term, 0);
+
+    /* Add it to the database */
+    id = _notmuch_database_generate_doc_id (notmuch);
+    db->replace_document (id, nonemail);
+
+  DONE:
+    talloc_free (local);
+    return status;
+}
+
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
@@ -1673,6 +1710,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
(subject == NULL || *subject == '\0') &&
(to == NULL || *to == '\0'))
{
+   /* The file is not an email.  Record it so we don't
+* reconsider this file in the future, which prevents
+* potentially expensive scans and annoying warnings. */
+   _notmuch_database_add_nonemail (notmuch, filename);
ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;
goto DONE;
}
diff --git a/test/new b/test/new
index 49f390d..346d453 100755
--- a/test/new
+++ b/test/new
@@ -153,4 +153,27 @@ rm -rf "${MAIL_DIR}"/two
 output=$(NOTMUCH_NEW)
 test_expect_equal "$output" "No new mail. Removed 3 messages."

+
+test_begin_subtest "Skips non-email"
+PRE_COUNT=$(notmuch search '*' | wc -l)
+echo "I am not an email" > "${MAIL_DIR}"/nonemail
+output=$(NOTMUCH_NEW 2>&1 | sed -n '/^Note:/p;$p' | sed 's/\(file:\) .*/\1 XXX/')
+test_expect_equal "$output" "Note: Ignoring non-mail file: XXX
+No new mail."
+
+test_begin_subtest "Non-email files are not indexed"
+POST_COUNT=$(notmuch search '*' | wc -l)
+test_expect_equal "$PRE_COUNT" "$POST_COUNT"
+
+test_begin_subtest "Ignores non-email on second pass"
+touch "${MAIL_DIR}"
+output=$(NOTMUCH_NEW 2>&1 | sed -n '/^Note:/p;$p' | sed 's/\(file:\) .*/\1 XXX/')
+test_expect_equal "$output" "No new mail."
+
+test_begin_subtest "Detects deletion of non-email"
+rm "${MAIL_DIR}"/nonemail
+output=$(NOTMUCH_NEW)
+test_expect_equal "$output" "No new mail. Removed 1 message."
+
+
 test_done
-- 
1.7.7.3
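
For readers who want the intended behaviour in isolation, here is a minimal, self-contained C++ sketch. It is not notmuch code: NonEmailRegistry, ScannedFile and scan are hypothetical names, and a std::set stands in for the Xapian documents that the patch tags with the "type" and "file-direntry" terms. It only illustrates the effect described in the commit message: a non-email file is reported on the first pass and skipped silently afterwards. Removal of a recorded file is not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory substitute for the "nonemail" documents
// that the patch stores in the Xapian database.
class NonEmailRegistry {
public:
    // Remember a path that was examined and found not to be an email.
    void record (const std::string &path)
    {
        assert (! path.empty ());
        known_.insert (path);
    }

    // True if an earlier scan already recorded this path.
    bool contains (const std::string &path) const
    {
        return known_.count (path) != 0;
    }

private:
    std::set<std::string> known_;
};

// Hypothetical scan input: the flag stands in for the result of
// parsing the file and looking for From/Subject/To headers.
struct ScannedFile {
    std::string path;
    bool looks_like_email;
};

// Returns the paths that would trigger a "non-mail file" warning
// during this pass. Paths already in the registry are skipped
// without being re-examined.
std::vector<std::string>
scan (const std::vector<ScannedFile> &files, NonEmailRegistry &registry)
{
    std::vector<std::string> warnings;
    for (const ScannedFile &file : files) {
        if (registry.contains (file.path))
            continue;
        if (! file.looks_like_email) {
            registry.record (file.path);
            warnings.push_back (file.path);
        }
    }
    return warnings;
}
```

Calling scan twice on the same file list with one registry returns the non-email path on the first call and an empty list on the second.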



[PATCH] lib: Save filenames for files detected as "not an email file" in the database.

2012-01-20 Thread Austin Clements
Later runs of "notmuch new" won't scan these files again and won't
print warnings.

Various programs (Dovecot, in my case) store indexes and caches and
such in the maildir.  Without this, notmuch persistently complains
about such files.
---
Every time I run notmuch new I get a slew of these warnings.  It was
starting to get on my nerves, so I implemented the solution suggested
by the TODO file.

 devel/TODO      |    9 +++------
 lib/database.cc |   41 +++++++++++++++++++++++++++++++++++++++++
 test/new        |   23 +++++++++++++++++++++++
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/devel/TODO b/devel/TODO
index 4dda6f4..b64a26e 100644
--- a/devel/TODO
+++ b/devel/TODO
@@ -260,12 +260,9 @@ existing messages at the next database upgrade).
 Add support for the user to specify custom headers to be indexed (and
 re-index these for existing messages at the next database upgrade).
 
-Save filenames for files detected as "not an email file" in the
-database. This would allow for two things: 1. Optimizing "notmuch new"
-to not have to look at these files again (since they are potentially
-large so the detection could be potentially slow). 2. A "notmuch
-search" syntax could be added to allow the user to find these files,
-(and perhaps delete them or move them away as appropriate).
+Add a "notmuch search" syntax to allow users to find files recorded as
+non-emails in the database (and perhaps delete them or move them away
+as appropriate).
 
 Fix filesystem/notmuch-new race condition by not updating database
 mtime for a directory if it is the same as the current mtime.
diff --git a/lib/database.cc b/lib/database.cc
index 8103bd9..fd1ec6e 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1618,6 +1618,43 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 return NOTMUCH_STATUS_SUCCESS;
 }
 
+static notmuch_status_t
+_notmuch_database_add_nonemail (notmuch_database_t *notmuch,
+                                const char *filename)
+{
+    notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
+    void *local = talloc_new (notmuch);
+    char *term, *direntry;
+    Xapian::docid id;
+
+    if (notmuch->mode == NOTMUCH_DATABASE_MODE_READ_ONLY)
+        INTERNAL_ERROR ("Failure to ensure database is writable");
+
+    Xapian::WritableDatabase *db =
+        static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
+
+    /* Create a document to record the non-email */
+    Xapian::Document nonemail;
+    term = talloc_asprintf (local, "%s%s", _find_prefix ("type"), "nonemail");
+    nonemail.add_term (term, 0);
+
+    status = _notmuch_database_filename_to_direntry (local, notmuch,
+                                                     filename, &direntry);
+    if (status)
+        goto DONE;
+    term = talloc_asprintf (local, "%s%s", _find_prefix ("file-direntry"),
+                            direntry);
+    nonemail.add_term (term, 0);
+
+    /* Add it to the database */
+    id = _notmuch_database_generate_doc_id (notmuch);
+    db->replace_document (id, nonemail);
+
+  DONE:
+    talloc_free (local);
+    return status;
+}
+
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
@@ -1673,6 +1710,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
(subject == NULL || *subject == '\0') &&
(to == NULL || *to == '\0'))
{
+   /* The file is not an email.  Record it so we don't
+* reconsider this file in the future, which prevents
+* potentially expensive scans and annoying warnings. */
+   _notmuch_database_add_nonemail (notmuch, filename);
ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;
goto DONE;
}
diff --git a/test/new b/test/new
index 49f390d..346d453 100755
--- a/test/new
+++ b/test/new
@@ -153,4 +153,27 @@ rm -rf "${MAIL_DIR}"/two
 output=$(NOTMUCH_NEW)
 test_expect_equal "$output" "No new mail. Removed 3 messages."
 
+
+test_begin_subtest "Skips non-email"
+PRE_COUNT=$(notmuch search '*' | wc -l)
+echo "I am not an email" > "${MAIL_DIR}"/nonemail
+output=$(NOTMUCH_NEW 2>&1 | sed -n '/^Note:/p;$p' | sed 's/\(file:\) .*/\1 XXX/')
+test_expect_equal "$output" "Note: Ignoring non-mail file: XXX
+No new mail."
+
+test_begin_subtest "Non-email files are not indexed"
+POST_COUNT=$(notmuch search '*' | wc -l)
+test_expect_equal "$PRE_COUNT" "$POST_COUNT"
+
+test_begin_subtest "Ignores non-email on second pass"
+touch "${MAIL_DIR}"
+output=$(NOTMUCH_NEW 2>&1 | sed -n '/^Note:/p;$p' | sed 's/\(file:\) .*/\1 XXX/')
+test_expect_equal "$output" "No new mail."
+
+test_begin_subtest "Detects deletion of non-email"
+rm "${MAIL_DIR}"/nonemail
+output=$(NOTMUCH_NEW)
+test_expect_equal "$output" "No new mail. Removed 1 message."
+
+
 test_done
-- 
1.7.7.3
