Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-16 Thread Bdale Garbee
On Fri, 2009-12-04 at 10:35 -0800, Carl Worth wrote:

 But the above sounds like the List-Id header is unreliable enough to be
 useless. 

FWIW, that does not match my experience.

 Any reason not to just use something like
 to:notm...@notmuchmail to match messages sent to a list like this one?

I'd had much better luck matching List-Id than matching addresses in
recent years.  YMMV.

Bdale


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-16 Thread Michael Alan Dorman
 I'd had much better luck matching List-Id than matching addresses in
 recent years.  YMMV.

As long as you're not CC:d, you're fine.  If you're CC:'d, well, Mailman
is more brain-dead than you could imagine.

Mike.


Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-05 Thread Carl Worth
On Sat, 05 Dec 2009 09:51:58 +0100, Marten Veldthuis mar...@veldthuis.com wrote:
 On Fri, 04 Dec 2009 16:39:50 -0800, Carl Worth cwo...@cworth.org wrote:
  But when viewing an actual message, I'm still planning on having notmuch
  just return an arbitrary filename from the list of filenames associated
  with that message. Does anyone see any problem with that? Can you think
  of a case where you'd really care about seeing one or the other of
  a particular duplicated message?
 
 As long as it's deterministic. But if you don't display the first
 filename received, couldn't you exploit this by spoofing message ids?

What it currently does is use the filename of the first file that
notmuch encounters. That's different than first received, but either
way, there's still a race condition here for active spoofing attempts.

And, yes, actual intentional collisions of message IDs are something I
hadn't given thought to yet. So thanks for bringing that up. It's
definitely a case where you'd want to know and see the difference.

So maybe what we really want to do is to display some full-context diff
of the message by default, and have notmuch learn about differences the
user isn't interested in seeing, (such as mailing-list footers or so).

That sounds workable and should make any spoofing attempt obvious to the
user.

-Carl



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Fri, 04 Dec 2009 09:55:45 -0400, da...@tethera.net wrote:
 P.S. do people want to be CC'd on this list, or not?

We don't require subscription to the list, so I recommend CC, yes.

Plus, notmuch already handles duplicate mail just fine, (in that the
user only sees one copy at least). And I tag my mail differently when
one of my addresses appears on the CC list, so I definitely prefer that
people CC me when they want to call my specific attention to a message.

-Carl



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Mikhail Gusarov

Twas brillig at 10:05:05 04.12.2009 UTC-08 when cwo...@cworth.org did gyre and gimble:

 CW Plus, notmuch already handles duplicate mail just fine, (in that the
 CW user only sees one copy at least). And I tag my mail differently when
 CW one of my addresses appears on the CC list, so I definitely prefer that
 CW people CC me when they want to call my specific attention to a message.

The only problem with Cc is that Mailman suppresses duplicate messages and hence
there is no List-Id: on message.

-- 
  http://fossarchy.blogspot.com/



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Sat, 05 Dec 2009 00:07:36 +0600, Mikhail Gusarov dotted...@dottedmag.net wrote:
 The only problem with Cc is that Mailman suppresses duplicate messages and
 hence there is no List-Id: on message.

Hey, well notmuch doesn't even index the List-Id: header anyway. [*] ;-)

But the above sounds like the List-Id header is unreliable enough to be
useless. Any reason not to just use something like
to:notm...@notmuchmail to match messages sent to a list like this one?

I think mailman defaults to not allowing messages with the mailing-list
address implicit (such as in a Bcc) so it seems like matching the list
recipient will be more reliable than hoping the List-Id is always there.

-Carl

[*] Our TODO list does talk about supporting a configuration parameter
for indexing additional headers of interest.



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Michael Alan Dorman
 But the above sounds like the List-Id header is unreliable enough to
 be useless.

In my current .sieve setup, I have 93 entries for mailing lists.  87
of them use list-id[1].  3 use list-post.  1 uses 'mailing-list', but
looking at it, could be switched to list-id.  2 use x-mailing-list
(blasted vger.kernel.org).

None of my email gets misfiled, so it seems pretty darn reliable to
me. :)

Now, if you have an MTA that does duplicate suppression based on
message-id, you probably won't see the copy of a message that went to
the list if you're cc:'d on it because the direct copy (sans list-id
header) is likely to arrive first.
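A minimal C sketch of the delivery-time behaviour described above, assuming a filter that keeps only the first copy of each Message-ID. The `delivery_t` type, `suppress_duplicates` function, and the example IDs are hypothetical, not any real MTA's code. When the direct CC copy arrives first, the later list copy, the only one carrying a List-Id header, is the one discarded.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 128

/* One delivered copy of a message. */
typedef struct {
    const char *message_id;
    const char *list_id;   /* NULL for the direct (CC) copy */
} delivery_t;

/* Hypothetical filter: keep the first copy of each Message-ID and drop
 * later ones. Kept deliveries are copied to out (capacity n); the return
 * value is how many were kept. Past MAX_SEEN distinct IDs, new IDs are no
 * longer remembered, so later duplicates of them would slip through. */
static size_t
suppress_duplicates (const delivery_t *in, size_t n, delivery_t *out)
{
    const char *seen[MAX_SEEN];
    size_t n_seen = 0, n_out = 0, i, j;

    for (i = 0; i < n; i++) {
        int dup = 0;

        for (j = 0; j < n_seen; j++) {
            if (strcmp (seen[j], in[i].message_id) == 0) {
                dup = 1;
                break;
            }
        }
        if (dup)
            continue;           /* later copy: dropped */
        if (n_seen < MAX_SEEN)
            seen[n_seen++] = in[i].message_id;
        out[n_out++] = in[i];   /* first copy: kept */
    }
    return n_out;
}
```

Fed a direct copy followed by the list copy of the same message, only the direct copy (with `list_id == NULL`) survives, which is exactly why a List-Id-based filter would miss it.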

I would argue that that's a feature not a bug---the sender, at least,
hopes you will give it closer scrutiny because you were CC:'d.  They're
trying to bring it to your attention.

Besides, in notmuch, what's the difference going to be?  It'll still be
threaded the same, etc., but you'd be able to tell that this one came
to you rather than through the list, no?

(I'm waiting for Debian packages, lazy bastard that I am, so I'm
guessing on that)

 Any reason not to just use something like
 to:notm...@notmuchmail to match messages sent to a list like this one?

On the linux-kernel list, l-k often isn't in the to: field---or does
notmuch also index the cc: as to:?  If it does, this could work; if
not, FAIL.

Mike.



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Fri, 4 Dec 2009 14:09:46 -0500, Michael Alan Dorman mdor...@ironicdesign.com wrote:
 Now, if you have an MTA that does duplicate suppression based on
 message-id, you probably won't see the copy of a message that went to
 the list if you're cc:'d on it because the direct copy (sans list-id
 header) is likely to arrive first.
 
 I would argue that that's a feature not a bug---the sender, at least,
 hopes you will give it closer scrutiny because you were CC:'d.  They're
 trying to bring it to your attention.

Sure, giving it closer scrutiny is good. But if I expect a search like:

tag:lkml

to match all of my mail that came through the mailing list, but it
actually *misses* mail where the sender wanted me to give extra
scrutiny, then that's a big failure.

 Besides, in notmuch, what's the difference going to be?  It'll still be
 threaded the same, etc., but you'd be able to tell that this one came
 to you rather than through the list, no?

The difference is whether the message is found in a search, (see above).

 (I'm waiting for Debian packages, lazy bastard that I am, so I'm
 guessing on that)

Yeah, I'll get to that (real soon now, I promise.)

 On the linux-kernel list, l-k often isn't in the to: field---or does
 notmuch also index the cc: as to:?  If it does, this could work; if
 not, FAIL.

Yes. In notmuch, all recipient fields, (even Bcc: if a mail happens to
hit your mail store with that intact), are indexed under a single to:
prefix. My rationale is that when reading a message it's often very
useful to see whether I was addressed specifically or just CC'ed. But
when _searching_ for a message, it's too fragile to have to guess
whether the recipient was on the To: or CC: header (and too painful to
always type "to:m...@example.com or cc:m...@example.com").
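To illustrate the design, here is a minimal C sketch of the header-to-prefix mapping described above. The enum and function names are hypothetical, not notmuch's internal code; only the idea that every recipient header shares one prefix comes from the message. List-Id maps to nothing, matching the earlier remark that it is not indexed.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical search prefixes for address headers in this sketch. */
typedef enum {
    PREFIX_NONE,   /* header not indexed as an address here */
    PREFIX_FROM,
    PREFIX_TO      /* shared by every recipient header */
} address_prefix_t;

/* Map an address header name to the prefix its addresses are indexed
 * under. To, Cc and Bcc all collapse to PREFIX_TO, so one to: query finds
 * a recipient whichever header they appeared in. Header names are
 * compared case-insensitively, since mail header field names are. */
static address_prefix_t
address_prefix_for_header (const char *header)
{
    static const char *const recipient_headers[] = { "To", "Cc", "Bcc" };
    size_t i;

    for (i = 0; i < sizeof recipient_headers / sizeof recipient_headers[0]; i++) {
        if (strcasecmp (header, recipient_headers[i]) == 0)
            return PREFIX_TO;
    }
    if (strcasecmp (header, "From") == 0)
        return PREFIX_FROM;
    return PREFIX_NONE;
}
```

The trade-off is the one stated above: the To/Cc distinction is still available when displaying a message, but the search index deliberately forgets it.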

-Carl



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Fri, 4 Dec 2009 14:09:46 -0500, Michael Alan Dorman mdor...@ironicdesign.com wrote:
 Besides, in notmuch, what's the difference going to be?  It'll still be
 threaded the same, etc., but you'd be able to tell that this one came
 to you rather than through the list, no?

There's one other point I should make here while talking about duplicate
messages, (as determined by identical Message ID).

Currently notmuch just indexes the first version of any given message it
sees, and simply ignores anything else it sees in the future.

We're planning to change it to at least save each of the filenames for
messages with multiple files. That way if some duplicates are deleted,
then notmuch will still be able to find one of the others.
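A minimal C sketch of that bookkeeping, assuming one record per Message-ID holding a fixed-size list of filenames. The struct, `MAX_FILES`, and function names are hypothetical, not the notmuch library. Deleting one copy only drops that filename; the message stays reachable through any copy that remains.

```c
#include <string.h>

#define MAX_FILES 8

/* Hypothetical record: one message ID plus every file holding a copy. */
typedef struct {
    const char *message_id;
    const char *filenames[MAX_FILES];
    int n_filenames;
} message_record_t;

/* Remember another file containing this message. Returns 0, or -1 if full. */
static int
record_add_filename (message_record_t *msg, const char *filename)
{
    if (msg->n_filenames >= MAX_FILES)
        return -1;
    msg->filenames[msg->n_filenames++] = filename;
    return 0;
}

/* Forget one file (e.g. it was deleted). Returns how many filenames are
 * left; 0 means no copy of the message remains on disk. */
static int
record_remove_filename (message_record_t *msg, const char *filename)
{
    int i;

    for (i = 0; i < msg->n_filenames; i++) {
        if (strcmp (msg->filenames[i], filename) == 0) {
            /* Shift the remaining entries down to close the gap. */
            memmove (&msg->filenames[i], &msg->filenames[i + 1],
                     (size_t) (msg->n_filenames - i - 1) * sizeof msg->filenames[0]);
            msg->n_filenames--;
            break;
        }
    }
    return msg->n_filenames;
}

/* Any surviving filename will do; this sketch returns the first one. */
static const char *
record_get_filename (const message_record_t *msg)
{
    return msg->n_filenames > 0 ? msg->filenames[0] : NULL;
}
```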

Also, we could make notmuch index duplicate messages and add any
additional terms found to the document for the message. Currently, that
wouldn't make a big difference since notmuch is only indexing the body
and a few specific headers, (From, Subject, To, Cc, Bcc, Message-ID,
In-Reply-To, References).

So any differences there should be quite minor (a [LIST] prefix in
subject? an extra footer in the body?), under the assumption that no
mail files will ever exist with the same message ID but disparate
content.

Now, we have a TODO item to allow for indexing additional headers,
(either by default or by user configuration). Once we start doing that,
it probably will make sense to at least index the duplicates.

But when viewing an actual message, I'm still planning on having notmuch
just return an arbitrary filename from the list of filenames associated
with that message. Does anyone see any problem with that? Can you think
of a case where you'd really care about seeing one or the other of
a particular duplicated message?

-Carl



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Mikhail Gusarov

Twas brillig at 16:39:50 04.12.2009 UTC-08 when cwo...@cworth.org did gyre and gimble:

 CW But when viewing an actual message, I'm still planning on having
 CW notmuch just return an arbitrary filename from the list of
 CW filenames associated with that message. Does anyone see any problem
 CW with that? Can you think of a case where you'd really care about
 CW seeing one or the other of a particular duplicated message?

There might be different Reply-To fields.

So I'd just return the bigger dup, as it probably contains more information :)

-- 
  http://fossarchy.blogspot.com/



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-03 Thread Carl Worth
On Thu,  3 Dec 2009 03:15:26 +0600, Mikhail Gusarov dotted...@dottedmag.net wrote:
 In order to handle message renames the following changes were deemed
 necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

 * Mtime check on individual files was disabled. As files may be moved around
 without changing their mtime, it's necessary to parse them even if they appear
 old in case old message was moved. mtime check on directories was kept as
 moving files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls already
when new mail arrives. Having to also parse the message ID out of
(roughly, for me) 1 files every time sounds pretty rough. Fortunately...

 Note that after applying this patch notmuch still does not handle copying
 files (which is harmless, database will point to the last copy of message
 found during 'notmuch new') and deleting files (which is more serious, as
 dangling entries will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, rename, and deletion of files. And it will be faster than the
existing code with its mtime heuristics.

The complete design is on Keith's laptop right now, and hopefully he'll
appear soon with an implementation. Basically, there are only two new
functions needed in the library (if we got the design right):

notmuch_directory_t
notmuch_database_read_directory (notmuch_database_t *database,
                                 const char *path);

notmuch_status_t
notmuch_message_remove_filename (notmuch_message_t *message,
                                 const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to the
mtime that we currently read from the database, it will provide a list
of all directories and files (along with message IDs) known to the
database for a particular path. So notmuch-new can then quickly compare
the results of scandir with this notmuch_directory_t object and then
call notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.
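The comparison step might look like the following C sketch, assuming both the scandir() results and the database's list of known filenames for the directory are sorted with strcmp(). The function name and output arrays are hypothetical; in notmuch-new the "added" entries would go to notmuch_database_add_message and the "removed" ones to notmuch_message_remove_filename.

```c
#include <stddef.h>
#include <string.h>

/* Merge two strcmp-sorted filename lists for one directory.
 * added receives names found only on disk (capacity n_disk);
 * removed receives names found only in the database (capacity n_db). */
static void
reconcile_directory (const char *const *on_disk, size_t n_disk,
                     const char *const *in_db, size_t n_db,
                     const char **added, size_t *n_added,
                     const char **removed, size_t *n_removed)
{
    size_t i = 0, j = 0;

    *n_added = 0;
    *n_removed = 0;
    while (i < n_disk || j < n_db) {
        int cmp;

        if (i == n_disk)
            cmp = 1;            /* only database entries are left */
        else if (j == n_db)
            cmp = -1;           /* only on-disk entries are left */
        else
            cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp < 0)
            added[(*n_added)++] = on_disk[i++];    /* new file */
        else if (cmp > 0)
            removed[(*n_removed)++] = in_db[j++];  /* vanished file */
        else {
            i++;                /* known and still present: nothing to do */
            j++;
        }
    }
}
```

Because both lists are sorted, the comparison is a single linear pass over the two lists, with no per-file database lookup.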

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan. Obviously the implementation will need to deal with
that, (either with an additional library call for "I'm done adding
files, go ahead and delete dangling messages", or by postponing all
calls to remove_filename until later).
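The second option, postponing removals, can be illustrated with a toy C model, assuming a message is dropped (tags and all) once its last filename is removed. The types and functions are hypothetical, not the notmuch API. A rename appears as one missing name plus one new name; applying the removal immediately empties the message first, while deferring it keeps the message alive.

```c
/* Toy model of one message in the database: how many files hold it and
 * whether its tags are still present. */
typedef struct {
    int n_files;
    int has_tags;
} toy_message_t;

/* Attach one more file. Tags that were already lost stay lost. */
static void
toy_add_filename (toy_message_t *m)
{
    m->n_files++;
}

/* Detach one file. Losing the last file drops the message and its tags. */
static void
toy_remove_filename (toy_message_t *m)
{
    if (m->n_files > 0 && --m->n_files == 0)
        m->has_tags = 0;
}

/* Simulate a rename found during a scan. With defer_removals set, all
 * removals wait until every addition is done; otherwise each is applied
 * as soon as it is noticed. Returns whether the tags survive. */
static int
rename_keeps_tags (int defer_removals)
{
    toy_message_t m = { 1, 1 };   /* one file on disk, already tagged */

    if (defer_removals) {
        toy_add_filename (&m);     /* pass 1: new name added during scan */
        toy_remove_filename (&m);  /* pass 2: deferred removal of old name */
    } else {
        toy_remove_filename (&m);  /* old name noticed missing first... */
        toy_add_filename (&m);     /* ...new name only found later */
    }
    return m.has_tags;
}
```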

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto just in case.

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people really needing renames might want
to use it for now, (and live with any performance implications it
causes).

-Carl
