[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-17 Thread Mikhail Gusarov

Twas brillig at 16:51:17 16.12.2009 UTC-07 when bdale at gag.com did gyre and gimble:

 >> But the above sounds like the List-Id header is unreliable enough to
 >> be useless.

 BG> FWIW, that does not match my experience.

Yeah. This mail just arrived in my "main" folder instead of the "notmuch"
one, because you kept me in CC and hence Mailman did not send me the
copy with List-Id.

Please read the whole thread.

-- 
  http://fossarchy.blogspot.com/
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: 



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-16 Thread Michael Alan Dorman
> I'd had much better luck matching List-Id than matching addresses in
> recent years.  YMMV.

As long as you're not CC:'d, you're fine.  If you're CC:'d, well, Mailman
is more brain-dead than you could imagine.

Mike.


[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-16 Thread Bdale Garbee
On Thu, 2009-12-17 at 06:01 +0600, Mikhail Gusarov wrote:
> Twas brillig at 16:51:17 16.12.2009 UTC-07 when bdale at gag.com did gyre and gimble:
> 
>  >> But the above sounds like the List-Id header is unreliable enough to
>  >> be useless.
> 
>  BG> FWIW, that does not match my experience.
> 
> Yeah. This mail just arrived in my "main" folder instead of the "notmuch"
> one, because you kept me in CC and hence Mailman did not send me the
> copy with List-Id.
> 
> Please read the whole thread.

I did.  I guess I've just been lucky enough to mostly participate in
lists run with software other than Mailman, or whose admins didn't leave
this default behavior in place...  [sigh]

I will, very unhappily, concede the point.

Bdale




[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-16 Thread Bdale Garbee
On Fri, 2009-12-04 at 10:35 -0800, Carl Worth wrote:

> But the above sounds like the List-Id header is unreliable enough to be
> useless. 

FWIW, that does not match my experience.

> Any reason not to just use something like
> to:notmuch at notmuchmail to match messages sent to a list like this one?

I'd had much better luck matching List-Id than matching addresses in
recent years.  YMMV.

Bdale




___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch




[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-05 Thread Marten Veldthuis
On Fri, 04 Dec 2009 16:39:50 -0800, Carl Worth  wrote:
> But when viewing an actual message, I'm still planning on having notmuch
> just return an arbitrary filename from the list of filenames associated
> with that message. Does anyone see any problem with that? Can you think
> of a case where you'd really care about seeing one or the other of
> a particular duplicated message?

As long as it's deterministic. But if you don't display the first
filename received, couldn't you exploit this by spoofing message ids?

-- 
- Marten


[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-05 Thread Mikhail Gusarov

Twas brillig at 16:39:50 04.12.2009 UTC-08 when cworth at cworth.org did gyre and gimble:

 CW> But when viewing an actual message, I'm still planning on having
 CW> notmuch just return an arbitrary filename from the list of
 CW> filenames associated with that message. Does anyone see any problem
 CW> with that? Can you think of a case where you'd really care about
 CW> seeing one or the other of a particular duplicated message?

There might be different Reply-To fields.

So I'd just return the bigger dup, as it probably contains more
information :)
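A minimal sketch of the "return the bigger dup" idea, assuming the filenames holding copies of one Message-ID are already known. `pick_biggest_duplicate` is a hypothetical helper, not part of notmuch. Ties are broken by name so the choice stays deterministic regardless of the order the files were scanned in:

```c
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: among the files holding copies of one message,
 * return the largest one.  Ties go to the lexicographically smallest
 * name, so the result does not depend on scan order.  Files that
 * cannot be stat()ed are skipped; returns NULL if none of them can. */
const char *
pick_biggest_duplicate (const char *const *filenames, size_t n)
{
    const char *best = NULL;
    long long best_size = 0;

    for (size_t i = 0; i < n; i++) {
        struct stat st;
        long long size;

        if (stat (filenames[i], &st) != 0)
            continue;           /* copy vanished or unreadable */
        size = (long long) st.st_size;
        if (best == NULL || size > best_size ||
            (size == best_size && strcmp (filenames[i], best) < 0)) {
            best = filenames[i];
            best_size = size;
        }
    }
    return best;
}
```

File size is only a rough proxy for "more information"; a list copy may be larger merely because of an added footer.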

-- 
  http://fossarchy.blogspot.com/



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-05 Thread Mikhail Gusarov

Twas brillig at 10:35:27 04.12.2009 UTC-08 when cworth at cworth.org did gyre and gimble:

 >> The only problem with Cc is that Mailman suppresses duplicate
 >> messages and hence there is no List-Id: on message.

 CW> But the above sounds like the List-Id header is unreliable enough
 CW> to be useless.  Any reason not to just use something like
 CW> to:notmuch at notmuchmail to match messages sent to a list like this
 CW> one?

Automated processing. I'd go crazy putting every mailing list's address
into .procmailrc instead of using a simple sorter in sed. But it seems
it's the only reliable way.

 CW> I think Mailman defaults to not allowing messages where the
 CW> mailing-list address is only implicit (such as in a Bcc), so it
 CW> seems like matching the list recipient will be more reliable than
 CW> hoping the List-Id is always there.

Yep. Unfortunately.

-- 
  http://fossarchy.blogspot.com/



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-05 Thread Mikhail Gusarov

Twas brillig at 10:05:05 04.12.2009 UTC-08 when cworth at cworth.org did gyre and gimble:

 CW> Plus, notmuch already handles duplicate mail just fine, (in that the
 CW> user only sees one copy at least). And I tag my mail differently when
 CW> one of my addresses appears on the CC list, so I definitely prefer that
 CW> people CC me when they want to call my specific attention to a message.

The only problem with Cc is that Mailman suppresses duplicate messages and hence
there is no List-Id: on message.

-- 
  http://fossarchy.blogspot.com/



Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-05 Thread Carl Worth
On Sat, 05 Dec 2009 09:51:58 +0100, Marten Veldthuis mar...@veldthuis.com wrote:
> On Fri, 04 Dec 2009 16:39:50 -0800, Carl Worth cwo...@cworth.org wrote:
> > But when viewing an actual message, I'm still planning on having notmuch
> > just return an arbitrary filename from the list of filenames associated
> > with that message. Does anyone see any problem with that? Can you think
> > of a case where you'd really care about seeing one or the other of
> > a particular duplicated message?
> 
> As long as it's deterministic. But if you don't display the first
> filename received, couldn't you exploit this by spoofing message ids?

What it currently does is use the filename of the first file that
notmuch encounters. That's different from first received, but either
way, there's still a race condition here for active spoofing attempts.

And, yes, actual intentional collisions of message IDs are something I
hadn't given thought to yet. So thanks for bringing that up. It's
definitely a case where you'd want to know and see the difference.

So maybe what we really want to do is to display some full-context diff
of the message by default, and have notmuch learn about differences the
user isn't interested in seeing, (such as mailing-list footers or so).

That sounds workable and should make any spoofing attempt obvious to the
user.
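One way to read that proposal, as a minimal sketch: treat two copies of the same Message-ID as uninteresting when they are identical once a known mailing-list footer is stripped, and flag them otherwise. `copies_differ` and the exact-suffix footer rule are assumptions for illustration only; real footers vary, and showing the user an actual diff would need more than this:

```c
#include <stddef.h>
#include <string.h>

/* Length of body after removing footer, if body ends with footer. */
static size_t
length_without_footer (const char *body, const char *footer)
{
    size_t body_len = strlen (body);
    size_t footer_len = strlen (footer);

    if (footer_len <= body_len &&
        strcmp (body + body_len - footer_len, footer) == 0)
        return body_len - footer_len;
    return body_len;
}

/* Hypothetical check: do two copies of one Message-ID differ in
 * anything other than a known (learned) mailing-list footer?
 * Nonzero means the difference should be shown to the user. */
int
copies_differ (const char *a, const char *b, const char *footer)
{
    size_t len_a = length_without_footer (a, footer);
    size_t len_b = length_without_footer (b, footer);

    return len_a != len_b || memcmp (a, b, len_a) != 0;
}
```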

-Carl





[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Fri, 4 Dec 2009 14:09:46 -0500, Michael Alan Dorman  wrote:
> Besides, in notmuch, what's the difference going to be?  It'll still be
> threaded the same, etc., but you'd be able to tell that this one came
> to you rather than through the list, no?

There's one other point I should make here while talking about duplicate
messages, (as determined by identical Message ID).

Currently notmuch just indexes the first version of any given message it
sees, and simply ignores anything else it sees in the future.

We're planning to change it to at least save each of the filenames for
messages with multiple files. That way if some duplicates are deleted,
then notmuch will still be able to find one of the others.
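A minimal sketch of what keeping every filename buys, assuming a hypothetical in-memory record (`struct message_files` is not a notmuch type) listing each file seen for one Message-ID. When some copies are deleted, any remaining readable one can still be returned:

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical record: one Message-ID and every file seen for it. */
struct message_files {
    const char *message_id;
    const char *filenames[8];
    size_t n_filenames;
};

/* Return the first stored filename that is still readable, or NULL
 * if every copy of the message has been deleted. */
const char *
first_existing_filename (const struct message_files *m)
{
    for (size_t i = 0; i < m->n_filenames; i++)
        if (access (m->filenames[i], R_OK) == 0)
            return m->filenames[i];
    return NULL;
}
```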

Also, we could make notmuch index duplicate messages and add any
additional terms found to the document for the message. Currently, that
wouldn't make a big difference since notmuch is only indexing the body
and a few specific headers, (From, Subject, To, Cc, Bcc, Message-ID,
In-Reply-To, References).

So any differences there should be quite minor (a "[LIST]" prefix in
the subject? an extra footer in the body?), under the assumption that no
mail files will ever exist with the same message ID but disparate
content.

Now, we have a TODO item to allow for indexing additional headers,
(either by default or by user configuration). Once we start doing that,
it probably will make sense to at least index the duplicates.

But when viewing an actual message, I'm still planning on having notmuch
just return an arbitrary filename from the list of filenames associated
with that message. Does anyone see any problem with that? Can you think
of a case where you'd really care about seeing one or the other of
a particular duplicated message?

-Carl



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Fri, 4 Dec 2009 14:09:46 -0500, Michael Alan Dorman  wrote:
> Now, if you have an MTA that does duplicate suppression based on
> message-id, you probably won't see the copy of a message that went to
> the list if you're cc:'d on it because the direct copy (sans list-id
> header) is likely to arrive first.
> 
> I would argue that that's a feature not a bug---the sender, at least,
> hopes you will give it closer scrutiny because you were CC:'d.  They're
> trying to bring it to your attention.

Sure, giving it closer scrutiny is good. But if I expect a search like:

tag:lkml

to match all of my mail that came through the mailing list, but it
actually *misses* mail where the sender wanted me to give extra
scrutiny, then that's a big failure.

> Besides, in notmuch, what's the difference going to be?  It'll still be
> threaded the same, etc., but you'd be able to tell that this one came
> to you rather than through the list, no?

The difference is whether the message is found in a search, (see above).

> (I'm waiting for Debian packages, lazy bastard that I am, so I'm
> guessing on that)

Yeah, I'll get to that (real soon now, I promise.)

> On the linux-kernel list, l-k often isn't in the to: field---or does
> notmuch also index the cc: as to:?  If it does, this could work; if
> not, FAIL.

Yes. In notmuch, all recipient fields (even Bcc:, if a mail happens to
hit your mail store with that intact) get indexed to a single "to"
prefix. My rationale is that when reading a message it's often very
useful to see whether I was addressed specifically or just CC'ed. But
when _searching_ for a message, it's too fragile to have to guess
whether the recipient was on the To: or CC: header (and too painful to
always type (to:me at example.com or cc:me at example.com)).

-Carl



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Michael Alan Dorman
On Sat, 05 Dec 2009 00:55:20 +0600
Mikhail Gusarov  wrote:

> 
> Twas brillig at 13:52:20 04.12.2009 UTC-05 when
> mdorman at ironicdesign.com did gyre and gimble:
> 
>  MAD> Err, this makes no sense.  How can Mailman have any knowledge
>  MAD> of, and therefore "do anything" to any message that came by way
>  MAD> of a CC?
> 
> for each subscriber:
>   if subscriber.email in message.cc:
>     continue
>   ...
>   # delivery

I stand corrected---it seems like a gigantic misfeature to me, so
much so that I checked and apparently that is exactly how Mailman
works in its default configuration.

My apologies for suggesting you didn't know what you were talking
about.  I made the mistake of assuming sane software.

Mike.



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Michael Alan Dorman
> But the above sounds like the List-Id header is unreliable enough to
> be useless.

In my current .sieve setup, I have 93 entries for mailing lists.  87
of them use list-id[1].  3 use list-post.  1 uses 'mailing-list', but
looking at it, could be switched to list-id.  2 use x-mailing-list
(blasted vger.kernel.org).

None of my email gets misfiled, so it seems pretty darn reliable to
me. :)

Now, if you have an MTA that does duplicate suppression based on
message-id, you probably won't see the copy of a message that went to
the list if you're cc:'d on it because the direct copy (sans list-id
header) is likely to arrive first.

I would argue that that's a feature not a bug---the sender, at least,
hopes you will give it closer scrutiny because you were CC:'d.  They're
trying to bring it to your attention.

Besides, in notmuch, what's the difference going to be?  It'll still be
threaded the same, etc., but you'd be able to tell that this one came
to you rather than through the list, no?

(I'm waiting for Debian packages, lazy bastard that I am, so I'm
guessing on that)

> Any reason not to just use something like
> to:notmuch at notmuchmail to match messages sent to a list like this one?

On the linux-kernel list, l-k often isn't in the to: field---or does
notmuch also index the cc: as to:?  If it does, this could work; if
not, FAIL.

Mike.



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Michael Alan Dorman
On Sat, 05 Dec 2009 00:07:36 +0600
Mikhail Gusarov  wrote:

> The only problem with Cc is that Mailman suppresses duplicate
> messages and hence there is no List-Id: on message.

Err, this makes no sense.  How can Mailman have any knowledge of, and
therefore "do anything" to any message that came by way of a CC?

Now, your mail transfer agent might do duplicate suppression, and if
the direct email reaches you before the one that went through the
mailing list, you won't have a copy that includes the list-id header,
but that's an issue on your end, not with the mailing list software.
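To make the failure mode concrete, here is a minimal sketch of Message-ID based duplicate suppression as a delivery filter might do it. `deliver_once` and its fixed-size table are hypothetical, purely for illustration. Whichever copy arrives first is kept, so if the direct CC'd copy wins, the list copy carrying List-Id: never reaches the mailbox:

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical table of Message-IDs already delivered. */
struct seen_ids {
    const char *ids[MAX_SEEN];
    size_t n;
};

/* Returns 1 if the message should be delivered, 0 if a message with
 * the same Message-ID was delivered earlier (first arrival wins). */
int
deliver_once (struct seen_ids *seen, const char *message_id)
{
    for (size_t i = 0; i < seen->n; i++)
        if (strcmp (seen->ids[i], message_id) == 0)
            return 0;           /* later copy suppressed */
    if (seen->n < MAX_SEEN)     /* a real filter would use a persistent store */
        seen->ids[seen->n++] = message_id;
    return 1;
}
```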

Mike.



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Sat, 05 Dec 2009 00:07:36 +0600, Mikhail Gusarov  wrote:
> The only problem with Cc is that Mailman suppresses duplicate messages and
> hence there is no List-Id: on message.

Hey, well notmuch doesn't even index the List-Id: header anyway. [*] ;-)

But the above sounds like the List-Id header is unreliable enough to be
useless. Any reason not to just use something like
to:notmuch at notmuchmail to match messages sent to a list like this one?

I think Mailman defaults to not allowing messages where the mailing-list
address is only implicit (such as in a Bcc), so it seems like matching the
list recipient will be more reliable than hoping the List-Id is always there.
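The two filtering strategies can be compared with a minimal sketch, assuming a hypothetical `struct mail` holding already-parsed header values (NULL when a header is absent) and an illustrative List-Id value. The direct CC'd copy that lacks List-Id: is missed by the first check but caught by the second, which looks at To: and Cc: together, much as a notmuch `to:` search covers all recipient fields:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-parsed view of one delivered copy of a
 * message.  Header values are NULL when the header is absent. */
struct mail {
    const char *list_id;  /* List-Id: */
    const char *to;       /* To: */
    const char *cc;       /* Cc: */
};

/* Case-insensitive substring test, since addresses may differ in case. */
bool
contains_ci (const char *haystack, const char *needle)
{
    if (haystack == NULL)
        return false;
    for (; *haystack; haystack++) {
        size_t i = 0;
        while (needle[i] != '\0' &&
               tolower ((unsigned char) haystack[i]) ==
               tolower ((unsigned char) needle[i]))
            i++;
        if (needle[i] == '\0')
            return true;
    }
    return false;
}

/* Strategy 1: trust the List-Id: header added by the list software. */
bool
matches_list_id (const struct mail *m, const char *list_id)
{
    return m->list_id != NULL && strstr (m->list_id, list_id) != NULL;
}

/* Strategy 2: look for the list address among all recipients. */
bool
matches_recipient (const struct mail *m, const char *list_addr)
{
    return contains_ci (m->to, list_addr) || contains_ci (m->cc, list_addr);
}
```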

-Carl

[*] Our TODO list does talk about supporting a configuration parameter
for indexing additional headers of interest.



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread Carl Worth
On Fri, 04 Dec 2009 09:55:45 -0400, david at tethera.net wrote:
> P.S. do people want to be CC'd on this list, or not?

We don't require subscription to the list, so I recommend CC, yes.

Plus, notmuch already handles duplicate mail just fine, (in that the
user only sees one copy at least). And I tag my mail differently when
one of my addresses appears on the CC list, so I definitely prefer that
people CC me when they want to call my specific attention to a message.

-Carl




[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-04 Thread da...@tethera.net

At Thu, 03 Dec 2009 16:45:22 -0800,

> Anyway, I think we'll see code for that soon, so I'm not planning to
> commit the offered patch. But people really needing renames might want
> to use it for now, (and live with any performance implications it
> causes).

I could live with the performance issues, but it seems that it re-tags
every "Processed" file (renamed or not) as inbox.  This brings about
20k messages back into my inbox, which is a bit unusable.  The problem
seems to be that notmuch_database_add_message returns
NOTMUCH_STATUS_SUCCESS whether or not a new message was really added.
I don't know if there is an easy fix for this, or if it is worth
pursuing, given that the patch won't be committed.
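A minimal sketch of how the add step could report what actually happened, so that notmuch-new only tags genuinely new messages as inbox. It mirrors the branches of the patch (same filename, old file still readable, old file gone), but `add_outcome_t`, `classify_add` and `should_tag_inbox` are hypothetical names, not library API:

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical outcomes of adding one file.  With the patch applied,
 * notmuch_database_add_message reports NOTMUCH_STATUS_SUCCESS for the
 * first three cases alike, so the caller cannot tell them apart. */
typedef enum {
    ADD_NEW_MESSAGE,   /* Message-ID not in the database yet */
    ADD_ALREADY_SEEN,  /* same Message-ID, same filename */
    ADD_RENAMED,       /* same Message-ID, old file no longer readable */
    ADD_DUPLICATE      /* same Message-ID, old file still readable */
} add_outcome_t;

/* old_filename is the path the database holds for this Message-ID,
 * or NULL if the Message-ID is unknown. */
add_outcome_t
classify_add (const char *old_filename, const char *filename)
{
    if (old_filename == NULL)
        return ADD_NEW_MESSAGE;
    if (strcmp (old_filename, filename) == 0)
        return ADD_ALREADY_SEEN;
    if (access (old_filename, R_OK) == 0)
        return ADD_DUPLICATE;
    return ADD_RENAMED;
}

/* Only a genuinely new message should receive the "inbox" tag. */
int
should_tag_inbox (add_outcome_t outcome)
{
    return outcome == ADD_NEW_MESSAGE;
}
```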

d

P.S. do people want to be CC'd on this list, or not?




[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-03 Thread Carl Worth
On Thu,  3 Dec 2009 03:15:26 +0600, Mikhail Gusarov  wrote:
> In order to handle message renames the following changes were deemed
> necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

> * Mtime check on individual files was disabled. As files may be moved around
> without changing their mtime, it's necessary to parse them even if they appear
> old in case old message was moved. mtime check on directories was kept as
> moving files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls already
when new mail arrives. Having to also parse the message ID out of
(roughly, for me) 1 files every time sounds pretty rough. Fortunately...

> Note that after applying this patch notmuch still does not handle copying
> files (which is harmless, database will point to the last copy of message
> found during 'notmuch new') and deleting files (which is more serious, as
> dangling entries will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, renaming, and deletion of files. And it will be faster than the
existing code with its mtime heuristics.

The complete design is on Keith's laptop right now, and hopefully he'll
appear soon with an implementation. Basically, there are only two new
functions needed in the library (if we got the design right):

notmuch_directory_t
notmuch_database_read_directory (notmuch_database_t *database,
                                 const char *path);

notmuch_status_t
notmuch_message_remove_filename (notmuch_message_t *message,
                                 const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to the
mtime that we currently read from the database, it will provide a list
of all directories and files (along with message IDs) known to the
database for a particular path. So notmuch-new can then quickly compare
the results of scandir with this notmuch_directory_t object and then
call notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.
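The comparison step could look roughly like the following sketch. It assumes both lists for one directory are sorted in strcmp() order (scandir() with alphasort sorts by locale collation instead, so a real implementation would need to sort consistently). `compare_directory` is a hypothetical helper that only reports which names would be handed to notmuch_database_add_message and notmuch_message_remove_filename:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: merge-walk what scandir() found in one directory
 * against what the database knows about it.  Both lists must be sorted
 * in strcmp() order.  Names only on disk go to added (room for n_disk
 * entries); names only in the database go to removed (room for n_db). */
void
compare_directory (const char *const *on_disk, size_t n_disk,
                   const char *const *in_db, size_t n_db,
                   const char **added, size_t *n_added,
                   const char **removed, size_t *n_removed)
{
    size_t i = 0, j = 0;

    *n_added = 0;
    *n_removed = 0;
    while (i < n_disk || j < n_db) {
        int cmp;

        if (i == n_disk)
            cmp = 1;                 /* only database entries left */
        else if (j == n_db)
            cmp = -1;                /* only on-disk entries left */
        else
            cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp < 0) {
            added[(*n_added)++] = on_disk[i++];     /* new file */
        } else if (cmp > 0) {
            removed[(*n_removed)++] = in_db[j++];   /* file is gone */
        } else {
            i++;                     /* already known: nothing to do */
            j++;
        }
    }
}
```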

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan. Obviously the implementation will need to deal with
that, (either with an additional library call for "I'm done adding
files, go ahead and delete dangling messages", or by postponing all
calls to remove_filename until later).
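The second option, postponing removals until the scan is finished, might be sketched like this. It assumes one filename per message and a hypothetical `struct file_entry` pairing each added or removed file with its Message-ID; a removed file whose Message-ID reappears elsewhere is treated as a rename rather than a deletion:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one file seen added or removed during a scan. */
struct file_entry {
    const char *filename;
    const char *message_id;
};

/* Does any entry in list carry this Message-ID? */
int
message_id_in (const char *message_id,
               const struct file_entry *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp (list[i].message_id, message_id) == 0)
            return 1;
    return 0;
}

/* Run only after every directory has been scanned, so that a file which
 * vanished from one directory can be matched with one that appeared in
 * another.  Message-IDs of removed files that did not reappear are
 * stored in deleted_ids (caller provides room for n_removed entries);
 * returns the number of removals that turned out to be renames. */
size_t
resolve_removals (const struct file_entry *removed, size_t n_removed,
                  const struct file_entry *added, size_t n_added,
                  const char **deleted_ids, size_t *n_deleted)
{
    size_t renames = 0;

    *n_deleted = 0;
    for (size_t i = 0; i < n_removed; i++) {
        if (message_id_in (removed[i].message_id, added, n_added))
            renames++;
        else
            deleted_ids[(*n_deleted)++] = removed[i].message_id;
    }
    return renames;
}
```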

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto "just in case".

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people really needing renames might want
to use it for now, (and live with any performance implications it
causes).

-Carl



[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-03 Thread Mikhail Gusarov
In order to handle message renames the following changes were deemed necessary:

* Mtime check on individual files was disabled. As files may be moved around
without changing their mtime, it's necessary to parse them even if they appear
old, in case an old message was moved. The mtime check on directories was kept,
as moving files changes the mtime of the directory.

* If the message being parsed is already in the database under a different
path and the old file is no longer readable, the message is considered to be
moved: its path is updated in the database and the file does not undergo
further processing. If the old file is still readable, the new file is
reported as a duplicate.

Note that after applying this patch notmuch still does not handle copying files
(which is harmless, database will point to the last copy of message found during
'notmuch new') and deleting files (which is more serious, as dangling entries
will show up in searches).

Signed-off-by: Mikhail Gusarov 
---
 lib/database.cc |   32 ++-
 notmuch-new.c   |  116 ++
 2 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 23ddd4a..45d8fc7 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -993,19 +993,31 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
-   } else {
-   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
-   goto DONE;
-   }

-   ret = _notmuch_database_link_message (notmuch, message, message_file);
-   if (ret)
-   goto DONE;
+   ret = _notmuch_database_link_message (notmuch, message, message_file);
+   if (ret)
+   goto DONE;

-   date = notmuch_message_file_get_header (message_file, "date");
-   _notmuch_message_set_date (message, date);
+   date = notmuch_message_file_get_header (message_file, "date");
+   _notmuch_message_set_date (message, date);

-   _notmuch_message_index_file (message, filename);
+   _notmuch_message_index_file (message, filename);
+   } else {
+   const char *old_filename = notmuch_message_get_filename (message);
+   if (strcmp (old_filename, filename) == 0) {
+   /* We have already seen it */
+   goto DONE;
+   } else {
+   if (access (old_filename, R_OK) == 0) {
+   /* old_filename still exists, we've got a duplicate */
+   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   goto DONE;
+   } else {
+   /* Message file has been moved/renamed */
+   _notmuch_message_set_filename (message, filename);
+   }
+   }
+   }

_notmuch_message_sync (message);
 } catch (const Xapian::Error &error) {
diff --git a/notmuch-new.c b/notmuch-new.c
index 9d20616..d595fc4 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -217,66 +217,62 @@ add_files_recursive (notmuch_database_t *notmuch,
}

if (S_ISREG (st->st_mode)) {
-   /* If the file hasn't been modified since the last
-* add_files, then we need not look at it. */
-   if (path_dbtime == 0 || st->st_mtime > path_dbtime) {
-   state->processed_files++;
-
-   if (state->verbose) {
-   if (state->output_is_a_tty)
-   printf("\r\033[K");
-
-   printf ("%i/%i: %s",
-   state->processed_files,
-   state->total_files,
-   next);
-
-   putchar((state->output_is_a_tty) ? '\r' : '\n');
-   fflush (stdout);
-   }
-
-   status = notmuch_database_add_message (notmuch, next, &message);
-   switch (status) {
-   /* success */
-   case NOTMUCH_STATUS_SUCCESS:
-   state->added_messages++;
-   tag_inbox_and_unread (message);
-   break;
-   /* Non-fatal issues (go on to next file) */
-   case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
-   /* Stay silent on this one. */
-   break;
-   case NOTMUCH_STATUS_FILE_NOT_EMAIL:
-   fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
-next);
-   break;
-   /* Fatal issues. Don't process anymore. */
-   case NOTMUCH_STATUS_READONLY_DATABASE:
-   case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
-   case NOTMUCH_STATUS_OUT_OF_MEMORY:
-   fprintf (stderr, "Error: %s. Halting processing.\n",
-notmuch_status_to_string (status));
-   ret = status;
- 

Re: [notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-03 Thread Carl Worth
On Thu,  3 Dec 2009 03:15:26 +0600, Mikhail Gusarov <dotted...@dottedmag.net> wrote:
> In order to handle message renames the following changes were deemed
> necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

> * Mtime check on individual files was disabled. As files may be moved around
> without changing their mtime, it's necessary to parse them even if they appear
> old in case old message was moved. mtime check on directories was kept as moving
> files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls already
when new mail arrives. Having to also parse the message ID out of
(roughly, for me) 1 files every time sounds pretty rough. Fortunately...

> Note that after applying this patch notmuch still does not handle copying files
> (which is harmless, database will point to the last copy of message found during
> 'notmuch new') and deleting files (which is more serious, as dangling entries
> will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, rename, and deletion of files. And it will be faster than the
existing code with its mtime heuristics.

The complete design is on Keith's laptop right now, and hopefully he'll
appear soon with an implementation. Basically, there are only two new
functions needed in the library (if we got the design right):

notmuch_directory_t
notmuch_database_read_directory (notmuch_database_t *database,
 const char *path);

notmuch_status_t
notmuch_message_remove_filename (notmuch_message_t *message,
 const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to the
mtime that we currently read from the database, it will provide a list
of all directories and files (along with message IDs) known to the
database for a particular path. So notmuch-new can then quickly compare
the results of scandir with this notmuch_directory_t object and then
call notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.
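The comparison described here amounts to a single merge pass over two sorted lists of names. The sketch below assumes both lists are sorted in `strcmp()` order (for example, scandir() output in the C locale, or a list sorted with qsort()) and only reports which names would need notmuch_database_add_message or notmuch_message_remove_filename instead of calling them; `reconcile_directory` and its parameters are hypothetical, not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: compare the names found on disk with the names
 * the database knows for the same directory. Both arrays must be sorted
 * in strcmp() order. Names only on disk go to TO_ADD (candidates for
 * notmuch_database_add_message); names only in the database go to
 * TO_REMOVE (candidates for notmuch_message_remove_filename). TO_ADD
 * must have room for N_DISK entries and TO_REMOVE for N_DB entries. */
static void
reconcile_directory (const char *const *on_disk, size_t n_disk,
                     const char *const *in_db, size_t n_db,
                     const char **to_add, size_t *n_add,
                     const char **to_remove, size_t *n_remove)
{
    size_t i = 0, j = 0;

    assert (n_add != NULL && n_remove != NULL);
    *n_add = 0;
    *n_remove = 0;

    while (i < n_disk || j < n_db) {
        int cmp;

        if (i == n_disk)
            cmp = 1;            /* only database names are left */
        else if (j == n_db)
            cmp = -1;           /* only disk names are left */
        else
            cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp < 0)
            to_add[(*n_add)++] = on_disk[i++];
        else if (cmp > 0)
            to_remove[(*n_remove)++] = in_db[j++];
        else {
            i++;                /* known to both: nothing to do */
            j++;
        }
    }
}
```

A rename, within or across directories, shows up in such a pass as one name to remove and one name to add, which is why the removal must not be applied too early.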

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan. Obviously the implementation will need to deal with
that (either with an additional library call for "I'm done adding
files, go ahead and delete dangling messages", or by postponing all
calls to remove_filename until later).
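One way to realize the "postpone all calls to remove_filename" option is to record removals during the scan and decide only once it has finished. This sketch assumes one file per message ID (the patch in this thread likewise stores a single filename per message) and treats a removal as a rename when the same message ID was added elsewhere during the scan; `file_event_t`, `removal_is_rename` and `collect_dangling` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one file event seen during a scan. */
typedef struct {
    const char *message_id;
    const char *filename;
} file_event_t;

/* A removed file counts as renamed if a file with the same message ID
 * was added anywhere during the same scan. Linear search keeps the
 * sketch short; a hash table would suit large scans better. */
static int
removal_is_rename (const file_event_t *removed,
                   const file_event_t *added, size_t n_added)
{
    for (size_t i = 0; i < n_added; i++)
        if (strcmp (removed->message_id, added[i].message_id) == 0)
            return 1;
    return 0;
}

/* Run only after the whole scan has finished: store in DANGLING_IDS
 * (room for N_REMOVED entries) the message IDs whose files vanished
 * without reappearing, and return how many there are. Assumes one
 * file per message ID. */
static size_t
collect_dangling (const file_event_t *removed, size_t n_removed,
                  const file_event_t *added, size_t n_added,
                  const char **dangling_ids)
{
    size_t n = 0;

    assert (dangling_ids != NULL || n_removed == 0);

    for (size_t i = 0; i < n_removed; i++)
        if (!removal_is_rename (&removed[i], added, n_added))
            dangling_ids[n++] = removed[i].message_id;

    return n;
}
```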

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto "just in case".
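The idea of keeping only the message ID and tags can be illustrated with a toy in-memory record. Everything here is hypothetical (notmuch keeps its documents in Xapian, not in a C struct): `drop_indexed_terms` frees the indexed terms, which re-indexing could rebuild, and leaves the message ID and tags untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-memory document; hypothetical, not notmuch's Xapian schema. */
typedef struct {
    const char *message_id;      /* kept: identifies the message */
    const char *const *tags;     /* kept: the user's own data */
    size_t n_tags;
    char **terms;                /* dropped: rebuilt by re-indexing */
    size_t n_terms;
} toy_document_t;

/* Append a copy of TERM; returns 0 on success, -1 on allocation failure. */
static int
toy_add_term (toy_document_t *doc, const char *term)
{
    char **grown = realloc (doc->terms, (doc->n_terms + 1) * sizeof *grown);
    char *copy;

    if (grown == NULL)
        return -1;
    doc->terms = grown;

    copy = malloc (strlen (term) + 1);
    if (copy == NULL)
        return -1;
    strcpy (copy, term);

    doc->terms[doc->n_terms++] = copy;
    return 0;
}

/* "Delete" a message whose file is gone: free every indexed term but
 * keep the message ID and tags, in case the file turns up again. */
static void
drop_indexed_terms (toy_document_t *doc)
{
    assert (doc != NULL);

    for (size_t i = 0; i < doc->n_terms; i++)
        free (doc->terms[i]);
    free (doc->terms);
    doc->terms = NULL;
    doc->n_terms = 0;
}
```

Because the tags survive, a file that later reappears with the same message ID could be re-indexed and get its old tags back.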

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people really needing renames might want
to use it for now (and live with any performance implications it
causes).

-Carl


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH (rebased)] Handle message renames in mail spool

2009-12-02 Thread Mikhail Gusarov
In order to handle message renames the following changes were deemed necessary:

* The mtime check on individual files was disabled. As files may be moved around
without changing their mtime, they must be parsed even if they appear old, in
case an old message was moved. The mtime check on directories was kept, as
moving files changes the mtime of the directory.

* If the message being parsed is already in the database under a different path
and the old file is no longer readable, the message is considered to have been
moved: its path is updated in the database and the file undergoes no further
processing.
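The directory-level check that this patch keeps can be sketched as follows, assuming `path_dbtime` holds the timestamp stored for the directory after the previous scan (0 if it was never scanned), as in add_files_recursive. `directory_needs_scan` is a hypothetical helper, and treating a `stat()` failure as "nothing to scan" is a simplification made only for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: should the directory at PATH be rescanned?
 * PATH_DBTIME is the mtime recorded for it after the previous scan,
 * or 0 if it has never been scanned. Mirrors the condition
 * "path_dbtime == 0 || st->st_mtime > path_dbtime". */
static bool
directory_needs_scan (const char *path, time_t path_dbtime)
{
    struct stat st;

    assert (path != NULL);

    /* Simplification: treat a missing or non-directory path as
     * "nothing to scan"; real code would report the error. */
    if (stat (path, &st) != 0 || !S_ISDIR (st.st_mode))
        return false;

    /* Adding, removing or renaming an entry updates the directory's
     * own mtime, so a move shows up here even if the file's does not. */
    return path_dbtime == 0 || st.st_mtime > path_dbtime;
}
```

This is the coarse filter that still lets a moved file be noticed: the destination directory's mtime changes, so its contents are rescanned and every file in it is parsed.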

Note that after applying this patch notmuch still does not handle copying files
(which is harmless: the database will point to the last copy of the message
found during 'notmuch new') or deleting files (which is more serious, as
dangling entries will show up in searches).

Signed-off-by: Mikhail Gusarov <dotted...@dottedmag.net>
---
 lib/database.cc |   32 ++-
 notmuch-new.c   |  116 ++
 2 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 23ddd4a..45d8fc7 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -993,19 +993,31 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
-   } else {
-   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
-   goto DONE;
-   }
 
-   ret = _notmuch_database_link_message (notmuch, message, message_file);
-   if (ret)
-   goto DONE;
+   ret = _notmuch_database_link_message (notmuch, message, message_file);
+   if (ret)
+   goto DONE;
 
-   date = notmuch_message_file_get_header (message_file, "date");
-   _notmuch_message_set_date (message, date);
+   date = notmuch_message_file_get_header (message_file, "date");
+   _notmuch_message_set_date (message, date);
 
-   _notmuch_message_index_file (message, filename);
+   _notmuch_message_index_file (message, filename);
+   } else {
+   const char *old_filename = notmuch_message_get_filename (message);
+   if (strcmp (old_filename, filename) == 0) {
+   /* We have already seen it */
+   goto DONE;
+   } else {
+   if (access (old_filename, R_OK) == 0) {
+   /* old_filename still exists, we've got a duplicate */
+   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   goto DONE;
+   } else {
+   /* Message file has been moved/renamed */
+   _notmuch_message_set_filename (message, filename);
+   }
+   }
+   }
 
_notmuch_message_sync (message);
 } catch (const Xapian::Error &error) {
diff --git a/notmuch-new.c b/notmuch-new.c
index 9d20616..d595fc4 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -217,66 +217,62 @@ add_files_recursive (notmuch_database_t *notmuch,
}
 
if (S_ISREG (st->st_mode)) {
-   /* If the file hasn't been modified since the last
-* add_files, then we need not look at it. */
-   if (path_dbtime == 0 || st->st_mtime > path_dbtime) {
-   state->processed_files++;
-
-   if (state->verbose) {
-   if (state->output_is_a_tty)
-   printf("\r\033[K");
-
-   printf ("%i/%i: %s",
-   state->processed_files,
-   state->total_files,
-   next);
-
-   putchar((state->output_is_a_tty) ? '\r' : '\n');
-   fflush (stdout);
-   }
-
-   status = notmuch_database_add_message (notmuch, next, &message);
-   switch (status) {
-   /* success */
-   case NOTMUCH_STATUS_SUCCESS:
-   state->added_messages++;
-   tag_inbox_and_unread (message);
-   break;
-   /* Non-fatal issues (go on to next file) */
-   case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
-   /* Stay silent on this one. */
-   break;
-   case NOTMUCH_STATUS_FILE_NOT_EMAIL:
-   fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
-next);
-   break;
-   /* Fatal issues. Don't process anymore. */
-   case NOTMUCH_STATUS_READONLY_DATABASE:
-   case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
-   case NOTMUCH_STATUS_OUT_OF_MEMORY:
-   fprintf (stderr, "Error: %s. Halting processing.\n",
-notmuch_status_to_string (status));
-   ret = status;
-