[notmuch] Threading
On Tue, 15 Dec 2009 16:54:20 +0100, Marten Veldthuis wrote:
> On Thu, 10 Dec 2009 13:30:13 -0800, Carl Worth wrote:
> > But I still have a hard time justifying user operations to manipulate
> > threading. The whole point of threading is to make it faster to process
> > and read messages. But manual operations like joining and splitting
> > threads seem like the user just doing more work, and that *after* having
> > read the messages. So that seems mostly backwards to me.
>
> By the way, Outlook & Exchange suck (or at least some versions do), and
> don't seem to generate In-Reply-To and References: headers. Just got a
> mail which prompted me to write this mail. I'd really like to be able to
> join messages in a case like this.

It's actually worse than that.

I was looking into why some of my threads weren't coalescing. Some of it
seems to be a very difficult bug DB that doesn't use identical
Message-IDs to refer to the parent bug mail. I don't know how that works
at all. Sometimes it uses the same Message-ID, but sometimes it changes
a number in the ID.

However, this isn't the worst news, because I work with a lot of
Exchange users, and I noticed that their mail was also refusing to
thread. I was looking at the message bodies, and they led me to these
links about mail processing.

The problem identified:
http://blog.postmaster.gr/2007/12/11/trying-to-make-use-of-outlooks-thread-index-header/

How to read it, or how Exchange goes its own way:
http://blog.postmaster.gr/2007/12/23/more-fun-with-message-threading/

With a fairly loose understanding of how notmuch detects threads, and
how much information it places in the Xapian database (only the
msg-id?), I can't suggest much of the how. But I would like to propose
that we consider handling the Exchange non-standard threading method as
well as the RFC822 threading in the headers.

Reactions?

-Mark
[notmuch] Rather simple optimization for notmuch tag
On Wed, 23 Dec 2009 03:45:14 +, Olly Betts wrote:
> Carl Worth writes:
> > On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson wrote:
> > > I was updating my poll script that tags messages, and a common idiom is
> > > to put
> > >   tag +mytag and not tag:mytag
> > >
> > > I don't know anything about efficiency, but for the simple single-tag
> > > case, couldn't we imply the "and not tag:mytag" from the +mytag action
> > > list for the tag command?
> >
> > On one level, it really shouldn't be a performance issue to tag messages
> > that already have a particular tag. (And in fact, the recently proposed
> > patches to fix Xapian defect 250 even address this I think.)
>
> Applying a filter up-front like this is likely to still help I think as it
> avoids Xapian having to reverse-engineer this information internally.

That's good to hear.

> Actually, you could do this with multiple tags - you just need to build
> a filter for documents which might be affected.
>
> So if you're adding tags a1 and a2, you want: AND_NOT (a1 AND a2)
> since documents which already have tags a1 and a2 can be ignored.
>
> If you're removing d1 and d2, then the filter is: AND (d1 OR d2)
> since documents have to be tagged d1 or d2 in order for the removal to
> do anything.
>
> Handling a combination of removals and additions is trickier, but probably
> possible, although the more tags you are dealing with, the less profitable
> the filtering is likely to be (as the filter is likely to cull fewer
> documents yet be more expensive to evaluate).

But the transform is pretty simple. I think that any combination of
additions and removals could be transformed according to the following
formula.

  notmuch tag +a1 +a2 +a3 -d1 -d2 -d3

would transform to something like:

  and ( not(a1) or not(a2) or not(a3) or d1 or d2 or d3)

There may certainly be more optimal ways to do it, depending on the
specific corpus of the database: whether the tags a1, a2 and a3 are
usually added together as one group, or whether each is added
individually. If I know that a3 implies a1 and a2, the first 3 terms
could be combined to not(a1 and a2 and a3), or I could just exclude
a3-tagged messages for nearly the same effect, with expected performance
improvements.

Unfortunately this requires that I know more about how the tags are used
than I ever want notmuch to deal with. Perhaps a follow-on or parallel
project with less emphasis on minimalism.

This looks pretty good to me. Easy to implement and not likely to break
things.

I've been wondering about whether there should be a repository of mail
added to the notmuch git so that we can start testing these kinds of
features on a consistent body of mail.

I doubt that I'll be the one to write this, since I don't have any time
set aside for real coding, but if it takes long enough, I'll probably
pick this one up eventually.

-Mark
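The transform described above could be sketched roughly as follows. This is only an illustration under assumptions, not notmuch code: `build_tag_filter` is a hypothetical helper, the `tag:` prefix is taken from the idiom quoted earlier in the thread, and whether the query parser accepts a leading `not` inside the parentheses is not checked here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of notmuch): combine the user's search
 * terms with the filter implied by tag actions such as "+a1" or "-d1".
 * A message can only be affected if it lacks some tag being added or
 * carries some tag being removed.
 * Returns 0 on success, -1 on a malformed action or a too-small buffer. */
static int
build_tag_filter (const char *search, const char *const *actions, size_t n,
                  char *out, size_t out_size)
{
    size_t used, i;
    int len;

    assert (search != NULL && out != NULL && out_size > 0);

    /* With no tag actions there is nothing to filter on. */
    if (n == 0)
        len = snprintf (out, out_size, "%s", search);
    else
        len = snprintf (out, out_size, "( %s ) and (", search);
    if (len < 0 || (size_t) len >= out_size)
        return -1;
    used = (size_t) len;

    for (i = 0; i < n; i++) {
        const char *action = actions[i];
        const char *negate;

        if (action[0] == '+' && action[1] != '\0')
            negate = "not ";    /* adding: keep messages lacking the tag */
        else if (action[0] == '-' && action[1] != '\0')
            negate = "";        /* removing: keep messages having the tag */
        else
            return -1;

        len = snprintf (out + used, out_size - used, "%s %stag:%s",
                        i ? " or" : "", negate, action + 1);
        if (len < 0 || (size_t) len >= out_size - used)
            return -1;
        used += (size_t) len;
    }

    if (n > 0) {
        len = snprintf (out + used, out_size - used, " )");
        if (len < 0 || (size_t) len >= out_size - used)
            return -1;
    }
    return 0;
}
```

The sketch deliberately uses no knowledge of how tags relate to each other, which matches the "any combination" formula rather than the corpus-specific shortcuts discussed afterwards.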
[notmuch] [PATCH] Add post-add and post-tag hooks
[Sorry, I seemed to manage to attach my reply to the wrong thread...]

On Wed, Dec 23, 2009 at 07:57:21AM +0100, Tomas Carnecky wrote:
> On 12/23/09 12:02 AM, Olly Betts wrote:
>> Rather than a platform-specific check, it would be better to check if DT_DIR
>> is defined.
>>
>> Beware that even on Linux (where the d_type field is present), it may always
>> contain DT_UNKNOWN for some filesystems, so you really should check for that
>> case and fall back to using stat() instead.
>
> Currently configure is a simple shell script and not some autoconf
> magic. And I don't know how eager Carl is to use autoconf, scons, cmake
> or similar.

No autoconf magic required (or desirable here that I can see) - here's what
I'm suggesting (untested as written, but Xapian's omega indexer uses an
approach much like this):

    #ifdef DT_UNKNOWN
        /* If d_type is available and supported by the FS, avoid a call to stat. */
        if (entries[i]->d_type == DT_UNKNOWN) {
            /* Fall back to calling stat. */
    #endif
            {
                char pbuf[PATH_MAX];
                snprintf(pbuf, PATH_MAX, "%s/%s", path, entries[i]->d_name);
                struct stat buf;
                if (stat(pbuf, &buf) == -1 || !S_ISDIR(buf.st_mode))
                    continue;
            }
    #ifdef DT_UNKNOWN
        } else if (entries[i]->d_type != DT_DIR)
            continue;
    #endif

Cheers,
    Olly
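For readers who want the same idea outside the loop, here is a minimal standalone sketch. It assumes a POSIX system; `entry_is_dir` is a hypothetical name for illustration and is not the function from the patch under discussion.

```c
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* assumed fallback for systems that do not define it */
#endif

/* Hypothetical helper: return 1 if `entry`, found while reading directory
 * `dir`, is itself a directory, and 0 otherwise. */
static int
entry_is_dir (const char *dir, const struct dirent *entry)
{
    char pbuf[PATH_MAX];
    struct stat st;
    int len;

    assert (dir != NULL && entry != NULL);

#ifdef DT_UNKNOWN
    /* d_type exists on this platform; trust it unless the filesystem
     * reported DT_UNKNOWN, in which case fall through to stat(). */
    if (entry->d_type != DT_UNKNOWN)
        return entry->d_type == DT_DIR;
#endif

    len = snprintf (pbuf, sizeof pbuf, "%s/%s", dir, entry->d_name);
    if (len < 0 || (size_t) len >= sizeof pbuf)
        return 0;               /* path too long: treat as not a directory */

    return stat (pbuf, &st) == 0 && S_ISDIR (st.st_mode);
}
```

One caveat carried over from the approach above: a symbolic link pointing at a directory is reported as DT_LNK by d_type but passes the stat() check, so the two paths can disagree for symlinks.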
[notmuch] [PATCH] Add post-add and post-tag hooks
On 12/23/09 12:02 AM, Olly Betts wrote:
> Tomas Carnecky writes:
>> #if defined(__sun__)
>> ... sprintf, stat etc
>> #else
>> (void) path;
>> return dirent->d_type == DT_DIR;
>> #endif
>
> Rather than a platform-specific check, it would be better to check if DT_DIR
> is defined.
>
> Beware that even on Linux (where the d_type field is present), it may always
> contain DT_UNKNOWN for some filesystems, so you really should check for that
> case and fall back to using stat() instead.

Currently configure is a simple shell script and not some autoconf
magic. And I don't know how eager Carl is to use autoconf, scons, cmake
or similar.

tom
[notmuch] Rather simple optimization for notmuch tag
Carl Worth writes:
> On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson wrote:
> > I was updating my poll script that tags messages, and a common idiom is
> > to put
> >   tag +mytag and not tag:mytag
> >
> > I don't know anything about efficiency, but for the simple single-tag
> > case, couldn't we imply the "and not tag:mytag" from the +mytag action
> > list for the tag command?
>
> On one level, it really shouldn't be a performance issue to tag messages
> that already have a particular tag. (And in fact, the recently proposed
> patches to fix Xapian defect 250 even address this I think.)

Applying a filter up-front like this is likely to still help I think as it
avoids Xapian having to reverse-engineer this information internally.

> One potential snag with both ideas is that the "notmuch tag"
> command-line as currently implemented allows for multiple tag additions
> and removals with a single search. So the optimization here couldn't be
> used unless there was just a single tag action.

Actually, you could do this with multiple tags - you just need to build
a filter for documents which might be affected.

So if you're adding tags a1 and a2, you want: AND_NOT (a1 AND a2)
since documents which already have tags a1 and a2 can be ignored.

If you're removing d1 and d2, then the filter is: AND (d1 OR d2)
since documents have to be tagged d1 or d2 in order for the removal to
do anything.

Handling a combination of removals and additions is trickier, but probably
possible, although the more tags you are dealing with, the less profitable
the filtering is likely to be (as the filter is likely to cull fewer
documents yet be more expensive to evaluate).

Cheers,
    Olly