[notmuch] Idea for storing tags
also sprach Carl Worth [2010.01.14.1432 +1300]:
> Yes. This approach requires some external means of synchronizing the
> tags from one system to another.
>
> I don't understand what it would mean to have the mailstore and the
> database out of synch here. This approach doesn't have the tags in the
> mailstore by definition, right?

You might have marked a message 'read' on one machine and if the two
get out of sync on another machine, you might have the same message
unread there.

> > How about using pseudo-mails stored in Maildir and synchronised by
> > IMAP? E.g. every folder could have a subfolder .TAGS and if we find
> > a way to smartly pair messages between parent and subfolder, we'd
> > have a tag store alongside the mailstore it refers to, but without
> > the danger of leakage, and without having to rewrite messages.
> ...
> > Anyway, the idea is out now. Thoughts?
>
> There are a couple of problems that I don't see addressed at all with
> this approach. The first is that there's not a one-to-one mapping
> between messages and files in the mail store. (I'm CCed on a lot of list
> mail meaning that I have multiple files in my mail store for a single
> message.)

Shouldn't this just be solved? I've had formail+procmail delete my
duplicates for 10+ years, and while I don't like the fact that
I usually get the CC before the list mail, and thus cannot filter on
Delivered-To, I have never looked back.

> Second, the only reason I would be interested in synchronizing mail
> between two systems is so that I could manipulate the tag data in
> multiple places, (that is, remove the "unread" tag whether on my
> network-disconnected laptop or via web-mail when away from my
> laptop). Using imap for synchronizing a file of tags within the mail
> store gives you no mechanism for doing any sort of conflict resolution,
> right? (Which I think in almost all cases is going to be quite trivial
> if there's a chance for a program to resolve it.)

I have not thought about this, but you are right. IMAP does not
really allow for conflict resolution, which may well be *the* reason
why you cannot update existing messages.

> [*] Though, I think a plain-text file with tags managed with
> something like git (and perhaps a custom merger) could save a lot
> of work. Or perhaps a plain-text journal of tag manipulations on
> either end that could be replayed on the other.

Git is good at conflict resolution if run interactively, but [0]
still makes me question whether it can ever take the place of IMAP.
However, Asheesh Laroia, who has floated the idea of Git-for-mail at
DebConf8 already, has some ideas and hopefully will soon reply to my
mail [0], which I just bounced.

0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

--
martin | http://madduck.net/ | http://two.sentenc.es/

apt-get source --compile gentoo

spamtraps: madduck.bogus at madduck.net

-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature (see http://martin-krafft.net/gpg/)
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20100114/6719fd61/attachment.pgp>
[notmuch] indexing mail?
On Thu, 14 Jan 2010 18:13:53 +0100, Arvid wrote:
> On Thu, 14 Jan 2010 09:38:00 +0100, Arvid Picciani wrote:
>
> > on the first run (when no .notmuch is there yet), it finds some
> > messages, but doesn't index them either.

Yuk! I logged in via Gmail's web interface and found that I have some
new messages which are not being picked up by Notmuch.

> the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e
>
> from readdir (3):
>
> "Currently, only some file systems (among them: Btrfs, ext2, ext3,
> and ext4) have full support returning the file type in d_type. All
> applications must properly handle a return of DT_UNKNOWN."

I am using XFS, which always returns DT_UNKNOWN. Taking into account that
there is a good deal of people using filesystems other than the ones you
mention, and that other non-linux filesystems may also return DT_UNKNOWN,
in my opinion there should be a fall-back. I will try to post a patch
Anytime Soon™.

Also, I have the feeling that the "d_type" field from "struct dirent" may
not be available in some OSes because it is a BSD extension.

Cheers,

--
Adrian Perez de Castro
Igalia - Free Software Engineering
[notmuch] indexing mail?
On Thu, 14 Jan 2010 09:38:00 +0100, Arvid Picciani wrote:
> on the first run (when no .notmuch is there yet), it finds some
> messages, but doesn't index them either.

the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e

from readdir (3):

"Currently, only some file systems (among them: Btrfs, ext2, ext3,
and ext4) have full support returning the file type in d_type. All
applications must properly handle a return of DT_UNKNOWN."

thanks "kanru" for helping on irc.
[notmuch] Notmuch performance problems on OSX
Actually, significant performance problems. Ho ho ho. (sorry)

I've installed the latest notmuch from Git at this time of writing,
along with Xapian from SVN head. However, just tagging a single thread
with only one message seems to take too long:

$ time notmuch tag +dissertation thread:7dc536441e6deade4256a46d46451221

real    0m0.812s
user    0m0.022s
sys     0m0.037s

And tagging all my messages is really horrible:

$ time notmuch tag +foobar tag:inbox

real    0m5.076s
user    0m3.688s
sys     0m0.105s

Here is what my notmuch binary links with:

$ otool -L /usr/local/bin/notmuch
/usr/local/bin/notmuch:
    /usr/local/Cellar/gmime/2.4.0/lib/libgmime-2.4.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
    /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /usr/local/Cellar/glib/2.20.5/lib/libgobject-2.0.0.dylib (compatibility version 2001.0.0, current version 2001.5.0)
    /usr/local/Cellar/glib/2.20.5/lib/libglib-2.0.0.dylib (compatibility version 2001.0.0, current version 2001.5.0)
    /usr/local/Cellar/gettext/0.17/lib/libintl.8.dylib (compatibility version 9.0.0, current version 9.2.0)
    /usr/local/Cellar/xapian-svn/HEAD/lib/libxapian-1.1.3.dylib (compatibility version 4.0.0, current version 4.0.0)
    /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.4.0)
    /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.4)

That xapian-svn was built from svn HEAD right now, so I'm assuming it
contains the #250 fix (http://trac.xapian.org/changeset/13808)

Any ideas?

--
Oliver Charles / aCiD2
[notmuch] Notmuch performance problems on OSX
Hi Oliver, welcome to notmuch!

On Thu, 14 Jan 2010 15:30:48 +, Oliver Charles wrote:
> I've installed the latest notmuch from Git at this time of writing,
> along with Xapian from SVN head. However, just tagging a single thread
> with only one message seems to take too long:
>
> $ time notmuch tag +dissertation thread:7dc536441e6deade4256a46d46451221
>
> real    0m0.812s
> user    0m0.022s
> sys     0m0.037s

Things work quite a bit faster than that on my machine:

$ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d314@mail.gmail.com

real    0m0.024s
user    0m0.012s
sys     0m0.004s

But that could just be system differences.

> And tagging all my messages is really horrible:
>
> $ time notmuch tag +foobar tag:inbox
>
> real    0m5.076s
> user    0m3.688s
> sys     0m0.105s

For this operation, I can't really compare. How many messages are you
tagging? Here's that operation for me with 525 messages in my inbox:

$ time notmuch tag +foobar tag:inbox

real    0m1.551s
user    0m1.504s
sys     0m0.016s

> That xapian-svn was built from svn HEAD right now, so I'm assuming it
> contains the #250 fix (http://trac.xapian.org/changeset/13808)

Which I think means that things could have been even *much* slower
before. ;-)

The Xapian defect #250 was just one, initial (and obvious) performance
problem. [Though, as I mentioned in a previous thread, if you're using a
Xapian flint database, (look for .notmuch/xapian/iamflint), then you
won't get the benefit of the Xapian fix until you rebuild your notmuch
database from scratch with a current notmuch.]

Once you've verified that you've got the #250 fix functional, there
could still be lots of performance bugs. And it would be time to start
profiling.

Perhaps the "notmuch daemon" idea (which we've proposed earlier for
other reasons) could help reduce overhead from reading the database and
writing it back out again. So that might be one avenue to explore for
fixing things.

I have no idea what OS X does, but Linux keeps my notmuch database in
its buffer cache so I can do these operations without even touching
"disk" (which is actually an SSD anyway, which also helps). I just
tried, and was able to get the single-message tag operation to be 3
times slower by dropping the cache:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches "
$ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d314@mail.gmail.com

real    0m0.062s
user    0m0.000s
sys     0m0.020s

But again, whatever the performance problem might be, the first step
would be to examine some profiles. (And I'm clueless, myself as to what
profiling tools might be available for OS X.)

-Carl
[notmuch] Thoughts on notmuch and Lua
On Thu, 14 Jan 2010 10:47:13 +0200, Ali Polatel wrote:
> Before trying to implement anything I decided to send a mail to the list
> to ask people's opinion.

Hi Ali, welcome to notmuch!

I appreciate you soliciting opinions, but I hope that my answer won't
discourage you. By all means, please feel free to experiment!

> What's the problem?
> ===
> Notmuch isn't very configurable.

I'll grant that. And as can be seen in TODO and in code comments, we
definitely want to fix that.

> 1. Configuration file:
> The configuration file can be a Lua script that allows more dynamic
> configuration. Here's an example:
>
> # notmuch configuration file:
> config = {}
> config.dbpath = "/path/to/maildir"
> config.exclude = function (maildir)
>     return not string.match(maildir, ".*Trash.*")
> end
> ...

That doesn't look very compelling to me. I'd much rather have:

[database]
path=/home/cworth/mail
maildir_exclude=.*Trash.*

with the exact same functionality. Granted, having a full programming
language in the configuration file makes things much more dynamic, but
it also makes it much harder for the user to read, edit, and ensure the
syntax is correct.

> 2. Hooks:
> This is a feature I really miss having switched from sup.
> There can be many hooks, a hook that formats search output,
> a hook that is called before adding messages to the database which may
> be used to add initial tags depending on headers etc.

I understand that some people really like their hooks. They let users
invent all kinds of interesting, custom functionality. But I think hooks
also have problems. Sometimes the most interesting functionality has to
be pieced together by every user going to a wiki page and finding the
"standard" hooks. I'd much rather avoid that by getting the most useful
functionality into the program in the first place.

Hooks also impose a particular amount of maintenance burden on the
software. And they are often implemented in a way that makes them very
hard to be discovered.

I wrote a message to the sup mailing list describing some of these
issues. The context there was a patch I wrote adding a configuration
option, (and the sup maintainer preferring it be added as a patch
instead):

id:1254417826-sup-6584@yoom.home.cworth.org
Subject: Re: [sup-talk] [PATCH] Add new :crypto_default configuration option.

I did find out later that the sup hooks were more self-documenting than
I had understood. (There was a sup command-line option that printed
documentation for all available hooks.) Something like that is
definitely a requirement for providing hooks.

So I'm not entirely opposed to the idea of adding hooks to notmuch, but
I'll definitely need to be convinced that any particular functionality
can't be better integrated without the hook.

> Why Lua?
>
> Lua has many advantages over other scripting languages when it comes to
> integration with a C program. It has a very clean and easy C API, the
> overhead of running Lua scripts is not noticeable among other things.

I've definitely heard lots of good things about "lua embedability". So
if we do decide to provide hooks, then lua would seem like a logical
option to look at first. I've never looked at it closely myself though.

-Carl
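A minimal sketch of how the declarative alternative in the message above could be consumed. It is written in Python for brevity and is not notmuch code. The `maildir_exclude` key is the hypothetical option from Carl's example, and the helper names `load_exclude` and `included` are made up for illustration.

```python
import configparser
import re

# The INI fragment from the message; maildir_exclude is a hypothetical key.
CONFIG_TEXT = """
[database]
path=/home/cworth/mail
maildir_exclude=.*Trash.*
"""


def load_exclude(config_text):
    """Parse the INI text and compile the exclusion pattern."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return re.compile(parser["database"]["maildir_exclude"])


def included(maildirs, exclude):
    """Return the maildir names that the exclusion pattern does not match."""
    return [name for name in maildirs if not exclude.match(name)]


if __name__ == "__main__":
    pattern = load_exclude(CONFIG_TEXT)
    print(included(["INBOX", "Trash", "lists/Trash-old", "work"], pattern))
```

The regular expression covers the same case as the Lua function in the quoted proposal, while the file stays a plain key/value list that is easy to read and validate.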
[notmuch] indexing mail?
On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro wrote:
> > the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e
> >
> > from readdir (3):
> >
> > "Currently, only some file systems (among them: Btrfs, ext2, ext3,
> > and ext4) have full support returning the file type in d_type. All
> > applications must properly handle a return of DT_UNKNOWN."

Yes. The broken code was my mistake. I clearly didn't read the above
warning closely enough. Sorry about that!

> I am using XFS, which always returns DT_UNKNOWN. Taking into account that
> there is a good deal of people using filesystems other than the ones you
> mention, and that other non-linux filesystems may also return DT_UNKNOWN,
> in my opinion there should be a fall-back. I will try to post a patch
> Anytime Soon™.

We definitely want the fallback. I can attempt to code it, but I don't
have ready access to an afflicted filesystem, so I'd need help testing
anyway. I'd love to see a patch for this bug soon. Be sure to CC me when
the patch is sent and that will help me commit it sooner.

> Also, I have the feeling that the "d_type" field from "struct dirent" may
> not be available in some OSes because it is a BSD extension.

I'm generally quite bad at determining whether functionality I'm using
in my software is non-portable. As proven in this case, even when the
man page tells me something is not portable I don't always notice, (and
often, the man pages aren't even that useful). Beyond that, even if
something is *known* to be theoretically non-portable, it can be a waste
of time to code compatibility paths that nobody will be running in
practice.

So I've basically gotten to the point where I just code for what works
on my system, (not out of disregard for what other people run---just
that it's impossible for me to know what subset of functionality is
actually relevant).

Then, at the same time, I'm quite happy to accept code to improve the
portability when people note that things are broken on other systems.
See the git history and email archives for examples of how we fixed
strndup and getline portability problems.

I know that "wait for people to notice it's broken" isn't the nicest
thing we could do with our code. But I don't really know a much better
way. I'm happy to entertain suggestions here.

-Carl
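The fallback being asked for in this thread can be sketched as follows. This is an illustration in Python, not the patch that went into notmuch (which is C): the `type_hint` argument is a stand-in for `d_type` from `struct dirent`, with `None` playing the role of DT_UNKNOWN, and `is_regular_file` is a hypothetical helper name.

```python
import os
import stat


def is_regular_file(dir_path, name, type_hint=None):
    """Decide whether a directory entry is a regular file.

    type_hint stands in for dirent.d_type: "file", "dir", or None when the
    filesystem reports no type (the DT_UNKNOWN case, e.g. XFS in the thread).
    """
    if type_hint is not None:
        # The filesystem told us the type, so no extra system call is needed.
        return type_hint == "file"
    # Fallback: the entry carried no type information, so stat the path.
    # os.stat follows symlinks; whether that is wanted is a design choice.
    mode = os.stat(os.path.join(dir_path, name)).st_mode
    return stat.S_ISREG(mode)
```

The point of the two branches is that the cheap path is kept where the filesystem supports it, and only entries with an unknown type pay for the extra stat call.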
[notmuch] Idea for storing tags
On Thu, 14 Jan 2010 21:04:21 +1300, martin f krafft wrote:
> You might have marked a message 'read' on one machine and if the two
> get out of sync on another machine, you might have the same message
> unread there.

That's a different issue though. With two databases there's clearly the
opportunity for the two databases to be out of synch.

But you talked about the database being out of synch with respect to
the mailstore. And that's something I just don't understand, (given the
assumption that all tags are stored in the database---which was the
explicit description of the case of interest).

> Shouldn't this just be solved? I've had formail+procmail delete my
> duplicates for 10+ years, and while I don't like the fact that
> I usually get the CC before the list mail, and thus cannot filter on
> Delivered-To, I have never looked back.

Notmuch has access to all the information it needs to allow you to
delete the CC version once the list mail arrives. So you could do
notmuch-based deletion now and avoid losing the Delivered-To header if
you want.

> > [*] Though, I think a plain-text file with tags managed with
> > something like git (and perhaps a custom merger) could save a lot
> > of work. Or perhaps a plain-text journal of tag manipulations on
> > either end that could be replayed on the other.
>
> Git is good at conflict resolution if run interactively, but [0]
> still makes me question whether it can ever take the place of IMAP.
> However, Asheesh Laroia, who has floated the idea of Git-for-mail at
> DebConf8 already, has some ideas and hopefully will soon reply to my
> mail [0], which I just bounced.
>
> 0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

Using git for mail is an interesting idea, but not what I was actually
proposing here.

I think that synchronizing the mail store and synchronizing the tags
information are tasks that have different requirements, and for which
we may well want different tools.

So I was talking about using imap (or rsync, or what have you) for
copying the mailstore, and then having something with a bit more
domain-specific awareness for doing the synchronization of the tags
data.

-Carl
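The "plain-text journal of tag manipulations" idea quoted above can be sketched as below. This is only an illustration under stated assumptions: the journal format (a message id plus a `+tag` or `-tag` operation) and the `replay` helper are invented for this example and are not an existing notmuch format or API.

```python
def replay(tags, journal):
    """Apply journal entries to a tag store.

    tags maps message id -> set of tag names; journal is a list of
    (message_id, op) pairs where op is "+tag" or "-tag". Both shapes are
    assumptions made for this sketch.
    """
    for message_id, op in journal:
        current = tags.setdefault(message_id, set())
        if op.startswith("+"):
            current.add(op[1:])
        elif op.startswith("-"):
            current.discard(op[1:])
        else:
            raise ValueError("unknown journal operation: %r" % (op,))
    return tags


if __name__ == "__main__":
    laptop = {"m1": {"inbox", "unread"}}
    webmail = {"m1": {"inbox", "unread"}}
    laptop_journal = [("m1", "-unread")]   # read on the laptop
    webmail_journal = [("m1", "+todo")]    # tagged via web-mail
    # Each side applies its own journal, then replays the other side's.
    replay(laptop, laptop_journal + webmail_journal)
    replay(webmail, webmail_journal + laptop_journal)
    print(laptop == webmail)
```

Operations on different tags commute, so both sides converge regardless of replay order. Opposite operations on the same tag (one side adds it, the other removes it) do depend on order, which is where the domain-specific conflict resolution mentioned in the thread would have to step in.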
[notmuch] [PATCH] Use libgcrypt for hashing.
On Fri, 08 Jan 2010 15:43:52 -0500, micah anderson wrote:
> It's good that this is not a burden to maintain for the notmuch project,
> even better that Mikhail, the libsha1 maintainer, is currently active in
> this project and has volunteered to maintain the in-tree copy.
>
> However, the problem that has been raised is about the code-maintenance
> burden that distributions face. In fact, this is not a unique problem
> to notmuch, if it was it wouldn't be such a big deal. The reality is
> that the more projects which cargo-cult around 'convenience copies' of
> code, the more of a burden is placed on the distributors.
>
> In some ways, the notmuch project and the role of distributors are at
> cross-purposes on this issue, each side has an argument that makes sense
> from their individual perspectives.

Well, I think it's important for notmuch to ease the burden on the
distribution as well. That's just a matter of being a good citizen.

If notmuch were including code that existed as a library package in
Debian, say, then that would definitely be problematic, and notmuch
should be fixed to link with the library.

We could get to that point if someone wanted to package libsha1, say.

> > What might make more sense is an option to compile against an existing
> > library (if present) but not to introduce an error in the build if the
> > library is not present, (in which case just build the builtin libsha1.c
> > code).
>
> This makes the most sense, and resolves the issue in a way that both
> sides of the issue benefit!

I'd be glad to see a patch that does that.

-Carl
[notmuch] Threading
On Fri, 8 Jan 2010 16:12:38 +1300, martin f krafft wrote:
> Reading is one thing. Information storage and organisation is
> another. After a message is delivered (and read) to my mailbox, it's
> really mine and I can (and should be able) to affix it and integrate
> it into my organisational scheme any way I want, don't you think?

A fair point.

I don't see this being something I'm going to spend any time
implementing. I just wouldn't use the functionality myself. But I would
be happy to integrate patches if someone came up with some.

-Carl
[notmuch] Thoughts on notmuch and Lua
Before trying to implement anything I decided to send a mail to the list
to ask people's opinion.

What's the problem?
===
Notmuch isn't very configurable.

How can Lua integration solve this?
===
Here are initial thoughts on how to integrate Lua with notmuch.
Any comments appreciated.

1. Configuration file:
The configuration file can be a Lua script that allows more dynamic
configuration. Here's an example:

# notmuch configuration file:
config = {}
config.dbpath = "/path/to/maildir"
config.exclude = function (maildir)
    return not string.match(maildir, ".*Trash.*")
end
...

2. Hooks:
This is a feature I really miss having switched from sup.
There can be many hooks, a hook that formats search output,
a hook that is called before adding messages to the database which may
be used to add initial tags depending on headers etc.

Why Lua?

Lua has many advantages over other scripting languages when it comes to
integration with a C program. It has a very clean and easy C API, the
overhead of running Lua scripts is not noticeable among other things.

--
Regards,
Ali Polatel
[notmuch] indexing encrypted messages (was: OpenPGP support)
On 2010-01-08, James Westby wrote:
> That would leave an open question over whether future notmuch show
> invocations would return the plaintext or ciphertext. If it is the
> latter then it requires decrypting every time you want to view it, but
> it does mean that there is less information leakage (you could find out
> whether an encrypted message contained a particular term, but not read
> the whole message directly).

You can actually use the term position information to reconstruct the
original message text pretty well. It misses capitalisation,
punctuation, and distinctions between whitespace, but is generally
enough to allow the message to be understood:

http://article.gmane.org/gmane.comp.search.xapian.general/2187

Cheers,
Olly
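To make the leakage point above concrete, here is a small sketch of rebuilding text from positional postings. It uses a plain dictionary as a stand-in for an index and does not call the Xapian API; `index_terms` and `reconstruct` are hypothetical names, and a real index would typically also stem and prefix terms.

```python
import re


def index_terms(text):
    """Build term -> list of positions, as a toy positional index."""
    postings = {}
    for position, term in enumerate(re.findall(r"\w+", text.lower()), start=1):
        postings.setdefault(term, []).append(position)
    return postings


def reconstruct(postings):
    """Recover the word sequence by sorting every term by its positions."""
    by_position = {}
    for term, positions in postings.items():
        for position in positions:
            by_position[position] = term
    return " ".join(by_position[p] for p in sorted(by_position))


if __name__ == "__main__":
    postings = index_terms("The quick brown fox, the lazy dog.")
    print(reconstruct(postings))
```

As the message says, capitalisation, punctuation and whitespace distinctions are lost, but the word order survives, so an index of decrypted text stored with positions reveals far more than whether a term occurs.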
[notmuch] indexing mail?
Hi,

how do you add new mails to the index?
manual says "notmuch new" should be enough, but it simply says
"No new mail."

on the first run (when no .notmuch is there yet), it finds some
messages, but doesn't index them either.

$ notmuch search tag:inbox
$
$ notmuch search s
$

--
Arvid
Asgaard Technologies
[notmuch] [RFC/PATCH v2] Add search-files command
Jameson Rollins wrote:
> On Wed, Jan 13, 2010 at 03:17:41PM +0200, Ali Polatel wrote:
> > This command can be used to integrate notmuch with other MUAs as a
> > searching client. The idea is simple, a simple script could get
> > search-terms as argument and create a "virtual" maildir which has
> > symbolic links to files output by search-files command. This is similar
> > to nmzmail.
>
> Hi, Ali. I was also recently asking about a way to output just the
> file names of messages resulting from searches. This is an important
> feature for handling deleting and moving in mail clients as well. I
> believe that Carl said this would be easier once he applied the JSON
> output patches that are in the queue right now. Hopefully we'll see
> those soon.
>
> Personally I think the right way to implement this from a UI
> perspective would be to just have an output filter for the 'search'
> subcommand, something like:
>
> notmuch search --output=filename ...
>
> If output formatting was well enough supported one could even imagine
> getting rid of the 'show' subcommand in favor of just the 'search'
> subcommand with output formatting options.

That's even better!
I think I'll be using my patch until these patches are merged :)

>
> jamie.

--
Regards,
Ali Polatel
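The "virtual maildir" script described in the quoted proposal could look roughly like this. It is a sketch under assumptions: it takes the message file paths as a Python iterable (for instance, lines read from whatever command ends up printing one filename per line), it assumes a POSIX system with symlink support, and `build_virtual_maildir` is a hypothetical name.

```python
import os


def build_virtual_maildir(filenames, target):
    """Create a maildir at target whose cur/ holds symlinks to filenames."""
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(target, sub), exist_ok=True)
    for path in filenames:
        link = os.path.join(target, "cur", os.path.basename(path))
        # Skip names already linked; two messages with the same basename in
        # different folders would collide here, which this sketch ignores.
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(path), link)
    return target
```

An MUA such as mutt could then be pointed at the target directory as if it were an ordinary mailbox, while the real messages stay where they are in the mail store.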
[notmuch] Potential problem using Git for mail (was: Idea for storing tags)
On Tue, 12 Jan 2010, martin f krafft wrote:

> If the MDA delivers to Git, then potentially, you might get into a
> situation where you cannot write your own changes back to the repo. This
> is also a DoS scenario: I'll just keep sending you e-mail, and if I
> manage to pass your mail filters, I'll basically commit to your mail
> repository at regular intervals. Say those are 5 seconds. In order for
> you to write updates to the repo, e.g. to update tags, then you would
> need to pull, rebase, and push all within 5 seconds, for otherwise you'd
> try to push non-fast-forwards.

Sure. But the MDA doesn't need to do the commit immediately. Since
(presumably) we're using Maildir, the MDA on the mail receiving server is
going to generate filenames that won't cause conflicts. So it's okay to
leave the files uncommitted.

If that's too scary, then have the MDA deliver to its own git branch with
its own checkout.

Then, if you can force linearity with a lock (!), your client can have a
special "lock the repo and push" command. Your remote MUA could even ask
the MDA to lock the Maildir while it does a merge and then pushes that,
and then the MDA can go back to dequeuing messages from the MTA into the
Maildir.

Not the beautiful lockless world the purists want, but I'm okay with that.

> This is a bit unrealistic, surely, but there's a real annoyance in it:
> you'd have to pull/rebase/push until a push succeeds -- until you found a
> time window between pull and push during which the MDA didn't write to
> the repo. This might take a long time. If this happens in the background
> by Cron, it's not a real concern, but if this becomes a UI issue, I
> wouldn't know how to handle it.

It's not entirely unreasonable. Cron caused issues like that for me when I
tracked my Maildir in git.

I'm just learning about notmuchmail.org, but I'll keep listening here.
Preferably CC: me on replies to this mail.

I will say, I'm interested in an email setup with working IMAP on at
least one side.

There's one other bad race I ran into when using git to manage my
Maildirs. I was using Dovecot to serve my Maildir to an IMAP client,
alpine. I separately did a "git merge" from origin/master, where the
remote MTA had an MDA delivering messages and a layer on top of that
committed them. When I did the "git merge", git would create the Maildir
files in ~/Maildir/cur/... non-atomically. Dovecot would notice the file
in ~/Maildir/cur/ and think, "This file must be ready!" So it would parse
it even though git hadn't finished writing it.

This caused me to only see partial headers in Alpine since Dovecot parsed
it before it was a complete message. That kind of sucked.

-- Asheesh.

--
Almost anything derogatory you could say about today's software design
would be accurate.
		-- K. E. Iverson
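For contrast with the race described above, the usual Maildir convention is that a writer never creates a message directly where readers look for it: the file is written in full under tmp/ and only then renamed into place. A minimal sketch follows; it is not how "git merge" behaves, `deliver` and the unique file name are hypothetical, and the atomicity relies on tmp/ and new/ living on the same POSIX filesystem.

```python
import os


def deliver(maildir, unique_name, data):
    """Write a message so that readers only ever see the complete file."""
    tmp_path = os.path.join(maildir, "tmp", unique_name)
    new_path = os.path.join(maildir, "new", unique_name)
    with open(tmp_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes are on disk first
    # rename is atomic within one filesystem: the name appears in new/
    # only once the content is complete.
    os.rename(tmp_path, new_path)
    return new_path
```

A server scanning new/ or cur/ then either sees no file at all or the whole message, never the partially written state that produced the truncated headers.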
Re: [notmuch] Idea for storing tags
also sprach Carl Worth cwo...@cworth.org [2010.01.15.1124 +1300]:
> > You might have marked a message 'read' on one machine and if the
> > two get out of sync on another machine, you might have the same
> > message unread there.
>
> That's a different issue though. With two databases there's clearly
> the opportunity for the two databases to be out of synch. But you
> talked about the database being out of synch with respect to the
> mailstore. And that's something I just don't understand, (given the
> assumption that all tags are stored in the database---which was the
> explicit description of the case of interest).

Yes, we are talking about the situation where the tagstore is separate from the mailstore, and both are synchronised with a server, or between machines, separately. If for some reason you only synchronise the mailstore (say, because the connection drops before the sync of the tagstore completes), then you end up with an out-of-sync situation, because the mailstore sync will have pulled in a new message, but not the associated tags. So if you had already read this message on another machine and tagged it 'done', then it would show up on this machine as 'new' without the 'done' tag, because the tags were not synchronised.

The only way to really solve this is by transferring a message and its tags in a transactional way.

> > Shouldn't this just be solved? I've had formail+procmail delete my
> > duplicates for 10+ years, and while I don't like the fact that
> > I usually get the CC before the list mail, and thus cannot filter
> > on Delivered-To, I have never looked back.
>
> Notmuch has access to all the information it needs to allow you to
> delete the CC version once the list mail arrives. So you could do
> notmuch-based deletion now and avoid losing the Delivered-To header
> if you want.

Of course. I hadn't thought that far. However, there are still benefits to formail, namely avoiding having to run duplicates through potentially expensive spamfilters.

> I think that synchronizing the mail store and synchronizing the tags
> information are tasks that have different requirements, and for
> which we may well want different tools.

Fair enough. Maybe I am just paranoid about the stores getting out of sync (see above).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

we all know linux is great... it does infinite loops in 5 seconds.
   -- linus torvalds

spamtraps: madduck.bo...@madduck.net
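As one illustration of what a tag-specific synchronisation tool could do that IMAP or rsync cannot, here is a minimal sketch of a three-way merge over tag sets. It is not something notmuch implements; the bitmask representation and the names `TAG_INBOX`, `TAG_UNREAD`, `TAG_DONE` and `merge_tags` are assumptions invented for this example. Given the tag set recorded at the last successful sync (`base`) and the current sets on two machines, a tag survives if both sides still carry it, or if either side added it since the base.

```c
#include <stdint.h>

/* Hypothetical tag bits, invented for this sketch. */
enum {
    TAG_INBOX  = 1u << 0,
    TAG_UNREAD = 1u << 1,
    TAG_DONE   = 1u << 2
};

/* Three-way merge of tag sets stored as bitmasks.
 * base:   tags at the last common sync point
 * local:  tags on this machine now
 * remote: tags on the other machine now
 * A tag that was in base and is missing on either side counts as removed;
 * a tag that was not in base and appears on either side counts as added. */
static uint32_t merge_tags(uint32_t base, uint32_t local, uint32_t remote)
{
    uint32_t kept  = local & remote;            /* still present on both sides */
    uint32_t added = (local | remote) & ~base;  /* new on at least one side */
    return kept | added;
}
```

For instance, if the base is inbox+unread, the laptop removed 'unread', and the other machine added 'done', the merged set is inbox+done. Because a tag is either in the base or not, "added here, removed there" cannot occur for the same tag, which is why this kind of merge needs no interactive conflict resolution.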
Re: [notmuch] Threading
also sprach Carl Worth cwo...@cworth.org [2010.01.15.1108 +1300]:
> > Reading is one thing. Information storage and organisation is
> > another. After a message is delivered (and read) to my mailbox,
> > it's really mine and I can (and should be able to) affix it and
> > integrate it into my organisational scheme any way I want, don't
> > you think?
>
> A fair point. I don't see this being something I'm going to spend
> any time implementing. I just wouldn't use the functionality
> myself. But I would be happy to integrate patches if someone came
> up with some.

Maybe I should try to persuade you in person. Just today I referenced a discussion I had with a client's ISP, which was done via a web-based support system (custhelp.com). They send you e-mail for every post you or they make to the thread, but those e-mails do not reference each other. Fortunately, I had stitched them together, so when I searched for the correspondence in my mailstore, I had the entire thread available to me, which was handy (thanks to mutt's useful thread handling abilities).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

this week dragged past me so slowly; the days fell on their knees...
   -- david bowie

spamtraps: madduck.bo...@madduck.net
Re: [notmuch] indexing mail?
On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro ape...@igalia.com wrote:
> the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e
>
> from readdir (3):
>
>     Currently, only some file systems (among them: Btrfs, ext2, ext3,
>     and ext4) have full support returning the file type in d_type.
>     All applications must properly handle a return of DT_UNKNOWN.

Yes. The broken code was my mistake. I clearly didn't read the above warning closely enough. Sorry about that!

> I am using XFS, which always returns DT_UNKNOWN. Taking into account
> that there is a good deal of people using filesystems other than the
> ones you mention, and that other non-linux filesystems may also
> return DT_UNKNOWN, in my opinion there should be a fall-back. I will
> try to post a patch Anytime Soon™.

We definitely want the fallback. I can attempt to code it, but I don't have ready access to an afflicted filesystem, so I'd need help testing anyway.

I'd love to see a patch for this bug soon. Be sure to CC me when the patch is sent and that will help me commit it sooner.

> Also, I have the feeling that the d_type field from struct dirent
> may not be available in some OSes because it is a BSD extension.

I'm generally quite bad at determining whether functionality I'm using in my software is non-portable. As proven in this case, even when the man page tells me something is not portable I don't always notice, (and often, the man pages aren't even that useful).

Beyond that, even if something is *known* to be theoretically non-portable, it can be a waste of time to code compatibility paths that nobody will be running in practice.

So I've basically gotten to the point where I just code for what works on my system, (not out of disregard for what other people run---just that it's impossible for me to know what subset of functionality is actually relevant).

Then, at the same time, I'm quite happy to accept code to improve the portability when people note that things are broken on other systems. See the git history and email archives for examples of how we fixed strndup and getline portability problems.

I know that waiting for people to notice it's broken isn't the nicest thing we could do with our code. But I don't really know a much better way. I'm happy to entertain suggestions here.

-Carl
Re: [notmuch] Notmuch performance problems on OSX
Hi Oliver, welcome to notmuch!

On Thu, 14 Jan 2010 15:30:48 +, Oliver Charles oliver.g.char...@googlemail.com wrote:
> I've installed the latest notmuch from Git at this time of writing,
> along with Xapian from SVN head. However, just tagging a single
> thread with only one message seems to take too long:
>
>     $ time notmuch tag +dissertation thread:7dc536441e6deade4256a46d46451221
>
>     real    0m0.812s
>     user    0m0.022s
>     sys     0m0.037s

Things work quite a bit faster than that on my machine:

    $ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d...@mail.gmail.com

    real    0m0.024s
    user    0m0.012s
    sys     0m0.004s

But that could just be system differences.

> And tagging all my messages is really horrible:
>
>     $ time notmuch tag +foobar tag:inbox
>
>     real    0m5.076s
>     user    0m3.688s
>     sys     0m0.105s

For this operation, I can't really compare. How many messages are you tagging? Here's that operation for me with 525 messages in my inbox:

    $ time notmuch tag +foobar tag:inbox

    real    0m1.551s
    user    0m1.504s
    sys     0m0.016s

> That xapian-svn was built from svn HEAD right now, so I'm assuming it
> contains the #250 fix (http://trac.xapian.org/changeset/13808)

Which I think means that things could have been even *much* slower before. ;-)

The Xapian defect #250 was just one, initial (and obvious) performance problem. [Though, as I mentioned in a previous thread, if you're using a Xapian flint database, (look for .notmuch/xapian/iamflint), then you won't get the benefit of the Xapian fix until you rebuild your notmuch database from scratch with a current notmuch.]

Once you've verified that you've got the #250 fix functional, there could still be lots of performance bugs. And it would be time to start profiling.

Perhaps the notmuch daemon idea (which we've proposed earlier for other reasons) could help reduce overhead from reading the database and writing it back out again. So that might be one avenue to explore for fixing things.

I have no idea what OS X does, but Linux keeps my notmuch database in its buffer cache so I can do these operations without even touching disk (which is actually an SSD anyway, which also helps). I just tried, and was able to get the single-message tag operation to be 3 times slower by dropping the cache:

    $ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    $ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d...@mail.gmail.com

    real    0m0.062s
    user    0m0.000s
    sys     0m0.020s

But again, whatever the performance problem might be, the first step would be to examine some profiles. (And I'm clueless, myself, as to what profiling tools might be available for OS X.)

-Carl
Re: [notmuch] Thoughts on notmuch and Lua
also sprach Carl Worth cwo...@cworth.org [2010.01.15.1200 +1300]:
> > Lua has many advantages over other scripting languages when it
> > comes to integration with a C program. It has a very clean and
> > easy C API, and the overhead of running Lua scripts is not
> > noticeable, among other things.
>
> I've definitely heard lots of good things about Lua embeddability.
> So if we do decide to provide hooks, then Lua would seem like
> a logical option to look at first. I've never looked at it closely
> myself though.

Lua for hooks has the advantage that the hooks can be executed in the context of manipulable objects.

On the other hand, hooks in the style of run-parts directories are more flexible and accessible, and could always be invoked as filters for the manipulable data.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

imagine if every thursday your shoes exploded if you tied them the usual way. this happens to us all the time with computers, and nobody thinks of complaining.
   -- jeff raskin

spamtraps: madduck.bo...@madduck.net
Re: [notmuch] indexing mail?
On 2010-01-14, Carl Worth wrote:
> On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro ape...@igalia.com wrote:
> > I am using XFS, which always returns DT_UNKNOWN. Taking into
> > account that there is a good deal of people using filesystems
> > other than the ones you mention, and that other non-linux
> > filesystems may also return DT_UNKNOWN, in my opinion there should
> > be a fall-back. I will try to post a patch Anytime Soon™.
>
> We definitely want the fallback. I can attempt to code it, but I
> don't have ready access to an afflicted filesystem, so I'd need help
> testing anyway.
>
> I'd love to see a patch for this bug soon. Be sure to CC me when the
> patch is sent and that will help me commit it sooner.

Not a full patch, but I already posted what this code should look like to handle both systems without d_type, and those which return DT_UNKNOWN:

http://article.gmane.org/gmane.mail.notmuch.general/1044

Cheers,
    Olly
Re: [notmuch] Notmuch performance problems on OSX
On 2010-01-14, Oliver Charles wrote:
> I've installed the latest notmuch from Git at this time of writing,
> along with Xapian from SVN head. However, just tagging a single
> thread with only one message seems to take too long:

One difference between OS X and other systems is that OS X supports the F_FULLFSYNC fcntl, and other systems don't (currently, at least AFAIK), and Xapian uses that if it is available to ensure that changes have actually made it to disk:

http://trac.xapian.org/ticket/288

On other systems, it uses fdatasync() or fsync(), which typically just ensure that the data has left the OS; it can sit in disk controller or drive caches for potentially seconds longer.

This call happens once per table for every (explicit or implicit) flush on a database.

I can see an issue here, which is that currently Xapian writes the base file for the table, then syncs it, then does the next table. I bet it would be more efficient to write them all and then sync them all, especially with F_FULLFSYNC. I'll take a look at doing that, and have created a ticket for it:

http://trac.xapian.org/ticket/426

If after that this is still causing problems, it should probably be made configurable what (if any) flushing is done. If you're on a UPS-backed server, you probably don't need such paranoia.

Cheers,
    Olly
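The "write them all, then sync them all" idea can be sketched roughly as follows. This is not Xapian's code: `sync_fd` and `write_then_sync_all` are hypothetical helper names, and the sketch assumes plain POSIX file descriptors and NUL-terminated buffers purely for brevity. `F_FULLFSYNC` is used only where the platform headers define it, with `fsync()` as the fallback.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Flush one descriptor as durably as the platform allows.
 * Returns 0 on success, -1 on error. */
static int sync_fd(int fd)
{
#ifdef F_FULLFSYNC
    /* Where available (e.g. OS X), ask for the drive cache to be flushed
     * too. If the request fails (some filesystems do not support it),
     * fall through to a plain fsync(). */
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
#endif
    return fsync(fd);
}

/* Write bufs[i] to fds[i] for every i first, and only then sync each
 * descriptor, so the expensive flushes happen back to back instead of
 * being interleaved with the writes. Returns 0 on success, -1 on error. */
static int write_then_sync_all(const int *fds, const char *const *bufs, size_t n)
{
    size_t i;

    assert(fds != NULL && bufs != NULL);

    /* Phase 1: all writes (looping over short writes). */
    for (i = 0; i < n; i++) {
        const char *p = bufs[i];
        size_t left = strlen(p);
        while (left > 0) {
            ssize_t w = write(fds[i], p, left);
            if (w < 0)
                return -1;
            p += w;
            left -= (size_t)w;
        }
    }

    /* Phase 2: all syncs. */
    for (i = 0; i < n; i++) {
        if (sync_fd(fds[i]) != 0)
            return -1;
    }
    return 0;
}
```

Whether batching actually helps depends on how the OS and the drive coalesce flushes, so this is something to measure rather than assume.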
Re: [notmuch] indexing mail?
>>>>> "Olly" == Olly Betts o...@survex.com writes:

    Olly> On 2010-01-14, Carl Worth wrote:
    Olly> > On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro ape...@igalia.com wrote:
    Olly> > > I am using XFS, which always returns DT_UNKNOWN. Taking into
    Olly> > > account that there is a good deal of people using filesystems
    Olly> > > other than the ones you mention, and that other non-linux
    Olly> > > filesystems may also return DT_UNKNOWN, in my opinion there
    Olly> > > should be a fall-back. I will try to post a patch Anytime Soon™.
    Olly> >
    Olly> > We definitely want the fallback. I can attempt to code it, but I
    Olly> > don't have ready access to an afflicted filesystem, so I'd need
    Olly> > help testing anyway.
    Olly> >
    Olly> > I'd love to see a patch for this bug soon. Be sure to CC me when
    Olly> > the patch is sent and that will help me commit it sooner.

    Olly> Not a full patch, but I already posted what this code should look like
    Olly> to handle both systems without d_type, and those which return DT_UNKNOWN:

    Olly> http://article.gmane.org/gmane.mail.notmuch.general/1044

I take a slightly different approach in mu:

    /* if the file system does not support entry->d_type, we add it ourselves
     * this is slower (extra stat) but at least it works
     */
    static gboolean
    _set_dtype (const char* path, struct dirent *entry)
    {
            struct stat statbuf;
            char fullpath[4096];

            snprintf (fullpath, sizeof(fullpath), "%s%c%s",
                      path, G_DIR_SEPARATOR, entry->d_name);

            /* note: stat() follows symlinks, so the S_ISLNK branch below
             * can only trigger if this is changed to lstat() */
            if (stat (fullpath, &statbuf) != 0) {
                    g_warning ("stat failed on %s: %s",
                               fullpath, strerror(errno));
                    return FALSE;
            }

            /* we only care about dirs, regular files and links */
            if (S_ISREG (statbuf.st_mode))
                    entry->d_type = DT_REG;
            else if (S_ISDIR (statbuf.st_mode))
                    entry->d_type = DT_DIR;
            else if (S_ISLNK (statbuf.st_mode))
                    entry->d_type = DT_LNK;

            return TRUE;
    }

and then in some other places:

    /* handle FSs that don't support entry->d_type */
    if (entry->d_type == DT_UNKNOWN)
            _set_dtype (dirname, entry);

Note, that is untested as of yet.

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema                  Helsinki, Finland
e:d...@djcbsoftware.nl              w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C
Re: [notmuch] indexing mail?
On 2010-01-15, Dirk-Jan C Binnema wrote:
> >>>>> "Olly" == Olly Betts o...@survex.com writes:
>
>     Olly> Not a full patch, but I already posted what this code should look like
>     Olly> to handle both systems without d_type, and those which return DT_UNKNOWN:
>
>     Olly> http://article.gmane.org/gmane.mail.notmuch.general/1044
>
> static gboolean
> _set_dtype (const char* path, struct dirent *entry)

Underscore prefixed identifiers are reserved by ISO C at file-scope; using them yourself is undefined behaviour...

>         /* we only care about dirs, regular files and links */
>         if (S_ISREG (statbuf.st_mode))
>                 entry->d_type = DT_REG;
>         else if (S_ISDIR (statbuf.st_mode))
>                 entry->d_type = DT_DIR;
>         else if (S_ISLNK (statbuf.st_mode))
>                 entry->d_type = DT_LNK;

This addresses the case where the FS returns DT_UNKNOWN for d_type, but doesn't deal with the case of platforms where struct dirent has no d_type member. From the Linux readdir man page:

    The only fields in the dirent structure that are mandated by POSIX.1
    are: d_name[], of unspecified size, with at most NAME_MAX characters
    preceding the terminating null byte; and (as an XSI extension) d_ino.
    The other fields are unstandardized, and not present on all systems;
    see NOTES below for some further details.

And in NOTES:

    Other than Linux, the d_type field is available mainly only on BSD
    systems.

Cheers,
    Olly
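Both cases (no d_type member at all, and a d_type that reports DT_UNKNOWN) can be covered in one helper. What follows is only a sketch, not the code from the gmane article linked above: `entry_is_dir` is a hypothetical name, and it assumes the C library advertises the member through the `_DIRENT_HAVE_D_TYPE` feature macro, as glibc does; other platforms may need a configure-time check instead. Resolving symlinks with `stat()` is a design choice made for this example.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Decide whether `entry`, read from directory `dir`, is a directory.
 * Returns 1 if it is, 0 if it is not, -1 on error. */
static int entry_is_dir(const char *dir, const struct dirent *entry)
{
    char path[4096];
    struct stat st;
    int n;

#if defined(_DIRENT_HAVE_D_TYPE) && defined(DT_UNKNOWN)
    /* Fast path: trust d_type when the filesystem filled it in.
     * DT_UNKNOWN (e.g. on XFS in the report above) and symlinks
     * fall through to the stat() call below. */
    if (entry->d_type != DT_UNKNOWN && entry->d_type != DT_LNK)
        return entry->d_type == DT_DIR;
#endif

    /* Slow path: also the only path on systems without d_type. */
    n = snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path did not fit */
    if (stat(path, &st) != 0)
        return -1;                      /* errno says why */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

The extra `stat()` is only paid for entries the fast path cannot classify, so filesystems that do fill in d_type keep their speed advantage.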