WARNING: database upgrade coming
Tomi Ollila writes:
> Some ideas to bikeshed with:
>
> "The database upgrade is done in a new database; at the end of the upgrade
> the current database is replaced with the new one -- Interrupting upgrade
> (with Ctrl-C) leaves you with the current database."

In a condition where free space on the filesystem is less than the size of the database... things could get interesting, right?

At the very least it's probably not worth even attempting the upgrade unless there's a --force or something.

--
Stewart Smith
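The free-space concern could be checked before an upgrade into a second database is even attempted. The following is only a sketch under stated assumptions: has_room_for_copy is a hypothetical helper, not part of notmuch, and the caller is assumed to already know the size of the existing database. It relies on POSIX statvfs().

```c
#include <sys/statvfs.h>

/* Hypothetical helper (not a notmuch function).
 * Returns 1 if the filesystem holding `dir` has at least `needed_bytes`
 * available to unprivileged users, 0 if it does not, and -1 if
 * statvfs() fails (for example, because `dir` does not exist). */
int has_room_for_copy(const char *dir, unsigned long long needed_bytes)
{
    struct statvfs vfs;

    if (statvfs(dir, &vfs) != 0)
        return -1;

    /* f_bavail counts free blocks in units of f_frsize bytes. */
    unsigned long long avail =
        (unsigned long long) vfs.f_bavail * (unsigned long long) vfs.f_frsize;

    return avail >= needed_bytes ? 1 : 0;
}
```

An upgrade could refuse to start when this returns 0, unless something like the --force suggested above is given.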
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch
Alternative (raw) message store (i.e. instead of maildir)
Vladimir Marek writes:
> Well, if your granularity will be one archive per year of mail, it
> should not be that bad ...

Except for someone like Keith, who has all his email since sometime in the 80s or something insane like that :)

--
Stewart Smith
Alternative (raw) message store (i.e. instead of maildir)
Vladimir Marek writes:
> Hi,
>
> I have objections against maildir too, but I tried to tackle it from
> different perspective. Store the maildir in zip file and use fuse-zip to
> manage it. It works sort of but it has two major disadvantages:

Huh... this is fairly interesting.

One of the downsides of a million-odd files for mail is that filesystem dump and restore takes a *LOT* longer than if it's just giant files on disk.

Combined with afuse (fuse automounter) this could be a pretty elegant solution to the problem of storing archival Maildirs.

One large archival maildir here went from 6.5GB (du -sh on XFS) to a 2.3GB ZIP archive that will never, ever change. Think about the performance difference between creating 560,000 files for backup/restore versus copying a single 2.3GB file.

> - fuse zip stores all changes in memory until unmounted
> - fuse zip (and libzip for that matter) creates new temporary file when
>   updating archive, which takes considerable time when the archive is
>   very big.

This isn't much of a hassle if you have a maildir per time period and archive it off. Maybe if you sync flags it may be...

> Of course this solution would have some disadvantages too, but for me
> the advantages would win. At the moment I'm not sure if I want to
> continue working on that. Maybe if there would be more interested guys

I'm *really* tempted to investigate making this work for archived mail.

Of course, the list of mounted file systems could get insane depending on granularity I guess...

--
Stewart Smith
[RFC PATCH 00/13] Modular message store code
On Wed, 15 Feb 2012 17:01:53 -0500, Ethan Glasser-Camp wrote:
> I'm submitting as RFC this patch series, which introduces the idea of
> a "mailstore", a "class" that defines how to access mail, instead of
> currently assuming it's always some Maildir-ish hierarchy that
> contains a bunch of mail.

This is really awesome.

Quite a while ago now I did some experiments on storing my entire Maildir inside git packs instead of in maildir. This produced an *amazing* saving in disk space used.

My idea is to end up with Maildir for "current" (as everything delivers into Maildir without a problem) and then, on a (say) monthly basis, pack all mail into an archive file and have notmuch still be able to read it.

You know what... this patch set has re-ignited my interest in making that work.

--
Stewart Smith
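To make the "mailstore as a class" idea concrete, here is a rough sketch of what such an abstraction can look like in C. It is not taken from the RFC series: struct mailstore, maildir_store and mailstore_open are hypothetical names, and the interface in the actual patches may look quite different.

```c
#include <stdio.h>

/* Hypothetical interface: a store knows how to open a message given an
 * identifier. For Maildir the identifier is simply a path; an archive or
 * git backend would interpret it differently. */
struct mailstore {
    const char *name;
    FILE *(*open_message)(const char *id);
};

/* Maildir backend: the message is a plain file on disk. */
static FILE *maildir_open(const char *path)
{
    return fopen(path, "r");
}

const struct mailstore maildir_store = { "maildir", maildir_open };

/* Callers go through the store instead of calling fopen() directly.
 * Returns NULL if the message cannot be opened. */
FILE *mailstore_open(const struct mailstore *store, const char *id)
{
    return store->open_message(id);
}
```

With this shape, a backend reading from a monthly archive file would only need to supply its own open_message function.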
Preventing the user shooting themself in the foot
On Wed, 29 Jun 2011 22:40:07 -0700, Carl Worth wrote:
> This means that messages can lose the "unread" tag while still remaining
> tagged "inbox", (you read a message, but don't archive it), and that
> messages can lose the "archive" tag while still remaining tagged
> "unread", (you archive a thread before reading all messages in the
> thread).
>
> The distinction ends up being useful to me. If at some point someone
> points me to a specific message, and when I search for it I see the
> "unread" tag, then this highlights to me that I never even looked at the
> message.

IMHO this is one of the awesome things about notmuch (and I've actively used it to go back to conversations I previously ignored).

--
Stewart Smith
[BUG] [PATCH] Fix appending of Received headers
On Fri, 10 Jun 2011 17:22:50 -0700, Carl Worth wrote:
> On Tue, 24 May 2011 13:33:25 -0700, Carl Worth wrote:
> > On Tue, 17 May 2011 12:10:32 +1000, Stewart Smith wrote:
> > > We're not properly concatenating the Received headers if we parse them
> > > while requesting a header that isn't Received.
> ...
> > I'd prefer to fix the test suite here so that we don't later regress on
> > this behavior.
>
> I've done that now. What the test suite was missing was having messages
> that actually had more than one Received header, (otherwise, no
> concatenation was ever used in the testing).
>
> The new test and the patch are both now pushed.

Great and thanks! Sorry I didn't manage to get updating the test suite to the top of my TODO list.

--
Stewart Smith
Multiple sender identities (composing)
On Tue, 24 May 2011 14:54:37 -0700, Carl Worth wrote:
> I've wanted something like this, but I'm extremely reluctant to put
> fancy things like this in my .emacs file. The problem I have is that I
> don't want to restrict nice features to the people who manage to
> configure their emacs "just so".

I completely agree - and am rather glad that there's a proper solution now.

> I'll reply with a patch I just wrote attempting to implement that. By
> default, it generates the list of addresses by looking in your notmuch
> configuration file. It also provides a customizable list of addresses
> that the user can provide (notmuch-identities).

I'll try trunk with the patches as soon as I get home from travel and am somewhat remotely close to not being a zombie.

> I don't know what trouble you had with ido on Ubuntu, but hopefully you
> can work that out.

I hope so too... it could just be how I was trying to use it or user ignorance or something like that.

--
Stewart Smith
[notmuch] Mail in git
On Sat, 21 May 2011 09:05:54 +0200, martin f krafft wrote:
> Has anyone worked on this since?

No, haven't had the cycles... and an SSD helped a bit to delay the urgency.

--
Stewart Smith
[BUG] [PATCH] Fix appending of Received headers
We're not properly concatenating the Received headers if we parse them while requesting a header that isn't Received.

This fixes notmuch-reply address detection in a bunch of situations.

diff --git a/lib/message-file.c b/lib/message-file.c
index 7722832..dd0f698 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -329,7 +329,7 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
 	/* we treat the Received: header special - we want to concat ALL of
 	 * the Received: headers we encounter.
 	 * for everything else we return the first instance of a header */
-	if (is_received) {
+	if (strcasecmp(header, "received") == 0) {
 	    if (header_sofar == NULL) {
 		/* first Received: header we encountered; just add it */
 		g_hash_table_insert (message->headers, header, decoded_value);

--
Stewart Smith
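The rule the patch enforces is that the decision to concatenate depends on the header currently being parsed, not on the header the caller asked for. A standalone sketch of that rule follows. It is an illustration, not the notmuch code: merge_header_value and copy_string are hypothetical helpers, and joining values with a single space is an assumption.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Copy `s` into freshly allocated memory; returns NULL on failure. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *out = malloc(len);

    if (out != NULL)
        memcpy(out, s, len);
    return out;
}

/* Hypothetical helper: compute the value to store for header `name`,
 * given the value stored so far (`sofar`, or NULL) and a newly parsed
 * `value`. The caller owns the returned string. */
char *merge_header_value(const char *name, const char *sofar, const char *value)
{
    if (sofar == NULL)
        return copy_string(value);      /* first instance of any header */

    if (strcasecmp(name, "received") != 0)
        return copy_string(sofar);      /* other headers: first one wins */

    /* Received: append the new value to everything collected so far. */
    size_t a = strlen(sofar), b = strlen(value);
    char *out = malloc(a + 1 + b + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, sofar, a);
    out[a] = ' ';
    memcpy(out + a + 1, value, b + 1);  /* copies the terminating NUL too */
    return out;
}
```

Because the check looks only at `name`, it gives the same answer whichever header the caller originally requested.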
Multiple sender identities (composing)
On Mon, 16 May 2011 11:52:43 +0200, Thomas Jost wrote:
> On Mon, 16 May 2011 19:29:07 +1000, Stewart Smith wrote:
> (people who don't use or like ido may want to replace
> ido-completing-read with completing-read)

I couldn't get ido to work at all (Ubuntu Natty). It would just prompt and not tab complete or even accept enter (it would insert a newline in the minibuffer) - which is why I just ended up using completing-read.

> - function to change the SMTP server that will be used for sending the
> mail according to the From header

I actually just do this via postfix sender_dependent_relayhost_maps, which ends up working quite nicely.

--
Stewart Smith
Multiple sender identities (composing)
Thought I'd share this bit of my .emacs that may be useful to go on the emacs tips page.

This does the following:
- sets up a list of possible identities to have mail From
- on composing mail, it prompts you for who you want to send mail from
- pressing enter will give you the default (first in the list)
- otherwise you have tab completion

You may also want to set this:

 '(message-sendmail-envelope-from (quote header))

(in custom-set-variables) so that if you're doing postfix sender based routing or the like, it gets the correct address and doesn't end up sending things the wrong way.

(setq stewart/mua-identities
      (list "Stewart Smith <stew...@flamingspork.com>"
            "Stewart Smith <stewart.sm...@percona.com>"))

(defun stewart/notmuch-mua-mail (&optional from)
  (interactive)
  ;; Prompt for the sender; the first identity is the default.
  (setq from
        (completing-read "Sender identity: "
                         stewart/mua-identities
                         nil t nil nil
                         (car stewart/mua-identities)))
  (notmuch-mua-mail nil nil (list (cons 'from from))))

(define-key notmuch-show-mode-map "m"
  (lambda ()
    "send email"
    (interactive)
    (stewart/notmuch-mua-mail)))

(define-key notmuch-search-mode-map "m"
  (lambda ()
    "send email"
    (interactive)
    (stewart/notmuch-mua-mail)))

--
Stewart Smith
storing From and Subject in xapian
On Sun, 08 May 2011 22:24:37 -0700, Istvan Marko wrote:
> Jameson Graef Rollins writes:
>
> > Unless I hear a strong positive response I'll hold off on considering it
> > for 0.6, and suggest instead targeting it for 0.7.
>
> I would say wait until 0.7 at least.
>
> An important thing missing is fallback to the old method for messages
> where the Subject/From VALUE fields don't exist. Otherwise people will
> get blank results until they rebuild their database.

Would it be possible to progressively fill the DB with the new data? i.e. if Subject/From is not in the DB for a message, add Subject/From for that message to the DB?

That'd be awesome from my pov (having just rebuilt my database in chert format, and that took FOREVER).

--
Stewart Smith
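The "progressively fill" idea amounts to a lazy backfill: on lookup, fall back to the old method when the stored value is missing, then store the result so that the next lookup is fast. A toy sketch follows, with everything in it hypothetical: struct message_doc stands in for a database document, file_subject stands in for parsing the message file, and the lookup is assumed to have write access to the database.

```c
#include <stddef.h>

/* Toy stand-in for a database document (not a notmuch or Xapian type). */
struct message_doc {
    const char *cached_subject; /* NULL: value not yet stored in the DB */
    const char *file_subject;   /* what parsing the message file would give */
    int backfilled;             /* set once the lookup had to fill the value */
};

/* Return the subject, filling the stored value on first use. */
const char *lookup_subject(struct message_doc *doc)
{
    if (doc->cached_subject == NULL) {
        /* Old method: read from the file, then remember the result. */
        doc->cached_subject = doc->file_subject;
        doc->backfilled = 1;
    }
    return doc->cached_subject;
}
```

Messages that are never looked up keep using the fallback, so no full rebuild is needed up front.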
notmuch's idea of concurrency / failing an invocation
On Sat, 29 Jan 2011 19:14:27 -0500, Daniel Kahn Gillmor wrote:
> On 01/28/2011 08:05 PM, Stewart Smith wrote:
> > I'm about at the point where I'm going to take my git mail store
> > experiments and get them really to work (and everyone will have to use
> > 'notmuch cat' or the like to access the messages)
>
> Would this hypothetical git-based mail store retain the atomicity and
> lockless concurrent-access of a maildir? That is, could it be used in a
> server environment?

My idea is that it would be... at least with the experiments conducted so far.

> > which should provide
> > both great storage efficiency, much faster backups of your Maildir as
> > well as having way fewer paths to traverse checking for new mail.
>
> when you say "backups of your Maildir" do you mean "backups of your
> git-based mail store" ? or is this somehow a literal Maildir stored in git?

I'll write more "soon" when there is more code behind it... and when I figure out a good upgrade path to something that is also self-consistently sane.

--
Stewart Smith
notmuch's idea of concurrency / failing an invocation
On Thu, 27 Jan 2011 13:40:25 -0500, micah anderson wrote:
> Due to my harddisk in my laptop being slow (5400RPM), my notmuch
> database growing, and perhaps some fragmentation somewhere, this has
> become *incredibly* annoying for me. I am checking email every 30
> minutes, and I'm nicing and ionicing the processes so I can use my
> machine, but while those processes are running, I'm effectively locked
> out of a good portion of my email.

I used to use spinning rust and also noticed things were slow. This is in fact mostly not xapian - but rather crawling the Maildir. I improved this early on in notmuch history by reducing the number of seeks needed when traversing the Maildir hierarchy (e.g. stat in i-node order, which is roughly on-disk order).

I'm about at the point where I'm going to take my git mail store experiments and get them really to work (and everyone will have to use 'notmuch cat' or the like to access the messages), which should provide both great storage efficiency and much faster backups of your Maildir, as well as having way fewer paths to traverse when checking for new mail.

--
Stewart Smith
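The "stat in i-node order" trick can be sketched as follows: read the directory entries first, sort them by inode number, and only then visit the files, so that on a rotating disk the visits proceed in roughly on-disk order. This is a minimal illustration, not the notmuch implementation; struct dir_entry and read_dir_inode_order are hypothetical names.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: inode number plus file name. */
struct dir_entry {
    ino_t ino;
    char name[256];
};

/* qsort() comparator: ascending inode number. */
static int cmp_by_inode(const void *a, const void *b)
{
    const struct dir_entry *ea = a, *eb = b;

    return (ea->ino > eb->ino) - (ea->ino < eb->ino);
}

/* Read up to `max` entries of `path` into `out`, sorted by inode number.
 * Returns the number of entries stored, or -1 if the directory cannot be
 * opened. The caller would then stat() the entries in this order. */
long read_dir_inode_order(const char *path, struct dir_entry *out, size_t max)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    size_t n = 0;

    if (dir == NULL)
        return -1;

    while (n < max && (de = readdir(dir)) != NULL) {
        out[n].ino = de->d_ino;
        strncpy(out[n].name, de->d_name, sizeof out[n].name - 1);
        out[n].name[sizeof out[n].name - 1] = '\0';
        n++;
    }
    closedir(dir);

    qsort(out, n, sizeof *out, cmp_by_inode);
    return (long) n;
}
```

The sort costs little next to the disk seeks it avoids; on an SSD the ordering matters far less.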
[PATCH] Fix linker error from insufficient LDFLAGS
On Fri, 23 Apr 2010 17:53:17 -0700, Carl Worth wrote:
> On Thu, 22 Apr 2010 18:20:27 -0400, Ben Gamari wrote:
> > It seems that LDFLAGS have recently been reorganized, along with the
> > introduction of a notmuch-shared rule. Unfortunately, the LDFLAGS used
> > in notmuch-shared don't include CONFIGURE_LDFLAGS. This caused linking
> > to fail with the following,
>
> What system is this on?

I got this on Ubuntu 9.10 with gold as the linker:

$ ld --version
GNU gold (GNU Binutils for Ubuntu 2.20) 1.9

which could be what's causing it?

Anyway, this patch fixed linking for me.

--
Stewart Smith
[PATCH 1/4] Mailstore abstraction interface
On Tue, 13 Apr 2010 10:53:12 -0700, Carl Worth wrote:
> This series is looking like one of the most complete approaches to
> maildir-flag synchronization, (and I like some of the motivation that
> leads to "notmuch cat"). But I think the mailstore abstraction is
> largely a distraction from the real features here.

For my case of wanting mail in git packs (so that a backup of my mailstore completes in reasonable time, preferably using less disk space), 'notmuch cat' being used everywhere removes a lot of the issues of doing this.

(Plugging in an alternative to readdir is fairly simple... but the emacs UI needs to read from it too :)

--
Stewart Smith
please eat my data!
On Mon, 12 Apr 2010 17:24:35 +0200, "Sebastian Spaeth" wrote:
> What I find interesting is that we have a 2x speedup and a 10x speedup
> for different queries. Olly was saying on IRC that both *should* really be
> behaving in much the same manner.

Remember that on ext3 (and pretty sure ext4) fsync is the same as sync(). So performance depends on how much dirty data you have in your cache.

libeatmydata also gets rid of msync(), O_SYNC, etc.

--
Stewart Smith
[notmuch] Mailstore abstraction & maildir synchronization
On Thu, 18 Mar 2010 16:39:36 +0100, Michal Sojka wrote:
> - Only file-based storage is supported. Notmuch accesses the files
>   directly, and not via the mailstore interface.

It'll be great when this is fixed... it should be trivial to add a git backend then. (I have in no way been looking at tags in git though... doesn't really interest me, and git ain't a database.)

> - (maildir) Viewing/storing of attachments of unread messages doesn't
>   work. The reason is that when you view the message its unread tag
>   is removed which leads to rename of the file, but Emacs still uses
>   the original name to access the attachment.

What about migrating from a maildir that's turned into notmuch back to this maildir backend? What will be authoritative: maildir or notmuch database?

--
Stewart Smith
[notmuch] [PATCH] A simple approach to maildir flags
On Fri, 26 Feb 2010 14:49:25 -0500, Mike Kelly wrote:
> The following patches attempt to provide a simple, extendable approach
> to handling the 'Seen' maildir flag. To appease (hopefully) everyone, it
> will only do this for new messages. This means that people coming from
> another MUA won't be stuck with 30,000 unread messages, for example.
>
> It should be simple to extend this to other maildir flags, too, if
> people want them and can decide on what tags they should correspond to.

Personally, I like the seen messages not to be in inbox (by default) as either:
1) I'm importing an old Maildir, in which case if it's read it's probably been dealt with
2) I've used another mail client, same as above.

--
Stewart Smith
[notmuch] [PATCH] Added mail directory filename pattern support.
On Mon, Feb 22, 2010 at 12:07:31PM -0800, Bart Massey wrote:
> Typically, the filenames in a mail directory that actually
> contain mail obey some specific format. For example, in my
> MH email directory, all mail filenames consist only of
> digits.
>
> This patch adds support for a config file variable
> "filename_pattern" which may be set to a regex used to filter
> only valid mail filenames when scanning. Effective use of
> filename_pattern cuts down on the noise from notmuch, and
> may speed it up in some cases.

What about the other way around?

e.g. if anybody has ever pointed Evolution at a Maildir, you get a bunch of Maildir-name.ev-summary and .ev-summary-meta and .ibex.index and whatever.

A default list of ignored patterns would be pretty easy to come up with.

--
Stewart Smith
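A default ignore list along those lines could be a handful of shell-style globs checked with POSIX fnmatch(). The sketch below is only an illustration: is_ignored_filename is a hypothetical helper, and the three patterns are guesses derived from the Evolution file names mentioned above, not a list that notmuch ships.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical default ignore list, based on the Evolution leftovers
 * named in the message above. */
static const char *const default_ignore_patterns[] = {
    "*.ev-summary",
    "*.ev-summary-meta",
    "*.ibex.index",
};

/* Returns 1 if `name` matches any ignore pattern, 0 otherwise. */
int is_ignored_filename(const char *name)
{
    size_t n = sizeof default_ignore_patterns / sizeof default_ignore_patterns[0];

    for (size_t i = 0; i < n; i++) {
        /* Flags are 0, so '*' may also match a leading dot. */
        if (fnmatch(default_ignore_patterns[i], name, 0) == 0)
            return 1;
    }
    return 0;
}
```

A scanner would call this on each directory entry before trying to parse it as mail; an inclusion filter such as filename_pattern could still be applied afterwards.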
[notmuch] Mail in git
On Wed, 17 Feb 2010 14:21:01 +1300, martin f krafft wrote:
> What I am wondering is if (explicit) tags couldn't be represented as
> tree-objects with this.
>
>   evenless-link - link a message object with a tree object
>   evenless-unlink - unlink a message object from tree object
>                     [replaces evenless-unlink]

I think it could get expensive for tags with lots of messages. With my fast-import script, doing the commit (which referenced... umm.. 800,000+ objects) took a *very* long time.

As far as I understand it, the tree object is stored in full and space is only reclaimed during repack (due to delta compression). So if you, say, had the entire history of a high volume list such as linux-kernel, adding messages could get rather expensive if you auto-tagged (or autotagged messages with patches or whatever).

> messages would then be deleted whenever using git-gc.
>
> No idea how this would sync if we don't keep ancestry. Otoh, it
> would probably not be very expensive to do just that.

If we keep ancestry though, we are reusing existing working code for backup (git-pull :)

Keep in mind that with my tests, the Maildir in git is about a quarter to a fifth of the size of it in Maildir... so a bit of extra usage per message isn't as dramatic as it may sound.

> Is it possible to find out all trees that reference a given object
> with Git in constant or sub-linear time?

I don't think so but I'm not sure.

-- Stewart Smith
[notmuch] Mail in git
On Tue, 16 Feb 2010 14:06:29 -0500, Ben Gamari wrote:
> Excerpts from Stewart Smith's message of Sun Feb 14 19:29:14 -0500 2010:
> > So... I sketched this out in my head at LCA... and it's taken a bit of
> > time to actually properly try it.
> >
> In case anyone wanted to play around with this, I've written up my own
> little implementation[1] of a git mail import script. It's quite simple,
> but I felt it might be nice to have some public code to play around
> with. I get around 80 messages/second on my laptop and things are
> definitely quite IO bound. You get 1 commit per message, although I'm
> not entirely sure if this is the correct way to do things.
>
> [1] git://goldnerlab.physics.umass.edu/git-mail

Using fast-import is interesting. Does it update the working tree? The big thing I wanted to avoid was creating a working tree (another million inodes being created is not ever what I need).

Also interesting is the mention of creating packs on the fly... this could save the time in first writing the object and then packing it (as my script does).

I'm going to play with this.

-- Stewart Smith
Re: [notmuch] Mail in git
On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stew...@flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
>
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
>
> I'm going to play with this

and I did. Good news... on my mailstore (which, as I've previously mentioned, takes about 10 minutes to run 'du' over, about the same time as 'notmuch new' takes), using the (attached) evenless.pl to create a single commit with everything in it:

$ du -sh .git
3.4G    .git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages, rinse, repeat) took all night and got to 3.7GB. This took only 108 minutes. In both cases, I was creating the repository on another spindle (USB2.0 disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.

The next thing to think about is adding objects as they come in... creating a new commit with just an added file should be pretty simple and easy... but this means we get to keep a "revision history" of the mailstore, which is *possibly* not ideal in terms of storage efficiency (I'll do a trial with mine of doing one message at a time and seeing what the end size is).

However... commit per added mail (or mails) does give us the advantage of a really well documented and tested backup system :)

Deleting could be hard.. if we actually want the objects to go away in a "permanent" way (not just no longer be referenced).

For the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------

real    107m43.130s
user    45m25.430s
sys     2m49.440s

#!/usr/bin/perl -w
use strict;

my $tree= "";
use IPC::Open2;
use File::stat;

my $FILES;
my $mark= 1;
my $stripdir= $ARGV[0];

sub fastimport_blobs ($);
sub fastimport_blobs ($)
{
    my $dirname= shift @_;

    opendir (my $dirhandle, $dirname);
    foreach (readdir $dirhandle)
    {
        next if /^\.\.?$/;
        next if /\.cmeta$/;
        next if /\.ibex.index$/;
        next if /\.ibex.index.data$/;
        next if /\.ev-summary$/;
        next if /\.ev-summary-meta$/;
        next if /\.notmuch$/;

        if (-d $dirname.'/'.$_)
        {
            print STDERR "Recursing into $_/ ";
            fastimport_blobs($dirname.'/'.$_);
            print STDERR "\n";
        }
        else
        {
            my $sb= stat("$dirname/$_");
            print FASTIMPORT "blob\n";
            print FASTIMPORT "mark :$mark\n";
            print FASTIMPORT "data ".($sb->size)."\n";
            open FILEIN, "$dirname/$_";
            my $content;
            sysread FILEIN, $content, $sb->size;
            close FILEIN;
            print FASTIMPORT $content;

            my $storedir= "$dirname/$_";
            $storedir=~ s/^$stripdir//;
            $storedir=~ s/^\///;
            $FILES.="M 0644 :$mark $storedir\n";
            $mark++;
        }
    }
}

open FASTIMPORT, "| git fast-import --date-format=rfc2822";

fastimport_blobs($ARGV[0]);

print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenle...@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
print FASTIMPORT $FILES;
print FASTIMPORT "\n";

close FASTIMPORT;

-- Stewart Smith
[notmuch] Notmuch performance problems on OSX
On Fri, 15 Jan 2010 03:58:50 +0000 (UTC), Olly Betts wrote:
> One difference between OS X and other systems is that OS X supports the
> F_FULLSYNC ioctl, and other systems don't (currently, at least AFAIK)
> and Xapian uses that if it is available to ensure that changes have
> actually made it to disk:
>
> http://trac.xapian.org/ticket/288
>
> On other systems, it uses fdatasync() or fsync(), which typically just
> ensure that the data has left the OS - it can sit in disk controller or
> drive caches for potentially seconds longer. This call happens once
> per table for every (explicit or implicit) flush on a database.

At least if your OS and file system don't hate you (e.g. XFS on Linux), then fsync() really does flush the drive cache.

Also keep in mind that the OSX file system (HFS+) was great for 1985. It's essentially single threaded :/

-- Stewart Smith
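A minimal sketch of the platform split being discussed, not Xapian's actual code. Apple's headers spell the request `F_FULLFSYNC` and it is issued through `fcntl()`; everywhere else the sketch falls back to `fsync()`. The function name `sync_to_disk` is made up for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Flush 'fd' as far towards stable storage as the platform allows.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int
sync_to_disk (int fd)
{
#ifdef F_FULLFSYNC
    /* OS X: ask the drive to flush its own cache as well. Not every
     * filesystem supports the request, so fall through to fsync(). */
    if (fcntl (fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync (fd);
}
```

The cost difference described above comes from the first branch: a full drive-cache flush per table per commit is far slower than a call that only hands the data to the device.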
[notmuch] [PATCH] notmuch: Respect maildir message flags
New patch that does it. Pretty much same as the old one, just with that one bug I mentioned fixed.

This is what I've currently used to import my Maildir. I'm now happy :)

diff --git a/notmuch-new.c b/notmuch-new.c
index f25c71f..43371a3 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -39,6 +39,7 @@ typedef struct {
     int total_files;
     int processed_files;
     int added_messages;
+    int tag_maildir;
     struct timeval tv_start;
 
     _filename_list_t *removed_files;
@@ -169,6 +170,60 @@ _entries_resemble_maildir (struct dirent **entries, int count)
     return 0;
 }
 
+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+                                const char * path)
+{
+    int seen = FALSE;
+    int end_of_flags = FALSE;
+    size_t l = strlen(path);
+
+    /* Non-experimental message flags start with this */
+    char * i = strstr(path, ":2,");
+    i = (i) ? i : strstr(path, "!2,"); /* This format is used on VFAT */
+    if (i != NULL) {
+        i += 3;
+        for (; i < (path + l) && !end_of_flags; i++) {
+            switch (*i) {
+            case 'F' :
+                notmuch_message_add_tag (message, "flagged");
+                break;
+            case 'R': /* replied */
+                notmuch_message_add_tag (message, "answered");
+                break;
+            case 'D':
+                notmuch_message_add_tag (message, "draft");
+                break;
+            case 'S': /* seen */
+                seen = TRUE;
+                break;
+            case 'T': /* trashed */
+                notmuch_message_add_tag (message, "deleted");
+                break;
+            case 'P': /* passed */
+                notmuch_message_add_tag (message, "forwarded");
+                break;
+            default:
+                end_of_flags = TRUE;
+                break;
+            }
+        }
+    }
+
+    if (i == NULL || !seen) {
+        tag_inbox_and_unread (message);
+    }
+}
+
 /* Examine 'path' recursively as follows:
  *
  * o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -222,6 +277,7 @@ add_files_recursive (notmuch_database_t *notmuch,
     notmuch_filenames_t *db_subdirs = NULL;
     struct stat st;
     notmuch_bool_t is_maildir, new_directory;
+    int maildir_detected = -1;
 
     if (stat (path, &st)) {
         fprintf (stderr, "Error reading directory %s: %s\n",
@@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
             continue;
         }
 
+        /* If this directory is a Maildir folder, we need to
+         * ignore any subdirectories marked tmp/, and scan for
+         * Maildir attributes on messages contained in the sub-
+         * directories 'new' and 'cur'. */
+        if (maildir_detected != 0 &&
+            (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&
+            ((strcmp (entry->d_name, "tmp") == 0) ||
+             (strcmp (entry->d_name, "new") == 0) ||
+             (strcmp (entry->d_name, "cur") == 0))) {
+
+            if (maildir_detected == -1) {
+                maildir_detected = _entries_resemble_maildir(fs_entries, num_fs_entries);
+            }
+            if (maildir_detected == 1) {
+                if (strcmp (entry->d_name, "tmp") == 0) {
+                    continue;
+                } else {
+                    state->tag_maildir = TRUE;
+                }
+            }
+        }
+
         next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
         status = add_files_recursive (notmuch, next, state);
         if (status && ret == NOTMUCH_STATUS_SUCCESS)
@@ -412,7 +490,12 @@ add_files_recursive (notmuch_database_t *notmuch,
             /* success */
             case NOTMUCH_STATUS_SUCCESS:
                 state->added_messages++;
-                tag_inbox_and_unread (message);
+                if (state->tag_maildir) {
+                    derive_tags_from_maildir_flags (message,
+                                                    entry->d_name);
+                } else {
+                    tag_inbox_and_unread (message);
+                }
                 break;
             /* Non-fatal issues (go on to next file) */
             case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:

-- Stewart Smith
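The flag parsing in this patch can be tried without a notmuch database. Below is a minimal standalone sketch, assuming the same rules as the patch: flags follow ":2," (or "!2," on VFAT) and parsing stops at the first unrecognised character. It returns a bitmask instead of adding tags; the `MAILDIR_FLAG_*` names and the function name are invented for this example.

```c
#include <stddef.h>
#include <string.h>

enum {
    MAILDIR_FLAG_DRAFT   = 1 << 0, /* D */
    MAILDIR_FLAG_FLAGGED = 1 << 1, /* F */
    MAILDIR_FLAG_PASSED  = 1 << 2, /* P */
    MAILDIR_FLAG_REPLIED = 1 << 3, /* R */
    MAILDIR_FLAG_SEEN    = 1 << 4, /* S */
    MAILDIR_FLAG_TRASHED = 1 << 5  /* T */
};

/* Parse the flag letters that follow the ":2," (or "!2,") separator in a
 * Maildir filename. Returns a bitmask of MAILDIR_FLAG_* values, or 0 when
 * the name carries no flag section. */
unsigned int
maildir_flags_from_filename (const char *filename)
{
    unsigned int flags = 0;
    const char *p = strstr (filename, ":2,");

    if (p == NULL)
        p = strstr (filename, "!2,");
    if (p == NULL)
        return 0;

    for (p += 3; *p != '\0'; p++) {
        switch (*p) {
        case 'D': flags |= MAILDIR_FLAG_DRAFT;   break;
        case 'F': flags |= MAILDIR_FLAG_FLAGGED; break;
        case 'P': flags |= MAILDIR_FLAG_PASSED;  break;
        case 'R': flags |= MAILDIR_FLAG_REPLIED; break;
        case 'S': flags |= MAILDIR_FLAG_SEEN;    break;
        case 'T': flags |= MAILDIR_FLAG_TRASHED; break;
        default:
            return flags; /* stop at the first unknown character */
        }
    }
    return flags;
}
```

A caller would then tag a message as inbox and unread whenever `MAILDIR_FLAG_SEEN` is absent from the result, which matches the `i == NULL || !seen` test in the patch. Which tag name each remaining bit maps to is left to the caller.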
[notmuch] [PATCH] notmuch: Respect maildir message flags
On Wed, Feb 10, 2010 at 01:43:39PM +1030, Tim Stoakes wrote:
> My apologies for dredging up an old thread. I don't want to restart the
> religious war over whether notmuch should respect Maildir flags -
> suffice to say that *I* want that, and the patch posted by Michiel
> seemed to be the best way to make that happen.

I want this too :)

I also found a bug:

> @@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
>              continue;
>          }
>
> +        /* If this directory is a Maildir folder, we need to
> +         * ignore any subdirectories marked tmp/, and scan for
> +         * Maildir attributes on messages contained in the sub-
> +         * directories 'new' and 'cur'. */
> +        if (maildir_detected != 0 &&
> +            entry->d_type == DT_DIR &&
> +            ((strcmp (entry->d_name, "tmp") == 0) ||
> +             (strcmp (entry->d_name, "new") == 0) ||
> +             (strcmp (entry->d_name, "cur") == 0))) {

should be

    (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&

as not everywhere is going to give you d_type (e.g. my machine).

(took me a while to find/figure that out :)

-- Stewart Smith
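Accepting DT_UNKNOWN is enough here because the entry name is also compared against tmp/new/cur. A stricter variant, sketched below under the assumption that an extra stat() per unknown entry is affordable, asks the filesystem when d_type is not filled in. The helper name `entry_is_directory` and the fixed-size path buffer are choices made for this example only; d_type itself is a BSD/glibc extension rather than part of POSIX.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose d_type and DT_* on glibc */
#endif

#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return true if 'entry', read from directory 'dir_path', is a directory.
 * Trust d_type when the filesystem provides it; otherwise ask stat(). */
bool
entry_is_directory (const char *dir_path, const struct dirent *entry)
{
    char full_path[4096];
    struct stat st;
    int written;

    if (entry->d_type == DT_DIR)
        return true;
    if (entry->d_type != DT_UNKNOWN)
        return false;

    /* d_type not available: build the full path and stat() it. */
    written = snprintf (full_path, sizeof (full_path), "%s/%s",
                        dir_path, entry->d_name);
    if (written < 0 || (size_t) written >= sizeof (full_path))
        return false; /* path too long for this sketch */
    if (stat (full_path, &st) != 0)
        return false;
    return S_ISDIR (st.st_mode);
}
```

Note that stat() follows symbolic links, so a symlink to a directory counts as a directory in this sketch.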
[notmuch] Git as notmuch object store (was: Potential problem using Git for mail)
On Mon, Jan 25, 2010 at 01:46:59PM +1300, martin f krafft wrote: > Stewart, you've worked most on this so far. Would you like to share > your thoughts? Just posted a new thread with my latest experiments. Things look rather good from a storage size point of view. Still a few things to work out though. -- Stewart Smith
[notmuch] Mail in git
So... I sketched this out in my head at LCA... and it's taken a bit of time to actually properly try it.

The problem is:
A simple 'find ~/Maildir' takes 10 minutes, and if you write the output to a file, it's 88MB+. There's "only" about 900,000 entries there. But this means 900,000 files, which is a non-trivial amount. Some mail folders are quite large too.

Some of this problem could just be solved by using notmuch a bit differently (folder per month for example). However... this is a one-way change and going back would be very tricky.

There's also the backup problem. Iterating through ~1 million inodes takes a *LONG* time. Restoring it takes even longer (think about writing all that data to the file system journal).

Historically, if I'm running a backup, I couldn't really use my laptop; it'd be saturated with disk IO performing the file system dump. It would also take many hours.

Restoring from backup? About 8hrs.

An observation is that mail never changes. It may be reclassified (and that's what notmuch is for), but it never changes.

We really just want a way to store and access many many many small blobs of data that never change.

It turns out git is pretty good at that.

Underneath, we could just use it as an object store (a simple git-hash-object and git-cat-file test confirmed this to be pretty simple to do).

Even better, since a lot of mail is fairly similar, is to use delta compression between mail messages to reduce the storage space. Git is pretty good at that too.

A few giant git packs will be much quicker to back up and restore than 1 million files.

So... I wrote a script to test it:

$ time perl /home/stewart/evenless.pl /home/stewart/Maildir/

real    841m41.491s
user    491m3.200s
sys     261m58.080s

Which goes from a 15GB Maildir to a 3.7GB git repo.

The algorithm of evenless.pl is basically:
1   get next directory entry
2   if is directory, recurse into it
3   write item to git (git hash-object -w)
4   add item to tree object
5   if number of items written = 1000
5.1   make pack of last 1000 items
6   goto 1

$ git count-objects -v
count: 479
size: 27680
in-pack: 873109
packs: 1084
size-pack: 3746219
prune-packable: 0
garbage: 0

If I did a "git checkout", about 8 hours later I'd have a directory tree exactly the same as my maildir.

Why didn't I just git-add everything? I didn't exactly feel like creating another giant copy of my mail (that also takes a long time).

What about adding more mail to the archive?

So the way I think is that you use a Maildir for day to day mail (e.g. delivery) and every so often you run some magic command that takes old mail out of the Maildir and stores it in the git repo.

Next step? Make notmuch be able to read mail out of it and add it to an index (oh, and some kind of verification and error checking about creating the git repo).

-- Stewart Smith
Re: [notmuch] [PATCH] notmuch: Respect maildir message flags
On Tue, Feb 16, 2010 at 03:12:50PM +1300, martin f krafft wrote:
> also sprach Stewart Smith <stew...@flamingspork.com> [2010.02.16.1458 +1300]:
> > +            case 'R': /* replied */
> > +                notmuch_message_add_tag (message, "answered");
> > +                break;
>
> 'r' means replied, not 'answered'.

fixed. (I have to admit... I didn't look too closely at this... it just worked enough for me)

> > +            case 'T': /* trashed */
> > +                notmuch_message_add_tag (message, "deleted");
> > +                break;
>
> Same. trashed and deleted are not the same thing.

changed to 'trashed'.

> I don't want to get into an argument over this, because I think this
> already exposes a problem: you are putting into global namespace
> something not everyone might want, or agree with. Why not use
> 'maildirflags::replied' instead? People can always map that to
> something in the global namespace.

What about putting them all in there except for the seen tag, with the seen tag dictating if it gets marked 'unread' or not? I cannot imagine where somebody would want this not to be the case... it was bad enough discovering 100,000 unread messages :)

What about this patch (just with those few things fixed)?

diff --git a/notmuch-new.c b/notmuch-new.c
index f25c71f..8303047 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -39,6 +39,7 @@ typedef struct {
     int total_files;
     int processed_files;
     int added_messages;
+    int tag_maildir;
     struct timeval tv_start;
 
     _filename_list_t *removed_files;
@@ -169,6 +170,60 @@ _entries_resemble_maildir (struct dirent **entries, int count)
     return 0;
 }
 
+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+                                const char * path)
+{
+    int seen = FALSE;
+    int end_of_flags = FALSE;
+    size_t l = strlen(path);
+
+    /* Non-experimental message flags start with this */
+    char * i = strstr(path, ":2,");
+    i = (i) ? i : strstr(path, "!2,"); /* This format is used on VFAT */
+    if (i != NULL) {
+        i += 3;
+        for (; i < (path + l) && !end_of_flags; i++) {
+            switch (*i) {
+            case 'F' :
+                notmuch_message_add_tag (message, "maildir::flagged");
+                break;
+            case 'R': /* replied */
+                notmuch_message_add_tag (message, "maildir::replied");
+                break;
+            case 'D':
+                notmuch_message_add_tag (message, "maildir::draft");
+                break;
+            case 'S': /* seen */
+                seen = TRUE;
+                break;
+            case 'T': /* trashed */
+                notmuch_message_add_tag (message, "maildir::trashed");
+                break;
+            case 'P': /* passed */
+                notmuch_message_add_tag (message, "maildir::forwarded");
+                break;
+            default:
+                end_of_flags = TRUE;
+                break;
+            }
+        }
+    }
+
+    if (i == NULL || !seen) {
+        tag_inbox_and_unread (message);
+    }
+}
+
 /* Examine 'path' recursively as follows:
  *
  * o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -222,6 +277,7 @@ add_files_recursive (notmuch_database_t *notmuch,
     notmuch_filenames_t *db_subdirs = NULL;
     struct stat st;
     notmuch_bool_t is_maildir, new_directory;
+    int maildir_detected = -1;
 
     if (stat (path, &st)) {
         fprintf (stderr, "Error reading directory %s: %s\n",
@@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
             continue;
         }
 
+        /* If this directory is a Maildir folder, we need to
+         * ignore any subdirectories marked tmp/, and scan for
+         * Maildir attributes on messages contained in the sub-
+         * directories 'new' and 'cur'. */
+        if (maildir_detected != 0 &&
+            (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&
+            ((strcmp (entry->d_name, "tmp") == 0) ||
+             (strcmp (entry->d_name, "new") == 0) ||
+             (strcmp (entry->d_name, "cur") == 0))) {
+
+            if (maildir_detected == -1) {
+                maildir_detected = _entries_resemble_maildir(fs_entries, num_fs_entries);
+            }
+            if (maildir_detected == 1) {
+                if (strcmp (entry->d_name, "tmp") == 0) {
+                    continue;
+                } else {
+                    state->tag_maildir = TRUE;
+                }
+            }
+        }
+
         next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
         status = add_files_recursive (notmuch, next, state);
         if (status && ret == NOTMUCH_STATUS_SUCCESS)
@@ -412,7 +490,12 @@ add_files_recursive (notmuch_database_t *notmuch,
             /* success */
             case NOTMUCH_STATUS_SUCCESS:
                 state->added_messages
Re: [notmuch] Notmuch performance problems on OSX
On Fri, 15 Jan 2010 03:58:50 + (UTC), Olly Betts o...@survex.com wrote: One difference between OS X and other systems is that OS X supports the F_FULLSYNC ioctl, and other systems don't (currently, at least AFAIK) and Xapian uses that if it is available to ensure that changes have actually made it to disk: http://trac.xapian.org/ticket/288 On other systems, it uses fdatasync() or fsync(), which typically just ensure that the data has left the OS - it can sit in disk controller or drive caches for potentially seconds longer. This call happens once per table for every (explicit or implicit) flush on a database. At least if you OS and file system don't hate you (e.g. XFS on Linux), then fsync() really does flush the drive cache. Also keep in mind that the OSX file system (HFS+) was great for 1985. It's essentially single threaded :/ -- Stewart Smith ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
[notmuch] Mail in git
So... I sketched this out in my head at LCA... and it's taken a bit of time to actually properly try it.

The problem is: a simple 'find ~/Maildir' takes 10 minutes, and if you write the output to a file, it's 88MB+. There are only about 900,000 entries there, but this means 900,000 files, which is a non-trivial amount. Some mail folders are quite large too.

Some of this problem could just be solved by using notmuch a bit differently (folder per month for example). However... this is a one-way change and going back would be very tricky.

There's also the backup problem. Iterating through ~1 million inodes takes a *LONG* time. Restoring it takes even longer (think about writing all that data to the file system journal). Historically, if I was running a backup, I couldn't really use my laptop; it'd be saturated with disk IO performing the file system dump. It would also take many hours. Restoring from backup? About 8hrs.

An observation is that mail never changes. It may be reclassified (and that's what notmuch is for), but it never changes. We really just want a way to store and access many many many small blobs of data that never change.

It turns out git is pretty good at that. Underneath, we could just use it as an object store (a simple git-hash-object and git-cat-file test confirmed this to be pretty simple to do).

Even better, since a lot of mail is fairly similar, we can use delta compression between mail messages to reduce the storage space. Git is pretty good at that too. A few giant git packs will be much quicker to back up and restore than 1 million files.

So... I wrote a script to test it:

    $ time perl /home/stewart/evenless.pl /home/stewart/Maildir/
    real    841m41.491s
    user    491m3.200s
    sys     261m58.080s

Which goes from a 15GB Maildir to a 3.7GB git repo.

The algorithm of evenless.pl is basically:

1   get next directory entry
2   if is directory, recurse into it
3   write item to git (git hash-object -w)
4   add item to tree object
5   if number of items written = 1000
5.1   make pack of last 1000 items
6   goto 1

    $ git count-objects -v
    count: 479
    size: 27680
    in-pack: 873109
    packs: 1084
    size-pack: 3746219
    prune-packable: 0
    garbage: 0

If I did a git checkout, about 8 hours later I'd have a directory tree exactly the same as my maildir.

Why didn't I just git-add everything? I didn't exactly feel like creating another giant copy of my mail (that also takes a long time).

What about adding more mail to the archive? The way I think about it is that you use a Maildir for day to day mail (e.g. delivery) and every so often you run some magic command that takes old mail out of the Maildir and stores it in the git repo.

Next step? Make notmuch be able to read mail out of it and add it to an index (oh, and some kind of verification and error checking about creating the git repo).

--
Stewart Smith
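Steps 5 and 5.1 of the loop described in this message amount to a counter that triggers a pack every 1000 stored items. Below is a minimal C sketch of just that batching logic. evenless.pl itself is Perl and is not shown in the message, so all names here (pack_batcher_t, batcher_add, batcher_add_many, batcher_finish) are hypothetical, and packing any leftover items at the end of the walk is an assumption rather than something the message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batching state; not taken from evenless.pl. */
typedef struct {
    size_t batch_size;  /* items per pack (1000 in the message) */
    size_t pending;     /* items written since the last pack */
    size_t packs_made;  /* packs created so far */
} pack_batcher_t;

static void batcher_init(pack_batcher_t *b, size_t batch_size)
{
    assert(batch_size > 0);
    b->batch_size = batch_size;
    b->pending = 0;
    b->packs_made = 0;
}

/* Record one stored item. Returns 1 if this item completed a batch. */
static int batcher_add(pack_batcher_t *b)
{
    if (++b->pending < b->batch_size)
        return 0;
    /* A real tool would build a pack from the pending objects here. */
    b->pending = 0;
    b->packs_made++;
    return 1;
}

/* Record n stored items. Returns how many batches were completed. */
static size_t batcher_add_many(pack_batcher_t *b, size_t n)
{
    size_t packs = 0;
    for (size_t i = 0; i < n; i++)
        packs += (size_t) batcher_add(b);
    return packs;
}

/* Assumed final step: pack whatever is left. Returns 1 if a pack was made. */
static int batcher_finish(pack_batcher_t *b)
{
    if (b->pending == 0)
        return 0;
    b->pending = 0;
    b->packs_made++;
    return 1;
}
```

With a batch size of 1000, storing 2500 items completes two batches and leaves 500 pending for the final call.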
Re: [notmuch] Git as notmuch object store (was: Potential problem using Git for mail)
On Mon, Jan 25, 2010 at 01:46:59PM +1300, martin f krafft wrote:
> Stewart, you've worked most on this so far. Would you like to share your thoughts?

Just posted a new thread with my latest experiments. Things look rather good from a storage size point of view. Still a few things to work out though.

--
Stewart Smith
[notmuch] Mac OS X/Darwin compatibility issues
On Wed, Nov 18, 2009 at 04:24:42PM -0800, Alexander Botero-Lowry wrote:
> On Thu, 19 Nov 2009 10:45:28 +1100, Stewart Smith flamingspork.com> wrote:
> > On Wed, Nov 18, 2009 at 11:27:20PM +0100, Carl Worth wrote:
> > > Yes. I knew I was "cheating" by using some GNU extensions here. I'm happy to accept portability patches for these things, but it's hard for me to get excited about writing them myself.
> > >
> > > Care to take a whack at these?
> >
> > http://www.gnu.org/software/gnulib/
> >
> > could be a partial answer.
> >
> Why add yet another dependency for a couple of functions? Especially considering how notmuch already depends on glib which includes portability functions for various things.

The idea with gnulib (at least what we've done with drizzle) is to just copy the bits you need into the tree. It works pretty well for those small things where you don't need to depend on a giant like glib.

--
Stewart Smith
[notmuch] Mac OS X/Darwin compatibility issues
On Wed, Nov 18, 2009 at 11:27:20PM +0100, Carl Worth wrote:
> Yes. I knew I was "cheating" by using some GNU extensions here. I'm happy to accept portability patches for these things, but it's hard for me to get excited about writing them myself.
>
> Care to take a whack at these?

http://www.gnu.org/software/gnulib/

could be a partial answer. We've taken to using it where needed for Drizzle and it seems to work fine.

--
Stewart Smith
[notmuch] [PATCH] count_files: sort directory in inode order before statting
---
 notmuch-new.c |   30 ++
 1 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 11fad8c..c5f841a 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -308,36 +308,26 @@ add_files (notmuch_database_t *notmuch,
 static void
 count_files (const char *path, int *count)
 {
-    DIR *dir;
-    struct dirent *e, *entry = NULL;
-    int entry_length;
-    int err;
+    struct dirent *entry = NULL;
     char *next;
     struct stat st;
+    struct dirent **namelist = NULL;

-    dir = opendir (path);
+    int n_entries= scandir(path, &namelist, 0, ino_cmp);

-    if (dir == NULL) {
+    if (n_entries == -1) {
         fprintf (stderr, "Warning: failed to open directory %s: %s\n",
                  path, strerror (errno));
         goto DONE;
     }

-    entry_length = offsetof (struct dirent, d_name) +
-        pathconf (path, _PC_NAME_MAX) + 1;
-    entry = malloc (entry_length);
+    int i=0;

     while (!interrupted) {
-        err = readdir_r (dir, entry, &e);
-        if (err) {
-            fprintf (stderr, "Error reading directory: %s\n",
-                     strerror (errno));
-            free (entry);
-            goto DONE;
-        }
+        if (i == n_entries)
+            break;

-        if (e == NULL)
-            break;
+        entry= namelist[i++];

         /* Ignore special directories to avoid infinite recursion.
          * Also ignore the .notmuch directory.
@@ -376,8 +366,8 @@ count_files (const char *path, int *count)
   DONE:
     if (entry)
         free (entry);
-
-    closedir (dir);
+    if (namelist)
+        free (namelist);
 }

 int
--
1.6.3.3
[notmuch] [PATCH 2/2] Read mail directory in inode number order
This gives a rather decent reduction in the number of seeks required when reading a Maildir that isn't in pagecache.

Most filesystems give some locality on disk based on inode numbers. In ext[234] this is the inode tables; in XFS, groups of sequential inode numbers are together on disk and the most significant bits indicate the allocation group (i.e. inode 1,000,000 is always after inode 1,000).

With this patch, we read in the whole directory and sort by inode number before stat()ing the contents.

Ideally, the directory is sequential and then we make one scan through the file system stat()ing.

Since the universe is not ideal, we'll probably seek during reading the directory and a fair bit while reading the inodes themselves.

However... with readahead, and stat()ing in inode order, we should be in the best place possible to hit the cache.

In a (not very good) benchmark of "how long does it take to find the first 15,000 messages in my Maildir after 'echo 3 > /proc/sys/vm/drop_caches'", this patch consistently cut at least 8 seconds off the scan time.

Without patch: 50 seconds
With patch: 38-42 seconds.

(I did this in a previous maildir reading project and saw large improvements too)
---
 notmuch-new.c |   32 +++-
 1 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 83a05ba..11fad8c 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -73,6 +73,11 @@ add_files_print_progress (add_files_state_t *state)
     fflush (stdout);
 }

+static int ino_cmp(const struct dirent **a, const struct dirent **b)
+{
+  return ((*a)->d_ino < (*b)->d_ino)? -1: 1;
+}
+
 /* Examine 'path' recursively as follows:
  *
  * o Ask the filesystem for the mtime of 'path' (path_mtime)
@@ -100,13 +105,12 @@ add_files_recursive (notmuch_database_t *notmuch,
                      add_files_state_t *state)
 {
     DIR *dir = NULL;
-    struct dirent *e, *entry = NULL;
-    int entry_length;
-    int err;
+    struct dirent *entry = NULL;
     char *next = NULL;
     time_t path_mtime, path_dbtime;
     notmuch_status_t status, ret = NOTMUCH_STATUS_SUCCESS;
     notmuch_message_t *message = NULL;
+    struct dirent **namelist = NULL;

     /* If we're told to, we bail out on encountering a read-only
      * directory, (with this being a clear clue from the user to
@@ -122,31 +126,23 @@ add_files_recursive (notmuch_database_t *notmuch,
     path_mtime = st->st_mtime;

     path_dbtime = notmuch_database_get_timestamp (notmuch, path);
+    int n_entries= scandir(path, &namelist, 0, ino_cmp);

-    dir = opendir (path);
-    if (dir == NULL) {
+    if (n_entries == -1) {
         fprintf (stderr, "Error opening directory %s: %s\n",
                  path, strerror (errno));
         ret = NOTMUCH_STATUS_FILE_ERROR;
         goto DONE;
     }

-    entry_length = offsetof (struct dirent, d_name) +
-        pathconf (path, _PC_NAME_MAX) + 1;
-    entry = malloc (entry_length);
+    int i=0;

     while (!interrupted) {
-        err = readdir_r (dir, entry, &e);
-        if (err) {
-            fprintf (stderr, "Error reading directory: %s\n",
-                     strerror (errno));
-            ret = NOTMUCH_STATUS_FILE_ERROR;
-            goto DONE;
-        }
-
-        if (e == NULL)
+        if (i == n_entries)
             break;

+        entry= namelist[i++];
+
         /* If this directory hasn't been modified since the last
          * add_files, then we only need to look further for
          * sub-directories. */
@@ -243,6 +239,8 @@ add_files_recursive (notmuch_database_t *notmuch,
         free (entry);
     if (dir)
         closedir (dir);
+    if (namelist)
+        free (namelist);

     return ret;
 }
--
1.6.3.3
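The technique in this patch (read the whole directory with scandir(), sort by inode number, then visit the entries in that order) can be sketched as a standalone C fragment. This is an illustration under stated assumptions and not the code that was applied to notmuch: scan_in_inode_order, visit_fn, order_check_t and check_order are hypothetical names. It also differs from the patch in two respects: the comparator returns 0 for equal inode numbers, and each entry allocated by scandir() is freed individually before the array itself.

```c
#define _POSIX_C_SOURCE 200809L  /* for scandir() under strict -std= modes */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/* Comparator for scandir(): ascending inode number, 0 when equal. */
static int ino_cmp(const struct dirent **a, const struct dirent **b)
{
    if ((*a)->d_ino < (*b)->d_ino)
        return -1;
    if ((*a)->d_ino > (*b)->d_ino)
        return 1;
    return 0;
}

/* Hypothetical callback type: called once per entry, in inode order. */
typedef void (*visit_fn)(const char *name, ino_t ino, void *ctx);

/* Visit every entry of 'path' in inode order.
 * Returns the number of entries, or -1 if the directory can't be read. */
static int scan_in_inode_order(const char *path, visit_fn visit, void *ctx)
{
    struct dirent **namelist = NULL;
    int n = scandir(path, &namelist, NULL, ino_cmp);

    if (n < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        if (visit)
            visit(namelist[i]->d_name, namelist[i]->d_ino, ctx);
        free(namelist[i]);  /* scandir() allocates each entry separately */
    }
    free(namelist);
    return n;
}

/* Example visitor: records whether inodes arrive in non-decreasing order. */
typedef struct {
    ino_t last;
    int ordered;
} order_check_t;

static void check_order(const char *name, ino_t ino, void *ctx)
{
    order_check_t *c = ctx;
    (void) name;
    if (ino < c->last)
        c->ordered = 0;
    c->last = ino;
}
```

In a Maildir scanner the visitor would be where stat() is called, so the stat() calls walk the inode tables roughly sequentially.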
[notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs.
Previously, Ubuntu 9.10, gcc 4.4.1 was getting:

ccache gcc `pkg-config --libs glib-2.0 gmime-2.4 talloc` `xapian-config --libs` notmuch.o notmuch-config.o notmuch-dump.o notmuch-new.o notmuch-reply.o notmuch-restore.o notmuch-search.o notmuch-setup.o notmuch-show.o notmuch-tag.o notmuch-time.o gmime-filter-reply.o query-string.o show-message.o lib/notmuch.a -o notmuch
/usr/bin/ld: lib/notmuch.a(database.o): in function global constructors keyed to BOOLEAN_PREFIX_INTERNAL:database.cc(.text+0x3a): error: undefined reference to 'std::ios_base::Init::Init()'
---
 Makefile.local |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile.local b/Makefile.local
index f824bed..dbd3e20 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -18,7 +18,7 @@ notmuch_client_srcs = \
 notmuch_client_modules = $(notmuch_client_srcs:.c=.o)

 notmuch: $(notmuch_client_modules) lib/notmuch.a
-	$(CC) $(LDFLAGS) $^ -o $@
+	$(CXX) $(LDFLAGS) $^ -o $@

 notmuch.1.gz:
 	gzip --stdout notmuch.1 > notmuch.1.gz
--
1.6.3.3