WARNING: database upgrade coming

2014-03-18 Thread Stewart Smith
Tomi Ollila  writes:
> Some ideas to bikeshed with:
>
> "The database upgrade is done in a new database; at the end of the upgrade
> the current database is replaced with the new one -- Interrupting upgrade
> (with Ctrl-C) leaves you with the current database."

If the filesystem has less free space than the size of the
database... things could get interesting, right? At the very least it's
probably not worth even attempting the upgrade unless there's a --force
option or something.
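
A minimal sketch of the free-space check suggested above. It assumes the upgrade really does build a complete second copy of the database next to the old one, and that the copy ends up roughly as large as the original. The function names and the `force` parameter (standing in for the hypothetical `--force` option) are illustrative only, not notmuch code; `statvfs()` is the POSIX call for querying free space.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Bytes available to an unprivileged process on the filesystem holding
 * `path`, or 0 if statvfs() fails (e.g. the path does not exist). */
uint64_t free_bytes_at(const char *path)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return 0;
    return (uint64_t) st.f_bavail * (uint64_t) st.f_frsize;
}

/* Copy-then-replace upgrade check.  The new database is built next to the
 * old one, so (assuming it ends up about the same size) a second copy must
 * fit.  `force` plays the role of the hypothetical --force option. */
bool upgrade_should_proceed(uint64_t db_bytes, uint64_t free_bytes, bool force)
{
    if (force)
        return true;
    return free_bytes >= db_bytes;
}
```

A caller would pass the on-disk size of the existing database directory as `db_bytes` and `free_bytes_at()` of that same directory. Since `f_bavail` excludes blocks reserved for root, the check errs on the cautious side.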

-- 
Stewart Smith
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 818 bytes
Desc: not available
URL: 
<http://notmuchmail.org/pipermail/notmuch/attachments/20140318/b6fb6abe/attachment-0001.pgp>


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Alternative (raw) message store (i.e. instead of maildir)

2012-08-15 Thread Stewart Smith
Vladimir Marek  writes:
> Well, if your granularity will be one archive per year of mail, it
> should not be that bad ...

Except for someone like Keith, who has all his email since sometime in
the 80s or something insane like that :)

-- 
Stewart Smith


Alternative (raw) message store (i.e. instead of maildir)

2012-08-14 Thread Stewart Smith
Vladimir Marek  writes:
> Hi,
>
> I have objections against maildir too, but I tried to tackle it from
> different perspective. Store the maildir in zip file and use fuse-zip to
> manage it. It works sort of but it has two major disadvantages:

huh... this is a fairly interesting one. One of the downsides of a
million-odd files for mail is that filesystem dump and restore take a
*LOT* longer than if it's just a few giant files on disk. Combined with
afuse (a FUSE automounter), this could be a pretty elegant solution to
the problem of storing archival Maildirs.

One large archival maildir here went from 6.5GB (du -sh on XFS) to a
2.3GB ZIP archive that will never, ever change. Think about the
performance difference between creating 560,000 files for backup/restore
versus copying a single 2.3GB file.
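
A back-of-the-envelope reading of those numbers, assuming du's "G" means 2^30 bytes, that the 2.3GB ZIP size is in the same unit, and that the 560,000 files make up the whole 6.5GB (du also counts directory blocks, so the per-file figure is only approximate):

```latex
\frac{2.3}{6.5} \approx 0.35
  \quad\Longrightarrow\quad \text{roughly } 65\% \text{ less space}

\frac{6.5 \times 2^{30}\ \text{bytes}}{560\,000\ \text{files}}
  \approx 12\,463\ \text{bytes} \approx 12.2\ \text{KiB per file}
```

So a restore has to create hundreds of thousands of small files, each needing its own inode and directory entry, instead of doing one large sequential write.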

>  - fuse zip stores all changes in memory until unmounted
>  - fuse zip (and libzip for that matter) creates new temporary file when
>    updating archive, which takes considerable time when the archive is
>    very big.

This isn't much of a hassle if you have a maildir per time period and
archive it off. It may be if you sync flags, though...

> Of course this solution would have some disadvantages too, but for me
> the advantages would win. At the moment I'm not sure if I want to
> continue working on that. Maybe if there would be more interested guys

I'm *really* tempted to investigate making this work for archived
mail. Of course, the list of mounted file systems could get insane
depending on granularity I guess...

-- 
Stewart Smith


[RFC PATCH 00/13] Modular message store code

2012-02-16 Thread Stewart Smith
On Wed, 15 Feb 2012 17:01:53 -0500, Ethan Glasser-Camp wrote:
> I'm submitting as RFC this patch series, which introduces the idea of
> a "mailstore", a "class" that defines how to access mail, instead of
> currently assuming it's always some Maildir-ish hierarchy that
> contains a bunch of mail.

This is really awesome.

Quite a while ago now I did some experiments on storing my entire
Maildir inside git packs instead of in maildir. This produced an
*amazing* saving in disk space. My idea is to end up with Maildir
for "current" mail (as everything delivers into Maildir without a problem)
and then, on a (say) monthly basis, pack all mail into an archive file
and have notmuch still be able to read it.
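
A rough sketch of the "mailstore as a class" idea in C: a struct of function pointers that each backend fills in, so callers ask the store for a message instead of opening a Maildir path themselves. Everything here (`mailstore_t`, `open_message`, both backends) is hypothetical and is not the API from the patch series; the in-memory table merely stands in for a packed archive such as a git pack or a monthly archive file.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct mailstore mailstore_t;

/* Hypothetical mailstore interface: each backend supplies a way to open
 * a message by identifier and return a readable stream. */
struct mailstore {
    const char *name;
    FILE *(*open_message)(const mailstore_t *store, const char *id);
    const void *data;                 /* backend-private state */
};

/* Maildir backend: `data` is the root directory, `id` a relative path. */
static FILE *maildir_open(const mailstore_t *store, const char *id)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", (const char *) store->data, id);

    if (n < 0 || (size_t) n >= sizeof path)
        return NULL;
    return fopen(path, "r");
}

/* Toy "archive" backend: a NULL-terminated table standing in for a pack. */
struct archived_message {
    const char *id;
    const char *text;
};

static FILE *archive_open(const mailstore_t *store, const char *id)
{
    const struct archived_message *m = store->data;

    for (; m->id != NULL; m++)
        if (strcmp(m->id, id) == 0)
            return fmemopen((void *) m->text, strlen(m->text), "r");
    return NULL;
}
```

Callers only ever do `store->open_message(store, id)`, so adding a git-pack backend would mean writing one more function rather than touching every place that reads mail.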

you know what... this patch set has re-ignited my interest in making
that work.

-- 
Stewart Smith



Preventing the user shooting themself in the foot

2011-07-01 Thread Stewart Smith
On Wed, 29 Jun 2011 22:40:07 -0700, Carl Worth  wrote:
> This means that messages can lose the "unread" tag while still remaining
> tagged "inbox", (you read a message, but don't archive it), and that
> messages can lose the "archive" tag while still remaining tagged
> "unread", (you archive a thread before reading all messages in the
> thread).
> 
> The distinction ends up being useful to me. If at some point someone
> points me to a specific message, and when I search for it I see the
> "unread" tag, then this highlights to me that I never even looked at the
> message.

IMHO this is one of the awesome things about notmuch (and I've actively
used it to go back to conversations I had previously ignored).
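
A toy model of why the distinction Carl describes works: if "inbox" and "unread" are independent tags (modelled here as bits, which is an illustration rather than how notmuch stores tags), reading and archiving each clear only their own tag. That makes both "read but still in the inbox" and "archived but never read" representable.

```c
#include <stdbool.h>

/* Tags as independent bits (a toy model, not notmuch's data structure). */
enum { TAG_INBOX = 1 << 0, TAG_UNREAD = 1 << 1 };

typedef struct {
    unsigned tags;
} message_t;

/* Reading a message drops "unread" but leaves "inbox" alone. */
void message_mark_read(message_t *m) { m->tags &= ~(unsigned) TAG_UNREAD; }

/* Archiving drops "inbox" but leaves "unread" alone. */
void message_archive(message_t *m) { m->tags &= ~(unsigned) TAG_INBOX; }

bool message_has_tag(const message_t *m, unsigned tag)
{
    return (m->tags & tag) != 0;
}
```

With real notmuch tags, the second state is what a query such as `notmuch search tag:unread and not tag:inbox` would find.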

-- 
Stewart Smith



[BUG] [PATCH] Fix appending of Received headers

2011-06-11 Thread Stewart Smith
On Fri, 10 Jun 2011 17:22:50 -0700, Carl Worth  wrote:
> On Tue, 24 May 2011 13:33:25 -0700, Carl Worth  wrote:
> > On Tue, 17 May 2011 12:10:32 +1000, Stewart Smith wrote:
> > > We're not properly concatenating the Received headers if we parse them
> > > while requesting a header that isn't Received.
> ...
> > I'd prefer to fix the test suite here so that we don't later regress on
> > this behavior.
> 
> I've done that now. What the test suite was missing was having messages
> that actually had more than one Received header, (otherwise, no
> concatenation was ever used in the testing).
> 
> The new test and the patch are both now pushed.

Great, thanks! Sorry I didn't manage to get updating the test suite to
the top of my TODO list.

-- 
Stewart Smith


Multiple sender identities (composing)

2011-05-29 Thread Stewart Smith
On Tue, 24 May 2011 14:54:37 -0700, Carl Worth  wrote:
> I've wanted something like this, but I'm extremely reluctant to put
> fancy things like this in my .emacs file. The problem I have is that I
> don't want to restrict nice features to the people who manage to
> configure their emacs "just so".

I completely agree - and am rather glad that there's a proper solution now.

> I'll reply with a patch I just wrote attempting to implement that. By
> default, it generates the list of addresses by looking in your notmuch
> configuration file. It also provides a customizable list of addresses
> that the user can provide (notmuch-identities).

I'll try trunk with the patches as soon as I get home from travel and am
somewhat remotely close to not being a zombie.

> I don't know what trouble you had with ido on Ubuntu, but hopefully you
> can work that out.

I hope so too... it could just be how I was trying to use it or user
ignorance or something like that.

-- 
Stewart Smith


[notmuch] Mail in git

2011-05-21 Thread Stewart Smith
On Sat, 21 May 2011 09:05:54 +0200, martin f krafft wrote:
> Has anyone worked on this since?

No, I haven't had the cycles... and an SSD helped a bit to reduce the urgency.

-- 
Stewart Smith



[BUG] [PATCH] Fix appending of Received headers

2011-05-17 Thread Stewart Smith
We're not properly concatenating the Received headers if we parse them
while requesting a header that isn't Received.

This fixes notmuch-reply address detection in a bunch of situations.

diff --git a/lib/message-file.c b/lib/message-file.c
index 7722832..dd0f698 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -329,7 +329,7 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
/* we treat the Received: header special - we want to concat ALL of 
 * the Received: headers we encounter.
 * for everything else we return the first instance of a header */
-   if (is_received) {
+   if (strcasecmp(header, "received") == 0) {
if (header_sofar == NULL) {
/* first Received: header we encountered; just add it */
g_hash_table_insert (message->headers, header, decoded_value);
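
A self-contained sketch of the behaviour the patch restores, using a tiny array in place of the message's header hash table. As in the fix, the Received check is made on the header *being parsed*, not on the header the caller asked for, so every Received: value is concatenated even when parsing was triggered by a request for, say, From:. The struct and function names, the fixed capacity and the space used as a separator are assumptions for illustration; this is not the code in lib/message-file.c.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

#define MAX_HEADERS 32

/* Toy stand-in for the message's header hash table. */
struct header_cache {
    size_t count;
    char *name[MAX_HEADERS];
    char *value[MAX_HEADERS];
};

/* Record one header as it is parsed.  The Received: test looks at the
 * header being parsed, so every Received: value is concatenated no matter
 * which header the caller originally requested.  For any other header the
 * first instance wins.  Returns 0 on success, -1 on failure. */
int cache_add(struct header_cache *c, const char *name, const char *value)
{
    for (size_t i = 0; i < c->count; i++) {
        if (strcasecmp(c->name[i], name) != 0)
            continue;
        if (strcasecmp(name, "received") != 0)
            return 0;                        /* keep the first instance */

        size_t old = strlen(c->value[i]), add = strlen(value);
        char *grown = realloc(c->value[i], old + 1 + add + 1);
        if (grown == NULL)
            return -1;
        grown[old] = ' ';                    /* separator (arbitrary choice) */
        memcpy(grown + old + 1, value, add + 1);
        c->value[i] = grown;
        return 0;
    }

    if (c->count == MAX_HEADERS)
        return -1;
    char *n = strdup(name), *v = strdup(value);
    if (n == NULL || v == NULL) {
        free(n);
        free(v);
        return -1;
    }
    c->name[c->count] = n;
    c->value[c->count] = v;
    c->count++;
    return 0;
}

/* Case-insensitive lookup; NULL if the header was never seen. */
const char *cache_get(const struct header_cache *c, const char *name)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcasecmp(c->name[i], name) == 0)
            return c->value[i];
    return NULL;
}

void cache_free(struct header_cache *c)
{
    for (size_t i = 0; i < c->count; i++) {
        free(c->name[i]);
        free(c->value[i]);
    }
    c->count = 0;
}
```

A message with two Received: headers is exactly the case the test suite was missing, and it is the one exercised when checking this sketch.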

-- 
Stewart Smith


Multiple sender identities (composing)

2011-05-17 Thread Stewart Smith
On Mon, 16 May 2011 11:52:43 +0200, Thomas Jost wrote:
> On Mon, 16 May 2011 19:29:07 +1000, Stewart Smith wrote:
> (people who don't use or like ido may want to replace
> ido-completing-read with completing-read)

I couldn't get ido to work at all (Ubuntu Natty). It would just prompt,
and wouldn't tab-complete or even accept Enter (it would insert a newline
in the minibuffer), which is why I ended up just using completing-read.

> - function to change the SMTP server that will be used for sending the
>   mail according to the From header

I actually just do this via Postfix's sender_dependent_relayhost_maps,
which ends up working quite nicely.


-- 
Stewart Smith


Multiple sender identities (composing)

2011-05-16 Thread Stewart Smith
Thought I'd share this snippet from my .emacs; it may be useful for
the Emacs tips page.

This does the following:
- sets up a list of possible identities to have mail From
- on composing mail, it prompts you for who you want to send mail from
- pressing enter will give you the default (first in the list)
- otherwise you have tab completion

You may also want to set this:
 '(message-sendmail-envelope-from (quote header))
(in custom-set-variables) so that if you're doing Postfix sender-based routing
or the like, the envelope sender is taken from the From header and mail
doesn't end up being sent the wrong way.

(setq stewart/mua-identities
      (list "Stewart Smith <stew...@flamingspork.com>"
            "Stewart Smith <stewart.sm...@percona.com>"))

(defun stewart/notmuch-mua-mail (&optional from)
  "Compose a new message, prompting for the sender identity FROM."
  (interactive)
  ;; Enter accepts the default (the first identity); TAB completes.
  (setq from (completing-read "Sender identity: " stewart/mua-identities
                              nil t nil nil (car stewart/mua-identities)))
  (notmuch-mua-mail nil nil (list (cons 'from from))))

(define-key notmuch-show-mode-map "m"
  (lambda ()
    "send email"
    (interactive)
    (stewart/notmuch-mua-mail)))

(define-key notmuch-search-mode-map "m"
  (lambda ()
    "send email"
    (interactive)
    (stewart/notmuch-mua-mail)))

-- 
Stewart Smith


storing From and Subject in xapian

2011-05-11 Thread Stewart Smith
On Sun, 08 May 2011 22:24:37 -0700, Istvan Marko  wrote:
> Jameson Graef Rollins  writes:
> 
> > Unless I hear a strong positive response I'll hold off on considering it
> > for 0.6, and suggest instead targeting it for 0.7.
> 
> I would say wait until 0.7 at least.
> 
> An important thing missing is fallback to the old method for messages
> where the Subject/From VALUE fields don't exist. Otherwise people will
> get blank results until they rebuild their database.

Would it be possible to progressively fill the DB with the new data?

i.e.

if Subject/From not in db for message
   add Subject/From for this message to DB.

?

That'd be awesome from my point of view (having just rebuilt my database
in chert format, which took FOREVER).
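
A minimal sketch of the progressive fill-in idea, assuming a per-message record whose stored Subject may be missing for documents indexed by an older version. The `struct doc` type and its fields are hypothetical: `file_subject` stands in for re-parsing the mail file, and `file_reads` just counts how often that slow path is taken.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-message record.  `subject` is NULL for documents that
 * were indexed before the new Subject value existed. */
struct doc {
    char *subject;             /* stored value, or NULL if missing */
    const char *file_subject;  /* stands in for re-parsing the mail file */
    int file_reads;            /* how often we fell back to the file */
};

/* Return the stored subject, filling it in from the mail file the first
 * time it is found missing ("add Subject for this message to DB").
 * Returns NULL only if the copy cannot be allocated. */
const char *doc_subject(struct doc *d)
{
    if (d->subject == NULL) {
        d->file_reads++;
        d->subject = strdup(d->file_subject);
    }
    return d->subject;
}
```

One design catch: writing the value back needs a writable database, so a read-only search might have to skip the write-back or leave it to a later `notmuch new`.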

-- 
Stewart Smith



notmuch's idea of concurrency / failing an invocation

2011-02-02 Thread Stewart Smith
On Sat, 29 Jan 2011 19:14:27 -0500, Daniel Kahn Gillmor  wrote:
> On 01/28/2011 08:05 PM, Stewart Smith wrote:
> > I'm about at the point where I'm going to take my git mail store
> > experiments and get them really to work (and everyone will have to use
> > 'notmuch cat' or the like to access the messages)
> 
> Would this hypothetical git-based mail store retain the atomicity and
> lockless concurrent-access of a maildir?  That is, could it be used in a
> server environment?

My idea is that it would be... at least with the experiments conducted
so far.
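
For context on the atomicity Daniel asks about, here is a minimal sketch of the Maildir delivery convention: write the message under `tmp/`, flush it, then move it into `new/`. Because POSIX `rename()` is atomic within a filesystem, a reader scanning `new/` sees either no file or the complete message, with no locking. The function names are hypothetical, and `unique` is assumed to be a name no other delivery uses (real Maildir names combine time, process ID and host name).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<root>/<sub>/<name>" into out; -1 if it would not fit. */
static int build_path(char *out, size_t size, const char *root,
                      const char *sub, const char *name)
{
    int n = snprintf(out, size, "%s/%s/%s", root, sub, name);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Create root plus its tmp/, new/ and cur/ subdirectories. */
int maildir_create(const char *root)
{
    static const char *const subdirs[] = { "tmp", "new", "cur" };
    char path[4096];

    if (mkdir(root, 0700) != 0 && errno != EEXIST)
        return -1;
    for (size_t i = 0; i < 3; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", root, subdirs[i]);
        if (n < 0 || (size_t) n >= sizeof path)
            return -1;
        if (mkdir(path, 0700) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/* Write tmp/<unique>, fsync it, then rename() it to new/<unique>. */
int maildir_deliver(const char *root, const char *unique,
                    const char *msg, size_t len)
{
    char tmp[4096], dst[4096];

    if (build_path(tmp, sizeof tmp, root, "tmp", unique) != 0 ||
        build_path(dst, sizeof dst, root, "new", unique) != 0)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    for (size_t off = 0; off < len;) {
        ssize_t n = write(fd, msg + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        off += (size_t) n;
    }
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0 || rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

One deviation from the classic recipe: the original Maildir delivery links the file into `new/` and then unlinks it from `tmp/`, which refuses to overwrite an existing name, whereas `rename()` would silently replace one.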

> > which should provide
> > both great storage efficiency, much faster backups of your Maildir as
> > well as having way fewer paths to traverse checking for new mail.
> 
> when you say "backups of your Maildir" do you mean "backups of your
> git-based mail store" ?  or is this somehow a literal Maildir stored in git?

I'll write more "soon" when there is more code behind it... and I figure
out a good upgrade path to something that is also self-consistently sane.

-- 
Stewart Smith



notmuch's idea of concurrency / failing an invocation

2011-01-29 Thread Stewart Smith
On Thu, 27 Jan 2011 13:40:25 -0500, micah anderson  wrote:
> Due to my harddisk in my laptop being slow (5400RPM), my notmuch
> database growing, and perhaps some fragmentation somewhere, this has
> become *incredibly* annoying for me. I am checking email every 30
> minutes, and I'm nicing and ionicing the processes so I can use my
> machine, but while those processes are running, I'm effectively locked
> out of a good portion of my email. 

I used to use spinning rust and also noticed things were slow. This
is in fact mostly not Xapian, but rather crawling the Maildir. I
improved this early on in notmuch history by reducing the number of
seeks needed when traversing the Maildir hierarchy (e.g. calling stat()
in inode order, which is roughly on-disk order).
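
A sketch of the seek-reduction trick described above, using POSIX `scandir()` with a comparator on `d_ino` so that the `stat()` calls are issued in inode order. It illustrates the idea only; it is not the code that went into notmuch, the function names are hypothetical, and how closely inode order tracks on-disk order depends on the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* scandir() filter: skip "." and "..". */
static int not_dot(const struct dirent *d)
{
    return strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0;
}

/* scandir() comparator: ascending inode number. */
static int by_inode(const struct dirent **a, const struct dirent **b)
{
    if ((*a)->d_ino < (*b)->d_ino)
        return -1;
    return (*a)->d_ino > (*b)->d_ino;
}

/* stat() every entry of `dir` in inode order.  The inode numbers visited
 * are copied to `inodes` (at most `max`).  Returns how many entries were
 * stat()ed successfully, or -1 if the directory cannot be read. */
int stat_in_inode_order(const char *dir, ino_t *inodes, size_t max)
{
    struct dirent **entries;
    int n = scandir(dir, &entries, not_dot, by_inode);
    int done = 0;

    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        char path[4096];
        struct stat st;
        int len = snprintf(path, sizeof path, "%s/%s", dir, entries[i]->d_name);

        if (len >= 0 && (size_t) len < sizeof path && stat(path, &st) == 0) {
            if ((size_t) done < max)
                inodes[done] = entries[i]->d_ino;
            done++;
        }
        free(entries[i]);
    }
    free(entries);
    return done;
}
```

The sort costs a little CPU per directory, which is cheap compared with the seeks it can save on a 5400RPM disk.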

I'm about at the point where I'm going to take my git mail store
experiments and really get them to work (and everyone will have to use
'notmuch cat' or the like to access the messages), which should provide
great storage efficiency and much faster backups of your Maildir, as
well as way fewer paths to traverse when checking for new mail.

-- 
Stewart Smith



[PATCH] Fix linker error from insufficient LDFLAGS

2010-05-06 Thread Stewart Smith
On Fri, 23 Apr 2010 17:53:17 -0700, Carl Worth  wrote:
> On Thu, 22 Apr 2010 18:20:27 -0400, Ben Gamari wrote:
> > It seems that LDFLAGS have recently been reorganized, along with the
> > introduction of a notmuch-shared rule. Unfortunately, the LDFLAGS used
> > in notmuch-shared don't include CONFIGURE_LDFLAGS. This caused linking
> > to fail with the following,
> 
> What system is this on?

I got this too, on Ubuntu 9.10 with gold as the linker:
$ ld --version
GNU gold (GNU Binutils for Ubuntu 2.20) 1.9

Could that be what's causing it?

Anyway, this patch fixed linking for me.

-- 
Stewart Smith



[PATCH 1/4] Mailstore abstraction interface

2010-04-13 Thread Stewart Smith
On Tue, 13 Apr 2010 10:53:12 -0700, Carl Worth  wrote:
> This series is looking like one of the most complete approaches to
> maildir-flag synchronization, (and I like some of the motivation that
> leads to "notmuch cat"). But I think the mailstore abstraction is
> largely a distraction from the real features here.

For my use case (wanting mail in git packs, so that a backup of my
mailstore completes in reasonable time and preferably uses less disk
space), having 'notmuch cat' used everywhere removes a lot of the
issues of doing this.

(Plugging in an alternative to readdir is fairly simple... but the Emacs
UI needs to read from it too :)

-- 
Stewart Smith



please eat my data!

2010-04-12 Thread Stewart Smith
On Mon, 12 Apr 2010 17:24:35 +0200, "Sebastian Spaeth"  wrote:
> What I find interesting is that we have a 2x speedup and a 10x speedup
> for different queries. Olly was saying on IRC that both *should* really be
> behaving in much the same manner.

Remember that on ext3 (and, I'm pretty sure, ext4) fsync() is
effectively the same as sync(), so performance depends on how much dirty
data you have in your cache.

libeatmydata also gets rid of msync(), O_SYNC, etc.
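
A minimal sketch of the libeatmydata approach, assuming a Linux/glibc-style dynamic linker: a small library that redefines the durability calls as successful no-ops, built as a shared object and loaded with `LD_PRELOAD`. This is not libeatmydata's source. Its handling of `open()` flags is only hinted at here by the hypothetical `strip_sync_flags()` helper, because a real wrapper would also need `dlsym(RTLD_NEXT, "open")` to forward the call to the C library.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Durability calls become successful no-ops: data reaches the disk
 * whenever the kernel gets around to writing it back. */
int fsync(int fd)
{
    (void) fd;
    return 0;
}

int fdatasync(int fd)
{
    (void) fd;
    return 0;
}

void sync(void)
{
}

int msync(void *addr, size_t len, int flags)
{
    (void) addr;
    (void) len;
    (void) flags;
    return 0;
}

/* Flags an open() wrapper would forward after dropping synchronous-write
 * modes.  The wrapper itself is not shown. */
int strip_sync_flags(int flags)
{
    return flags & ~(O_SYNC | O_DSYNC);
}
```

Built with something like `cc -shared -fPIC -o libnosync.so nosync.c` (file names hypothetical), it could be tried with `LD_PRELOAD=./libnosync.so notmuch new`. As the name "eat my data" suggests, a crash during such a run can leave the database corrupt.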

-- 
Stewart Smith



[notmuch] Mailstore abstraction & maildir synchronization

2010-03-24 Thread Stewart Smith
On Thu, 18 Mar 2010 16:39:36 +0100, Michal Sojka  wrote:
> - Only file-based storage is supported. Notmuch accesses the files
>   directly, and not via the mailstore interface.

It'll be great when this is fixed... it should be trivial to add a git
backend then.

(I haven't been looking at tags in git at all, though... it doesn't really
interest me, and git ain't a database.)

> - (maildir) Viewing/storing of attachments of unread messages doesn't
>   work. The reason is that when you view the message its unread tag
>   is removed, which leads to a rename of the file, but Emacs still uses
>   the original name to access the attachment.
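
To make the attachment problem concrete: in Maildir, flags live in the file name after `:2,`, kept in ASCII order, so marking a message seen renames `cur/123.host:2,` to `cur/123.host:2,S`. A minimal sketch of computing the new name follows; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Write into `out` the Maildir file name obtained by adding `flag` to the
 * info part of `name` ("unique:2,FLAGS"), keeping flags in ASCII order.
 * Returns 0 on success, -1 if `name` has no ":2," part or `out` is too
 * small.  If the flag is already set, `out` receives `name` unchanged. */
int maildir_add_flag(const char *name, char flag, char *out, size_t size)
{
    const char *info = strstr(name, ":2,");
    if (info == NULL)
        return -1;

    const char *flags = info + 3;            /* e.g. "FT" in "123.host:2,FT" */
    size_t prefix = (size_t) (flags - name);
    size_t nflags = strlen(flags);
    bool present = memchr(flags, flag, nflags) != NULL;

    if (prefix + nflags + (present ? 0 : 1) + 1 > size)
        return -1;

    memcpy(out, name, prefix);
    char *p = out + prefix;
    bool inserted = present;
    for (size_t i = 0; i < nflags; i++) {
        if (!inserted && flags[i] > flag) {  /* insert before first larger flag */
            *p++ = flag;
            inserted = true;
        }
        *p++ = flags[i];
    }
    if (!inserted)
        *p++ = flag;
    *p = '\0';
    return 0;
}
```

Any code that cached the old path, like the Emacs attachment handling described above, has to look the file up again after such a rename.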

What about migrating from a maildir that has already been imported into
notmuch back to this maildir backend? What will be authoritative: the
maildir or the notmuch database?
-- 
Stewart Smith



[notmuch] [PATCH] A simple approach to maildir flags

2010-03-01 Thread Stewart Smith
On Fri, 26 Feb 2010 14:49:25 -0500, Mike Kelly  wrote:
> The following patches attempt to provide a simple, extendable approach
> to handling the 'Seen' maildir flag. To appease (hopefully) everyone, it
> will only do this for new messages. This means that people coming from
> another MUA won't be stuck with 30,000 unread messages, for example.
> 
> It should be simple to extend this to other maildir flags, too, if
> people want them and can decide on what tags they should correspond to.

Personally, I like the seen messages not to be in the inbox (by default),
as either:
1) I'm importing an old Maildir, in which case if it's read it's
probably been dealt with, or
2) I've used another mail client; same as above.

-- 
Stewart Smith


[notmuch] [PATCH] Added mail directory filename pattern support.

2010-02-23 Thread Stewart Smith
On Mon, Feb 22, 2010 at 12:07:31PM -0800, Bart Massey wrote:
> Typically, the filenames in a mail directory that actually
> contain mail obey some specific format.  For example, in my
> MH email directory, all mail filenames consist only of
> digits.
> 
> This patch adds support for a config file variable
> "filename_pattern" which maybe set to a regex used to filter
> only valid mail filenames when scanning.  Effective use of
> filename_pattern cuts down on the noise from notmuch, and
> may speed it up in some cases.

What about the other way around?

E.g. if anybody has ever pointed Evolution at a Maildir, you get a
bunch of Maildir-name.ev-summary, .ev-summary-meta and .ibex.index
files, and whatever else.

A default list of ignored patterns would be pretty easy to come up with. 
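A minimal sketch of such a default ignore list, in C like the notmuch patches in this thread. The suffixes are the Evolution files named above plus the ones evenless.pl skips; `default_ignore_suffixes` and `is_ignored_filename` are hypothetical names, not notmuch API, and plain suffix matching stands in for the regex used by the filename_pattern patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical default ignore list: suffixes of files that other mail
 * clients (here, Evolution's summary and index files) leave inside a
 * Maildir. Illustrative, not exhaustive. */
static const char *const default_ignore_suffixes[] = {
    ".ev-summary",
    ".ev-summary-meta",
    ".ibex.index",
    ".ibex.index.data",
    ".cmeta",
};

/* Return true if 'name' ends with 'suffix'. */
static bool ends_with (const char *name, const char *suffix)
{
    size_t name_len = strlen (name);
    size_t suffix_len = strlen (suffix);
    return name_len >= suffix_len &&
           strcmp (name + name_len - suffix_len, suffix) == 0;
}

/* Return true if a directory entry should be skipped while scanning. */
static bool is_ignored_filename (const char *name)
{
    size_t n = sizeof (default_ignore_suffixes) / sizeof (default_ignore_suffixes[0]);
    for (size_t i = 0; i < n; i++) {
        if (ends_with (name, default_ignore_suffixes[i]))
            return true;
    }
    return false;
}
```

Skipping by suffix keeps the common case (real mail) cheap: a message file is only compared against a handful of short strings.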

-- 
Stewart Smith


[notmuch] Mail in git

2010-02-18 Thread Stewart Smith
On Wed, 17 Feb 2010 14:21:01 +1300, martin f krafft wrote:
> What I am wondering is if (explicit) tags couldn't be represented as
> tree-objects with this.
> 
>   evenless-link   – link a message object with a tree object
>   evenless–unlink – unlink a message object from tree object
> [replaces evenless-unlink]

I think it could get expensive for tags with lots of messages.

With my fast-import script, doing the commit (that
referenced... umm... 800,000+ objects) took a *very* long time.

As far as I understand it, the tree object is stored in full and space
is only reclaimed during repack (due to delta compression).

So if you, say, had the entire history of a high-volume list such as
linux-kernel, adding messages could get rather expensive if you
auto-tagged them (e.g. tagging messages containing patches, or whatever).
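To put a rough number on "stored in full": in a SHA-1 git repository, a tree object holds one entry per name, laid out as `<mode> <name>` plus a NUL byte, followed by the 20-byte binary object id. The sketch below assumes one flat tree per tag, regular-file mode 100644 and a 40-byte average Maildir filename (an assumption, not a measurement), and ignores zlib compression; the function names are hypothetical.

```c
#include <stddef.h>

/* Bytes in one raw tree entry: "<mode> <name>\0" followed by the
 * 20-byte binary SHA-1 of the object it points to. */
static size_t tree_entry_size (size_t mode_len, size_t name_len)
{
    return mode_len + 1 /* space */ + name_len + 1 /* NUL */ + 20 /* SHA-1 */;
}

/* Uncompressed size of a flat tree listing 'entries' blobs whose names
 * average 'avg_name_len' bytes, all with mode "100644". */
static size_t flat_tree_size (size_t entries, size_t avg_name_len)
{
    return entries * tree_entry_size (6 /* strlen ("100644") */, avg_name_len);
}
```

Under those assumptions, `flat_tree_size (800000, 40)` is 54,400,000 bytes, so each change to a tag covering 800,000 messages would write roughly 54 MB of new tree (before compression) until a repack deltas it against the previous version.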

> messages would then be deleted whenever using git-gc.
> 
> No idea how this would sync if we don't keep ancestry. Otoh, it
> would probably not be very expensive to do just that.

If we keep ancestry though, we are reusing existing working code for
backup (git-pull :)

Keep in mind that in my tests, the mail stored in git takes about a quarter
to a fifth of the space it takes as a Maildir... so a bit of extra usage per
message isn't as dramatic as it may sound.

> Is it possible to find out all trees that reference a given object
> with Git in constant or sub-linear time?

I don't think so but I'm not sure.

-- 
Stewart Smith


[notmuch] Mail in git

2010-02-17 Thread Stewart Smith
On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
> 
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
> 
> I'm going to play with this

and I did.

Good news... on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes):

using the (attached) evenless.pl to create a single commit with
everything in it:

$ du -sh .git
3.4G    .git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes.

In both cases, I was creating the repository on another spindle (a USB 2.0
disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.

The next thing to think about is adding objects as they come
in... creating a new commit with just an added file should be pretty
simple and easy... but this means we get to keep a "revision history" of
the mailstore, which is *possibly* not ideal in terms of storage
efficiency (I'll do a trial with mine, adding one message at a time, and
see what the end size is).
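A minimal sketch of "one commit per added mail" in git fast-import's stream format: one blob, then a commit that adds it on top of the existing branch. It assumes the branch already exists and one call per fast-import run (the `from <ref>^0` line makes fast-import load the branch's current tip from the repository; drop it for the very first import). The helper name, committer identity and paths are hypothetical placeholders; the output would be piped into `git fast-import`, as evenless.pl does.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: write a git fast-import stream that adds one
 * message at 'path' as a new commit on top of 'ref'. Returns 0 on
 * success, -1 if the stream reported an error. */
static int write_add_message_commit (FILE *out, const char *ref,
                                     const char *path,
                                     const char *content, size_t content_len,
                                     time_t when)
{
    static const char msg[] = "add mail";

    /* The message itself, as a blob the commit refers to by mark. */
    fprintf (out, "blob\nmark :1\ndata %zu\n", content_len);
    fwrite (content, 1, content_len, out);
    fputc ('\n', out);

    /* The commit. The committer identity is a placeholder; the date uses
     * fast-import's default "raw" format: seconds since the epoch + zone. */
    fprintf (out, "commit %s\n", ref);
    fprintf (out, "committer EvenLess <evenless@example.invalid> %lld +0000\n",
             (long long) when);
    fprintf (out, "data %zu\n%s\n", strlen (msg), msg);
    /* Continue from the branch's current tip (the branch must exist). */
    fprintf (out, "from %s^0\n", ref);
    fprintf (out, "M 100644 :1 %s\n\n", path);

    return ferror (out) ? -1 : 0;
}
```

Because every commit only adds one path, the repository history doubles as a log of when each message arrived.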

However... a commit per added mail (or batch of mails) does give us the
advantage of a really well documented and tested backup system :)

Deleting could be hard... if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).

for the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------


real    107m43.130s
user    45m25.430s
sys     2m49.440s


-- next part --
A non-text attachment was scrubbed...
Name: evenless.pl
Type: text/x-perl
Size: 1413 bytes
Desc: evenless.pl: maildir to git using fast-import
URL: 
<http://notmuchmail.org/pipermail/notmuch/attachments/20100217/bc1a3f34/attachment.pl>

-- 
Stewart Smith


[notmuch] Mail in git

2010-02-17 Thread Stewart Smith
On Tue, 16 Feb 2010 14:06:29 -0500, Ben Gamari  wrote:
> Excerpts from Stewart Smith's message of Sun Feb 14 19:29:14 -0500 2010:
> > So... I sketched this out in my head at LCA... and it's taken a bit of
> > time to actually properly try it.
> > 
> In case anyone wanted to play around with this, I've written up my own
> little implementation[1] of a git mail import script. It's quite simple,
> but I felt it might be nice to have some public code to play around
> with. I get around 80 messages/second on my laptop and things are
> definitely quite IO bound. You get 1 commit per message, although I'm
> not entirely sure if this is the correct way to do things.
> 
> [1] git://goldnerlab.physics.umass.edu/git-mail

Using fast-import is interesting. Does it update the working tree? The
big thing I wanted to avoid was creating a working tree (another million
inodes being created is never what I need).

Also interesting is the mention of creating packs on the fly... this
could save the time spent first writing each object and then packing it (as
my script does).

I'm going to play with this.
-- 
Stewart Smith


Re: [notmuch] Mail in git

2010-02-17 Thread Stewart Smith

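#!/usr/bin/perl -w

use strict;

use IPC::Open2;

my $FILES = '';

my $mark= 1;

my $stripdir= $ARGV[0];

sub fastimport_blobs ($);
sub fastimport_blobs ($)
{
    my $dirname= shift @_;

    opendir (my $dirhandle, $dirname) or die "Cannot open $dirname: $!\n";
    foreach (readdir $dirhandle)
    {
        # Skip . and .., index/summary files left by other mail clients
        # (e.g. Evolution), and notmuch's own database directory.
        next if /^\.\.?$/;
        next if /\.cmeta$/;
        next if /\.ibex\.index$/;
        next if /\.ibex\.index\.data$/;
        next if /\.ev-summary$/;
        next if /\.ev-summary-meta$/;
        next if /\.notmuch$/;

        my $path= "$dirname/$_";
        if (-d $path)
        {
            print STDERR "Recursing into $_/ ";
            fastimport_blobs($path);
            print STDERR "\n";
        }
        else
        {
            # Read the whole file as raw bytes so the "data" length below
            # matches exactly what is sent to fast-import.
            open (my $in, '<:raw', $path) or die "Cannot open $path: $!\n";
            my $content = do { local $/; <$in> };
            close $in;
            $content = '' unless defined $content;

            print FASTIMPORT "blob\n";
            print FASTIMPORT "mark :$mark\n";
            print FASTIMPORT "data ".length($content)."\n";
            print FASTIMPORT $content;

            # Path inside the repository: strip the top-level directory.
            my $storedir= $path;
            $storedir=~ s/^\Q$stripdir\E//;
            $storedir=~ s/^\///;
            $FILES.="M 0644 :$mark $storedir\n";
            $mark++;
        }
    }
    closedir $dirhandle;
}

open (FASTIMPORT, '|-', 'git fast-import --date-format=rfc2822')
    or die "Cannot run git fast-import: $!\n";
binmode FASTIMPORT;

fastimport_blobs($ARGV[0]);

# One commit referencing every blob. The committer address is obscured in
# the list archive; `date -R` (GNU date) prints an RFC 2822 date plus newline.
print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenle...@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
print FASTIMPORT $FILES;
print FASTIMPORT "\n";

close FASTIMPORT or die "git fast-import failed\n";
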
-- 
Stewart Smith


[notmuch] Notmuch performance problems on OSX

2010-02-16 Thread Stewart Smith
On Fri, 15 Jan 2010 03:58:50 + (UTC), Olly Betts  wrote:
> One difference between OS X and other systems is that OS X supports the
> F_FULLSYNC ioctl, and other systems don't (currently, at least AFAIK)
> and Xapian uses that if it is available to ensure that changes have
> actually made it to disk:
> 
> http://trac.xapian.org/ticket/288
> 
> On other systems, it uses fdatasync() or fsync(), which typically just
> ensure that the data has left the OS - it can sit in disk controller or
> drive caches for potentially seconds longer.  This call happens once
> per table for every (explicit or implicit) flush on a database.

At least if your OS and file system don't hate you (e.g. XFS on Linux),
fsync() really does flush the drive cache.
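For comparison, a minimal C sketch of how a program can ask for the stronger OS X behaviour where it exists: the `F_FULLFSYNC` fcntl, falling back to plain `fsync()` on other systems (or if that fcntl fails). `durable_sync` is a hypothetical helper; whether `fsync()` alone reaches stable storage still depends on the OS, file system and drive.

```c
#define _DEFAULT_SOURCE /* for fsync() under strict C modes */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush 'fd' as durably as the platform allows.
 * On OS X, F_FULLFSYNC also asks the drive to flush its write cache;
 * elsewhere (or if that fcntl fails) fall back to fsync(), whose
 * guarantee depends on the OS, file system and drive. */
static int durable_sync (int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl (fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync (fd);
}
```

The extra drive-cache flush is exactly the per-flush cost that makes the same workload slower on OS X.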

Also keep in mind that the OSX file system (HFS+) was great for
1985. It's essentially single threaded :/

-- 
Stewart Smith


[notmuch] [PATCH] notmuch: Respect maildir message flags

2010-02-16 Thread Stewart Smith
On Tue, Feb 16, 2010 at 03:12:50PM +1300, martin f krafft wrote:
> also sprach Stewart Smith [2010.02.16.1458 +1300]:
> > +   case 'R': /* replied */
> > +   notmuch_message_add_tag (message, "answered");
> > +   break;
> 
> 'r' means replied, not 'answered'.

fixed.

(I have to admit... I didn't look too closely at this... it just
worked well enough for me.)

> 
> > +   case 'T': /* trashed */
> > +   notmuch_message_add_tag (message, "deleted");
> > +   break;
> 
> Same. trashed and deleted are not the same thing.

changed to 'trashed'.

> I don't want to get into an argument over this, because I think this
> already exposes a problem: you are putting into global namespace
> something not everyone might want, or agree with.
> 
> Why not use 'maildirflags::replied' instead? People can always map
> that to something in the global namespace.

What about putting them all in there except for the seen tag, with the
seen tag dictating whether the message gets tagged 'unread' or not? I cannot
imagine a case where somebody would want otherwise... it was bad enough
discovering 100,000 unread messages :)

What about this patch (just with those few things fixed)?

diff --git a/notmuch-new.c b/notmuch-new.c
index f25c71f..8303047 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -39,6 +39,7 @@ typedef struct {
 int total_files;
 int processed_files;
 int added_messages;
+int tag_maildir;
 struct timeval tv_start;

 _filename_list_t *removed_files;
@@ -169,6 +170,60 @@ _entries_resemble_maildir (struct dirent **entries, int count)
 return 0;
 }

+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+   const char * path)
+{
+int seen = FALSE;
+int end_of_flags = FALSE;
+size_t l = strlen(path);
+
+/* Non-experimental message flags start with this */
+char * i = strstr(path, ":2,");
+i = (i) ? i : strstr(path, "!2,"); /* This format is used on VFAT */
+if (i != NULL) {
+   i += 3;
+   for (; i < (path + l) && !end_of_flags; i++) {
+   switch (*i) {
+   case 'F' :
+   notmuch_message_add_tag (message, "maildir::flagged");
+   break;
+   case 'R': /* replied */
+   notmuch_message_add_tag (message, "maildir::replied");
+   break;
+   case 'D':
+   notmuch_message_add_tag (message, "maildir::draft");
+   break;
+   case 'S': /* seen */
+   seen = TRUE;
+   break;
+   case 'T': /* trashed */
+   notmuch_message_add_tag (message, "maildir::trashed");
+   break;
+   case 'P': /* passed */
+   notmuch_message_add_tag (message, "maildir::forwarded");
+   break;
+   default:
+   end_of_flags = TRUE;
+   break;
+   }
+   }
+}
+
+if (i == NULL || !seen) {
+   tag_inbox_and_unread (message);
+}
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -222,6 +277,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 notmuch_filenames_t *db_subdirs = NULL;
 struct stat st;
 notmuch_bool_t is_maildir, new_directory;
+int maildir_detected = -1;

 if (stat (path, &st)) {
fprintf (stderr, "Error reading directory %s: %s\n",
@@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
continue;
}

+   /* If this directory is a Maildir folder, we need to
+* ignore any subdirectories marked tmp/, and scan for
+* Maildir attributes on messages contained in the sub-
+* directories 'new' and 'cur'. */
+   if (maildir_detected != 0 &&
+   (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&
+   ((strcmp (entry->d_name, "tmp") == 0) ||
+(strcmp (entry->d_name, "new") == 0) ||
+(strcmp (entry->d_name, "cur") == 0))) {
+
+if (maildir_detected == -1) {
+  maildir_detected = _entries_resemble_maildir(fs_entries, num_fs_entries);
+}
+if (maildir_detected == 1) {
+  if (strcmp (entry->d_name, "tmp") == 0) {
+continue;
+  } else {
+state->tag_maildir = TRUE;
+  }
+}
+  }
+
next = talloc_asprintf (notmuch, "%s/%

[notmuch] [PATCH] notmuch: Respect maildir message flags

2010-02-16 Thread Stewart Smith
New patch that does it. Pretty much same as the old one, just with
that one bug I mentioned fixed. This is what I've currently used to
import my Maildir. I'm now happy :)

diff --git a/notmuch-new.c b/notmuch-new.c
index f25c71f..43371a3 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -39,6 +39,7 @@ typedef struct {
 int total_files;
 int processed_files;
 int added_messages;
+int tag_maildir;
 struct timeval tv_start;

 _filename_list_t *removed_files;
@@ -169,6 +170,60 @@ _entries_resemble_maildir (struct dirent **entries, int count)
 return 0;
 }

+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+   const char * path)
+{
+int seen = FALSE;
+int end_of_flags = FALSE;
+size_t l = strlen(path);
+
+/* Non-experimental message flags start with this */
+char * i = strstr(path, ":2,");
+i = (i) ? i : strstr(path, "!2,"); /* This format is used on VFAT */
+if (i != NULL) {
+   i += 3;
+   for (; i < (path + l) && !end_of_flags; i++) {
+   switch (*i) {
+   case 'F' :
+   notmuch_message_add_tag (message, "flagged");
+   break;
+   case 'R': /* replied */
+   notmuch_message_add_tag (message, "answered");
+   break;
+   case 'D':
+   notmuch_message_add_tag (message, "draft");
+   break;
+   case 'S': /* seen */
+   seen = TRUE;
+   break;
+   case 'T': /* trashed */
+   notmuch_message_add_tag (message, "deleted");
+   break;
+   case 'P': /* passed */
+   notmuch_message_add_tag (message, "forwarded");
+   break;
+   default:
+   end_of_flags = TRUE;
+   break;
+   }
+   }
+}
+
+if (i == NULL || !seen) {
+   tag_inbox_and_unread (message);
+}
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -222,6 +277,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 notmuch_filenames_t *db_subdirs = NULL;
 struct stat st;
 notmuch_bool_t is_maildir, new_directory;
+int maildir_detected = -1;

 if (stat (path, &st)) {
fprintf (stderr, "Error reading directory %s: %s\n",
@@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
continue;
}

+   /* If this directory is a Maildir folder, we need to
+* ignore any subdirectories marked tmp/, and scan for
+* Maildir attributes on messages contained in the sub-
+* directories 'new' and 'cur'. */
+   if (maildir_detected != 0 &&
+   (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&
+   ((strcmp (entry->d_name, "tmp") == 0) ||
+(strcmp (entry->d_name, "new") == 0) ||
+(strcmp (entry->d_name, "cur") == 0))) {
+
+if (maildir_detected == -1) {
+  maildir_detected = _entries_resemble_maildir(fs_entries, num_fs_entries);
+}
+if (maildir_detected == 1) {
+  if (strcmp (entry->d_name, "tmp") == 0) {
+continue;
+  } else {
+state->tag_maildir = TRUE;
+  }
+}
+  }
+
next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
status = add_files_recursive (notmuch, next, state);
if (status && ret == NOTMUCH_STATUS_SUCCESS)
@@ -412,7 +490,12 @@ add_files_recursive (notmuch_database_t *notmuch,
/* success */
case NOTMUCH_STATUS_SUCCESS:
state->added_messages++;
-   tag_inbox_and_unread (message);
+   if (state->tag_maildir) {
+   derive_tags_from_maildir_flags (message,
+   entry->d_name);
+   } else {
+       tag_inbox_and_unread (message);
+   }
break;
/* Non-fatal issues (go on to next file) */
case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:


-- 
Stewart Smith


[notmuch] [PATCH] notmuch: Respect maildir message flags

2010-02-15 Thread Stewart Smith
On Wed, Feb 10, 2010 at 01:43:39PM +1030, Tim Stoakes wrote:
> My apologies for dredging up an old thread. I don't want to restart the
> religious war over whether notmuch should respect Maildir flags -
> suffice to say that *I* want that, and the patch posted by Michiel
> seemed to be the best way to make that happen.

I want this too :)

I also found a bug

> @@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
>   continue;
>   }
>  
> + /* If this directory is a Maildir folder, we need to
> +  * ignore any subdirectories marked tmp/, and scan for
> +  * Maildir attributes on messages contained in the sub-
> +  * directories 'new' and 'cur'. */
> + if (maildir_detected != 0 &&
> + entry->d_type == DT_DIR &&
> + ((strcmp (entry->d_name, "tmp") == 0) ||
> +  (strcmp (entry->d_name, "new") == 0) ||
> +  (strcmp (entry->d_name, "cur") == 0))) {

should be
(entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&

as not every system (or file system) fills in d_type; some just report
DT_UNKNOWN (e.g. my machine).
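A slightly more defensive variant of the same fix, as a sketch: when `d_type` is `DT_UNKNOWN`, resolve it with `stat()` rather than guessing. `entry_is_directory` is a hypothetical helper, not part of notmuch; `_DIRENT_HAVE_D_TYPE` is glibc's signal that the field exists, so with other C libraries this version simply always calls `stat()`.

```c
#define _DEFAULT_SOURCE /* exposes d_type and DT_* in glibc's <dirent.h> */
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: is 'entry', read from directory 'dir_path', a
 * directory? d_type is only a hint; file systems that don't fill it in
 * report DT_UNKNOWN, and then we ask stat() instead of guessing. */
static bool entry_is_directory (const char *dir_path, const struct dirent *entry)
{
    char path[4096];
    struct stat st;

#ifdef _DIRENT_HAVE_D_TYPE
    if (entry->d_type != DT_UNKNOWN)
        return entry->d_type == DT_DIR;
#endif
    if (snprintf (path, sizeof path, "%s/%s", dir_path, entry->d_name) >= (int) sizeof path)
        return false; /* path too long to check: treat as not a directory */
    if (stat (path, &st) != 0)
        return false;
    return S_ISDIR (st.st_mode);
}
```

The extra `stat()` only happens on file systems that leave `d_type` empty, so the common case stays as cheap as the original check.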


(took me a while to find/figure that out :) 
-- 
Stewart Smith


[notmuch] Git as notmuch object store (was: Potential problem using Git for mail)

2010-02-15 Thread Stewart Smith
On Mon, Jan 25, 2010 at 01:46:59PM +1300, martin f krafft wrote:
> Stewart, you've worked most on this so far. Would you like to share
> your thoughts?

Just posted a new thread with my latest experiments. Things look
rather good from a storage size point of view. Still a few things to
work out though.

-- 
Stewart Smith


[notmuch] Mail in git

2010-02-15 Thread Stewart Smith
So... I sketched this out in my head at LCA... and it's taken a bit of
time to actually properly try it.

The problem is:
A simple 'find ~/Maildir' takes 10 minutes, and if you write the
output to a file, it's 88MB+.

There are "only" about 900,000 entries there. But this means 900,000
files, which is a non-trivial number. Some mail folders are quite
large too.

Some of this problem could just be solved by using notmuch a bit
differently (folder per month for example).

However... this is a one-way change and going back would be very
tricky.

There's also the backup problem. Iterating through ~1 million inodes
takes a *LONG* time. Restoring them takes even longer (think about
writing all that data to the file system journal).

Historically, when I was running a backup, I couldn't really use my
laptop; it'd be saturated with disk IO performing the file system
dump. It would also take many hours.

Restoring from backup? About 8 hours.

An observation is that mail never changes. It may be reclassified (and
that's what notmuch is for), but it never changes.

We really just want a way to store and access many many many small
blobs of data that never change.

It turns out git is pretty good at that. Underneath, we could just use
it as an object store (a simple git-hash-object and git-cat-file test
confirmed this to be pretty simple to do). Even better, since a lot
of mail is fairly similar, we can use delta compression between mail
messages to reduce the storage space. Git is pretty good at that too.
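To illustrate why git suits mail that never changes: a blob's id is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the raw content, so identical messages get identical ids and are stored once. The C sketch below computes that id with a small self-contained SHA-1, for illustration only (a real tool would call `git hash-object` or a library); `git_blob_id` and the `sha1_*` helpers are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Minimal SHA-1 (FIPS 180-4), for illustration only. */
typedef struct {
    uint32_t h[5];
    uint64_t len;   /* message length in bytes */
    uint8_t buf[64];
    size_t used;    /* bytes currently in buf */
} sha1_ctx;

static void sha1_block (sha1_ctx *c, const uint8_t *p)
{
    uint32_t w[80], a, b, cc, d, e, f, k, t;
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t) p[4 * i] << 24 | (uint32_t) p[4 * i + 1] << 16
             | (uint32_t) p[4 * i + 2] << 8 | (uint32_t) p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = ROL32 (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    a = c->h[0]; b = c->h[1]; cc = c->h[2]; d = c->h[3]; e = c->h[4];
    for (int i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & cc) | (~b & d);            k = 0x5A827999; }
        else if (i < 40) { f = b ^ cc ^ d;                      k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & cc) | (b & d) | (cc & d);   k = 0x8F1BBCDC; }
        else             { f = b ^ cc ^ d;                      k = 0xCA62C1D6; }
        t = ROL32 (a, 5) + f + e + k + w[i];
        e = d; d = cc; cc = ROL32 (b, 30); b = a; a = t;
    }
    c->h[0] += a; c->h[1] += b; c->h[2] += cc; c->h[3] += d; c->h[4] += e;
}

static void sha1_init (sha1_ctx *c)
{
    c->h[0] = 0x67452301; c->h[1] = 0xEFCDAB89; c->h[2] = 0x98BADCFE;
    c->h[3] = 0x10325476; c->h[4] = 0xC3D2E1F0;
    c->len = 0; c->used = 0;
}

static void sha1_update (sha1_ctx *c, const void *data, size_t n)
{
    const uint8_t *p = data;
    c->len += n;
    while (n--) {
        c->buf[c->used++] = *p++;
        if (c->used == 64) { sha1_block (c, c->buf); c->used = 0; }
    }
}

static void sha1_final (sha1_ctx *c, uint8_t out[20])
{
    uint64_t bits = c->len * 8;   /* length before padding */
    uint8_t pad = 0x80;
    sha1_update (c, &pad, 1);
    pad = 0;
    while (c->used != 56)
        sha1_update (c, &pad, 1);
    for (int i = 7; i >= 0; i--) { /* big-endian bit length */
        uint8_t byte = (uint8_t) (bits >> (8 * i));
        sha1_update (c, &byte, 1);
    }
    for (int i = 0; i < 20; i++)
        out[i] = (uint8_t) (c->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Object id git assigns to a blob: SHA-1 of "blob <size>\0" + content.
 * 'hex' receives 40 hex digits and a terminating NUL. */
static void git_blob_id (const void *content, size_t len, char hex[41])
{
    char header[32];
    int hlen = snprintf (header, sizeof header, "blob %zu", len);
    uint8_t digest[20];
    sha1_ctx c;
    sha1_init (&c);
    sha1_update (&c, header, (size_t) hlen + 1); /* include the NUL */
    sha1_update (&c, content, len);
    sha1_final (&c, digest);
    for (int i = 0; i < 20; i++)
        sprintf (hex + 2 * i, "%02x", digest[i]);
}
```

For example, `git_blob_id ("", 0, hex)` gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the id `git hash-object` reports for an empty file.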

A few giant git packs will be much quicker to back up and restore than
1 million files.

So... I wrote a script to test it

$ time perl /home/stewart/evenless.pl /home/stewart/Maildir/

real    841m41.491s
user    491m3.200s
sys     261m58.080s

That takes a 15GB Maildir down to a 3.7GB git repo.

The algorithm of evenless.pl is basically:
1 get next directory entry
2 if is directory, recurse into it
3 write item to git (git hash-object -w)
4 add item to tree object
5 if number of items written = 1000
  5.1 make pack of last 1000 items
6 goto 1
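The steps above can be sketched in C as a recursive walk with a batch counter. The `store` and `pack` callbacks are hypothetical stand-ins for `git hash-object -w` and for building a pack; flushing a final partial batch is an addition not in the original list, and only regular files are stored (symlinks and special files are skipped). The counting callbacks at the end are only for a dry run.

```c
#define _DEFAULT_SOURCE /* for lstat() under strict C modes */
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callbacks: 'store' stands in for "git hash-object -w"
 * (steps 3-4), 'pack' for making a pack of the last batch (step 5). */
typedef void (*store_fn) (const char *path, void *ctx);
typedef void (*pack_fn) (size_t batch_len, void *ctx);

typedef struct {
    store_fn store;
    pack_fn pack;
    void *ctx;
    size_t batch_size; /* 1000 in the original */
    size_t pending;    /* stored since the last pack */
    size_t total;      /* stored overall */
} walk_state;

static void walk_dir (const char *dir, walk_state *s)
{
    DIR *d = opendir (dir);
    struct dirent *e;
    if (d == NULL)
        return;
    while ((e = readdir (d)) != NULL) {                  /* step 1 */
        char path[4096];
        struct stat st;
        if (strcmp (e->d_name, ".") == 0 || strcmp (e->d_name, "..") == 0)
            continue;
        if (snprintf (path, sizeof path, "%s/%s", dir, e->d_name) >= (int) sizeof path)
            continue; /* path too long: skip rather than truncate */
        if (lstat (path, &st) != 0)
            continue;
        if (S_ISDIR (st.st_mode)) {
            walk_dir (path, s);                          /* step 2 */
        } else if (S_ISREG (st.st_mode)) {
            s->store (path, s->ctx);                     /* steps 3-4 */
            s->total++;
            if (++s->pending == s->batch_size) {         /* step 5 */
                s->pack (s->pending, s->ctx);
                s->pending = 0;
            }
        }
    }
    closedir (d);
}

/* Walk 'root', pack any final partial batch, return files stored. */
static size_t walk_and_pack (const char *root, walk_state *s)
{
    s->pending = 0;
    s->total = 0;
    walk_dir (root, s);
    if (s->pending > 0) {
        s->pack (s->pending, s->ctx);
        s->pending = 0;
    }
    return s->total;
}

/* Example callbacks for a dry run: they only count. */
typedef struct { size_t stored, packs; } walk_counts;

static void count_store (const char *path, void *ctx)
{
    (void) path;
    ((walk_counts *) ctx)->stored++;
}

static void count_pack (size_t batch_len, void *ctx)
{
    (void) batch_len;
    ((walk_counts *) ctx)->packs++;
}
```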

$ git count-objects -v
count: 479
size: 27680
in-pack: 873109
packs: 1084
size-pack: 3746219
prune-packable: 0
garbage: 0

If I did a "git checkout", about 8 hours later I'd have a directory
tree exactly the same as my Maildir.

Why didn't I just git-add everything? I didn't exactly feel like
creating another giant copy of my mail (that also takes a long time).

What about adding more mail to the archive?

The way I see it, you use a Maildir for day-to-day mail (e.g.
delivery), and every so often you run some magic command that takes old
mail out of the Maildir and stores it in the git repo.
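One possible policy for that magic command, as a sketch: treat a message file as ready to move into the archive once its modification time is older than some cutoff. Using mtime as a stand-in for delivery time is an assumption, and `ready_for_archive` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical policy: a message file is ready to move from the live
 * Maildir into the git archive once its mtime is at least
 * 'min_age_days' older than 'now'. */
static bool ready_for_archive (const char *path, time_t now, int min_age_days)
{
    struct stat st;
    if (stat (path, &st) != 0)
        return false; /* vanished or unreadable: leave it alone */
    return difftime (now, st.st_mtime) >= (double) min_age_days * 24 * 60 * 60;
}
```

Passing `now` in, rather than calling `time()` inside, keeps one archiving run consistent and makes the cutoff easy to test.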

Next step?

Make notmuch be able to read mail out of it and add it to an index
(oh, and some kind of verification and error checking about creating
the git repo).
-- 
Stewart Smith


Re: [notmuch] [PATCH] notmuch: Respect maildir message flags

2010-02-15 Thread Stewart Smith
On Wed, Feb 10, 2010 at 01:43:39PM +1030, Tim Stoakes wrote:
 My apologies for dredging up an old thread. I don't want to restart the
 religious war over whether notmuch should respect Maildir flags -
 suffice to say that *I* want that, and the patch posted by Michiel
 seemed to be the best way to make that happen.

I want this too :)

I also found a bug

 @@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
   continue;
   }
  
 + /* If this directory is a Maildir folder, we need to
 +  * ignore any subdirectories marked tmp/, and scan for
 +  * Maildir attributes on messages contained in the sub-
 +  * directories 'new' and 'cur'. */
 + if (maildir_detected != 0 
 + entry-d_type == DT_DIR 
 + ((strcmp (entry-d_name, tmp) == 0) ||
 +  (strcmp (entry-d_name, new) == 0) ||
 +  (strcmp (entry-d_name, cur) == 0))) {

should be
(entry-d_type == DT_DIR || entry-d_type == DT_UNKNOWN) 

as not everywhere is going to give you d_type (e.g. my machine).


(took me a while to find/figure that out :) 
-- 
Stewart Smith
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] notmuch: Respect maildir message flags

2010-02-15 Thread Stewart Smith
New patch that does it. Pretty much same as the old one, just with
that one bug I mentioned fixed. This is what I've currently used to
import my Maildir. I'm now happy :)

diff --git a/notmuch-new.c b/notmuch-new.c
index f25c71f..43371a3 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -39,6 +39,7 @@ typedef struct {
 int total_files;
 int processed_files;
 int added_messages;
+int tag_maildir;
 struct timeval tv_start;
 
 _filename_list_t *removed_files;
@@ -169,6 +170,60 @@ _entries_resemble_maildir (struct dirent **entries, int 
count)
 return 0;
 }
 
+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+   const char * path)
+{
+int seen = FALSE;
+int end_of_flags = FALSE;
+size_t l = strlen(path);
+
+/* Non-experimental message flags start with this */
+char * i = strstr(path, :2,);
+i = (i) ? i : strstr(path, !2,); /* This format is used on VFAT */
+if (i != NULL) {
+   i += 3;
+   for (; i  (path + l)  !end_of_flags; i++) {
+   switch (*i) {
+   case 'F' :
+   notmuch_message_add_tag (message, flagged);
+   break;
+   case 'R': /* replied */
+   notmuch_message_add_tag (message, answered);
+   break;
+   case 'D':
+   notmuch_message_add_tag (message, draft);
+   break;
+   case 'S': /* seen */
+   seen = TRUE;
+   break;
+   case 'T': /* trashed */
+   notmuch_message_add_tag (message, deleted);
+   break;
+   case 'P': /* passed */
+   notmuch_message_add_tag (message, forwarded);
+   break;
+   default:
+   end_of_flags = TRUE;
+   break;
+   }
+   }
+}
+
+if (i == NULL || !seen) {
+   tag_inbox_and_unread (message);
+}
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -222,6 +277,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 notmuch_filenames_t *db_subdirs = NULL;
 struct stat st;
 notmuch_bool_t is_maildir, new_directory;
+int maildir_detected = -1;
 
 if (stat (path, st)) {
fprintf (stderr, Error reading directory %s: %s\n,
@@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
continue;
}
 
+   /* If this directory is a Maildir folder, we need to
+* ignore any subdirectories marked tmp/, and scan for
+* Maildir attributes on messages contained in the sub-
+* directories 'new' and 'cur'. */
+   if (maildir_detected != 0 &&
+   (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&
+   ((strcmp (entry->d_name, "tmp") == 0) ||
+(strcmp (entry->d_name, "new") == 0) ||
+(strcmp (entry->d_name, "cur") == 0))) {
+
+if (maildir_detected == -1) {
+  maildir_detected = _entries_resemble_maildir(fs_entries, num_fs_entries);
+}
+if (maildir_detected == 1) {
+  if (strcmp (entry->d_name, "tmp") == 0) {
+continue;
+  } else {
+state->tag_maildir = TRUE;
+  }
+}
+  }
+
next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
status = add_files_recursive (notmuch, next, state);
if (status && ret == NOTMUCH_STATUS_SUCCESS)
@@ -412,7 +490,12 @@ add_files_recursive (notmuch_database_t *notmuch,
/* success */
case NOTMUCH_STATUS_SUCCESS:
state->added_messages++;
-   tag_inbox_and_unread (message);
+   if (state->tag_maildir) {
+   derive_tags_from_maildir_flags (message,
+   entry->d_name);
+   } else {
+   tag_inbox_and_unread (message);
+   }
break;
/* Non-fatal issues (go on to next file) */
case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:


-- 
Stewart Smith

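For experimenting with the flag scan from the patch above outside notmuch, here is a minimal standalone sketch. It is not notmuch code: the MD_* constants and the maildir_info_flags() and parse_maildir_flags() helpers are hypothetical. Like the patch, it accepts both ":2," and the "!2," form used on VFAT, recognises the standard Maildir letters D, F, P, R, S and T, and stops at the first letter it does not know; unlike the patch, it uses the last separator in the filename rather than the first.

```c
#include <assert.h>
#include <string.h>

/* Standard Maildir info flags; these constant names are hypothetical. */
enum {
    MD_DRAFT   = 1 << 0,  /* 'D' */
    MD_FLAGGED = 1 << 1,  /* 'F' */
    MD_PASSED  = 1 << 2,  /* 'P' */
    MD_REPLIED = 1 << 3,  /* 'R' */
    MD_SEEN    = 1 << 4,  /* 'S' */
    MD_TRASHED = 1 << 5   /* 'T' */
};

/* Return a pointer just past the last ":2," (or "!2,", the VFAT form)
 * in 'filename', or NULL if there is no info section. */
static const char *
maildir_info_flags (const char *filename)
{
    const char *found = NULL;

    for (const char *p = filename; (p = strstr (p, "2,")) != NULL; p++) {
        if (p > filename && (p[-1] == ':' || p[-1] == '!'))
            found = p + 2;
    }
    return found;
}

/* Translate the flag letters into a bitmask. Returns -1 when there is
 * no info section at all; stops at the first non-standard letter. */
static int
parse_maildir_flags (const char *filename)
{
    const char *f;
    int mask = 0;

    assert (filename != NULL);
    f = maildir_info_flags (filename);
    if (f == NULL)
        return -1;
    for (; *f != '\0'; f++) {
        switch (*f) {
        case 'D': mask |= MD_DRAFT;   break;
        case 'F': mask |= MD_FLAGGED; break;
        case 'P': mask |= MD_PASSED;  break;
        case 'R': mask |= MD_REPLIED; break;
        case 'S': mask |= MD_SEEN;    break;
        case 'T': mask |= MD_TRASHED; break;
        default:  return mask;  /* e.g. an experimental lowercase flag */
        }
    }
    return mask;
}
```

Because the result is a plain bitmask, the choice of tag names can be made separately, which is the question raised later in this thread.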

Re: [notmuch] [PATCH] notmuch: Respect maildir message flags

2010-02-15 Thread Stewart Smith
On Tue, Feb 16, 2010 at 03:12:50PM +1300, martin f krafft wrote:
> also sprach Stewart Smith stew...@flamingspork.com [2010.02.16.1458 +1300]:
> > +   case 'R': /* replied */
> > +   notmuch_message_add_tag (message, "answered");
> > +   break;
>
> 'r' means replied, not 'answered'.

Fixed.

(I have to admit... I didn't look too closely at this... it just
worked well enough for me.)

>
> > +   case 'T': /* trashed */
> > +   notmuch_message_add_tag (message, "deleted");
> > +   break;
>
> Same. trashed and deleted are not the same thing.

Changed to 'trashed'.

> I don't want to get into an argument over this, because I think this
> already exposes a problem: you are putting into global namespace
> something not everyone might want, or agree with.
>
> Why not use 'maildirflags::replied' instead? People can always map
> that to something in the global namespace.

What about putting them all in that namespace except for the seen flag,
with 'seen' deciding whether the message gets tagged 'unread' or not? I
cannot imagine anybody wanting it any other way... it was bad enough
discovering 100,000 unread messages :)

What about this patch (just with those few things fixed)?

diff --git a/notmuch-new.c b/notmuch-new.c
index f25c71f..8303047 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -39,6 +39,7 @@ typedef struct {
 int total_files;
 int processed_files;
 int added_messages;
+int tag_maildir;
 struct timeval tv_start;
 
 _filename_list_t *removed_files;
@@ -169,6 +170,60 @@ _entries_resemble_maildir (struct dirent **entries, int count)
 return 0;
 }
 
+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+   const char * path)
+{
+int seen = FALSE;
+int end_of_flags = FALSE;
+size_t l = strlen(path);
+
+/* Non-experimental message flags start with this */
+char * i = strstr(path, ":2,");
+i = (i) ? i : strstr(path, "!2,"); /* This format is used on VFAT */
+if (i != NULL) {
+   i += 3;
+   for (; i < (path + l) && !end_of_flags; i++) {
+   switch (*i) {
+   case 'F' :
+   notmuch_message_add_tag (message, "maildir::flagged");
+   break;
+   case 'R': /* replied */
+   notmuch_message_add_tag (message, "maildir::replied");
+   break;
+   case 'D':
+   notmuch_message_add_tag (message, "maildir::draft");
+   break;
+   case 'S': /* seen */
+   seen = TRUE;
+   break;
+   case 'T': /* trashed */
+   notmuch_message_add_tag (message, "maildir::trashed");
+   break;
+   case 'P': /* passed */
+   notmuch_message_add_tag (message, "maildir::forwarded");
+   break;
+   default:
+   end_of_flags = TRUE;
+   break;
+   }
+   }
+}
+
+if (i == NULL || !seen) {
+   tag_inbox_and_unread (message);
+}
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -222,6 +277,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 notmuch_filenames_t *db_subdirs = NULL;
 struct stat st;
 notmuch_bool_t is_maildir, new_directory;
+int maildir_detected = -1;
 
 if (stat (path, &st)) {
fprintf (stderr, "Error reading directory %s: %s\n",
@@ -301,6 +357,28 @@ add_files_recursive (notmuch_database_t *notmuch,
continue;
}
 
+   /* If this directory is a Maildir folder, we need to
+* ignore any subdirectories marked tmp/, and scan for
+* Maildir attributes on messages contained in the sub-
+* directories 'new' and 'cur'. */
+   if (maildir_detected != 0 &&
+   (entry->d_type == DT_DIR || entry->d_type == DT_UNKNOWN) &&
+   ((strcmp (entry->d_name, "tmp") == 0) ||
+(strcmp (entry->d_name, "new") == 0) ||
+(strcmp (entry->d_name, "cur") == 0))) {
+
+if (maildir_detected == -1) {
+  maildir_detected = _entries_resemble_maildir(fs_entries, num_fs_entries);
+}
+if (maildir_detected == 1) {
+  if (strcmp (entry->d_name, "tmp") == 0) {
+continue;
+  } else {
+state->tag_maildir = TRUE;
+  }
+}
+  }
+
next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
status = add_files_recursive (notmuch, next, state);
if (status && ret == NOTMUCH_STATUS_SUCCESS)
@@ -412,7 +490,12 @@ add_files_recursive (notmuch_database_t *notmuch,
/* success */
case NOTMUCH_STATUS_SUCCESS:
state->added_messages

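The revised patch keeps every flag except 'S' in a maildir:: namespace and lets 'S' alone decide whether a message gets 'inbox' and 'unread'. A table-driven version of that rule might look like the sketch below. The table entries are the tag names from the patch, but tags_for_flags() is a hypothetical helper, and it skips unknown letters instead of stopping at them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strcmp(), used when checking the result */

/* Flag letter -> tag, using the tag names from the revised patch.
 * 'S' is deliberately absent: it only decides "inbox"/"unread". */
static const struct {
    char flag;
    const char *tag;
} flag_tags[] = {
    { 'D', "maildir::draft" },
    { 'F', "maildir::flagged" },
    { 'P', "maildir::forwarded" },
    { 'R', "maildir::replied" },
    { 'T', "maildir::trashed" },
};

/* Write the tags a newly added message would get into 'tags' (room for
 * 'max' entries) and return how many were written. 'flags' is the text
 * after ":2,", or NULL if the filename has no info section. */
static size_t
tags_for_flags (const char *flags, const char **tags, size_t max)
{
    size_t n = 0;
    int seen = 0;

    assert (tags != NULL);
    for (const char *f = flags; f != NULL && *f != '\0'; f++) {
        if (*f == 'S') {
            seen = 1;
            continue;
        }
        for (size_t i = 0; i < sizeof flag_tags / sizeof flag_tags[0]; i++) {
            if (flag_tags[i].flag == *f && n < max)
                tags[n++] = flag_tags[i].tag;
        }
    }
    /* No info section, or no 'S': treat it as new mail. */
    if (!seen) {
        if (n < max)
            tags[n++] = "inbox";
        if (n < max)
            tags[n++] = "unread";
    }
    return n;
}
```

Keeping the names in one table means that switching to the 'maildirflags::' prefix suggested in the thread would only touch the table.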
Re: [notmuch] Notmuch performance problems on OSX

2010-02-15 Thread Stewart Smith
On Fri, 15 Jan 2010 03:58:50 + (UTC), Olly Betts o...@survex.com wrote:
> One difference between OS X and other systems is that OS X supports the
> F_FULLSYNC ioctl, and other systems don't (currently, at least AFAIK)
> and Xapian uses that if it is available to ensure that changes have
> actually made it to disk:
>
> http://trac.xapian.org/ticket/288
>
> On other systems, it uses fdatasync() or fsync(), which typically just
> ensure that the data has left the OS - it can sit in disk controller or
> drive caches for potentially seconds longer.  This call happens once
> per table for every (explicit or implicit) flush on a database.

At least if your OS and file system don't hate you (e.g. XFS on Linux),
then fsync() really does flush the drive cache.

Also keep in mind that the OS X file system (HFS+) was great for
1985. It's essentially single-threaded :/
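The platform choice Olly describes can be sketched as below. On macOS the relevant fcntl() command is spelled F_FULLFSYNC. This is only a hypothetical helper (durable_flush() is not Xapian's function), and it assumes that falling back to fsync() is acceptable when F_FULLFSYNC is rejected.

```c
#define _POSIX_C_SOURCE 200809L
#define _DARWIN_C_SOURCE   /* keep F_FULLFSYNC visible on macOS */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>        /* mkstemp(), used when testing */
#include <unistd.h>

/* Push a file's data as far towards stable storage as the platform
 * offers. Returns 0 on success, -1 with errno set on failure. */
static int
durable_flush (int fd)
{
    assert (fd >= 0);
#if defined(F_FULLFSYNC)
    /* macOS: ask the drive to flush its own write cache too. */
    if (fcntl (fd, F_FULLFSYNC) == 0)
        return 0;
    /* Not every filesystem supports F_FULLFSYNC; fall back. */
    return fsync (fd);
#elif defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    /* Flush file data plus only the metadata needed to read it back. */
    return fdatasync (fd);
#else
    return fsync (fd);
#endif
}
```

Whether fdatasync() or fsync() also empties the drive's cache depends on the OS and file system, which is the point made above about XFS on Linux.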

-- 
Stewart Smith


[notmuch] Mail in git

2010-02-14 Thread Stewart Smith
So... I sketched this out in my head at LCA... and it's taken a bit of
time to actually properly try it.

The problem is:
A simple `find ~/Maildir` takes 10 minutes, and if you write the
output to a file, it's 88MB+.

There are only about 900,000 entries there, but that means 900,000
files, which is a non-trivial amount. Some mail folders are quite
large too.

Some of this problem could just be solved by using notmuch a bit
differently (folder per month for example).

However... this is a one-way change and going back would be very
tricky.

There's also the backup problem. Iterating through ~1 million inodes
takes a *LONG* time. Restoring it takes even longer (think about
writing all that data to the file system journal).

Historically, if I was running a backup, I couldn't really use my
laptop; it'd be saturated with disk IO performing the file system
dump. It would also take many hours.

Restoring from backup? About 8 hours.

An observation is that mail never changes. It may be reclassified (and
that's what notmuch is for), but it never changes.

We really just want a way to store and access many many many small
blobs of data that never change.

It turns out git is pretty good at that. Underneath, we could just use
it as an object store (a simple git-hash-object and git-cat-file test
confirmed this to be pretty simple to do). Even better, since a lot
of mail is fairly similar, we can use delta compression between mail
messages to reduce the storage space. Git is pretty good at that too.

A few giant git packs will be much quicker to back up and restore than
1 million files.

So... I wrote a script to test it:

$ time perl /home/stewart/evenless.pl /home/stewart/Maildir/

real    841m41.491s
user    491m3.200s
sys     261m58.080s

That took a 15GB Maildir down to a 3.7GB git repo.

The algorithm of evenless.pl is basically:
1 get next directory entry
2 if is directory, recurse into it
3 write item to git (git hash-object -w)
4 add item to tree object
5 if number of items written = 1000
  5.1 make pack of last 1000 items
6 goto 1
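The loop above might look roughly like this in C; evenless.pl itself is Perl and was not posted. In this hypothetical sketch, store_item() and make_pack() only count what they would do: a real version would run `git hash-object -w` on each file and add it to a tree object, then pack the loose objects written since the last pack. nftw() from <ftw.h> takes care of the recursion in step 2.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

#define BATCH_SIZE 1000   /* "make pack of last 1000 items" */

/* Counters standing in for real git work (hypothetical placeholders). */
static size_t items_written;   /* blobs stored so far */
static size_t packs_made;      /* packs created so far */
static size_t pending;         /* items stored since the last pack */

/* Placeholder for "git hash-object -w" plus adding to a tree object. */
static void
store_item (const char *path)
{
    (void) path;
    items_written++;
    pending++;
}

/* Placeholder for packing everything written since the last pack. */
static void
make_pack (void)
{
    if (pending > 0) {
        packs_made++;
        pending = 0;
    }
}

/* nftw() callback: store regular files, pack every BATCH_SIZE items. */
static int
visit (const char *path, const struct stat *sb, int type, struct FTW *ftwbuf)
{
    (void) sb;
    (void) ftwbuf;
    if (type == FTW_F) {
        store_item (path);
        if (pending == BATCH_SIZE)
            make_pack ();
    }
    return 0;  /* keep walking; nftw() descends into directories itself */
}

/* Walk 'root', then make one final pack for any remainder. */
static int
archive_maildir (const char *root)
{
    assert (root != NULL);
    items_written = packs_made = pending = 0;
    if (nftw (root, visit, 64, FTW_PHYS) != 0) {
        perror ("nftw");
        return -1;
    }
    make_pack ();
    return 0;
}
```

Counting instead of calling git keeps the batching logic testable on its own; the placeholders can be swapped for real git invocations later.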

$ git count-objects -v
count: 479
size: 27680
in-pack: 873109
packs: 1084
size-pack: 3746219
prune-packable: 0
garbage: 0

If I did a git checkout, about 8 hours later I'd have a directory
tree exactly the same as my Maildir.

Why didn't I just git-add everything? I didn't exactly feel like
creating another giant copy of my mail (that also takes a long time).

What about adding more mail to the archive?

The way I see it, you use a Maildir for day-to-day mail (e.g.
delivery), and every so often you run some magic command that takes old
mail out of the Maildir and stores it in the git repo.

Next step?

Make notmuch be able to read mail out of it and add it to an index
(oh, and some kind of verification and error checking about creating
the git repo).
-- 
Stewart Smith


Re: [notmuch] Git as notmuch object store (was: Potential problem using Git for mail)

2010-02-14 Thread Stewart Smith
On Mon, Jan 25, 2010 at 01:46:59PM +1300, martin f krafft wrote:
> Stewart, you've worked most on this so far. Would you like to share
> your thoughts?

Just posted a new thread with my latest experiments. Things look
rather good from a storage size point of view. Still a few things to
work out though.

-- 
Stewart Smith


[notmuch] Mac OS X/Darwin compatibility issues

2009-11-19 Thread Stewart Smith
On Wed, Nov 18, 2009 at 04:24:42PM -0800, Alexander Botero-Lowry wrote:
> On Thu, 19 Nov 2009 10:45:28 +1100, Stewart Smith  flamingspork.com> wrote:
> > On Wed, Nov 18, 2009 at 11:27:20PM +0100, Carl Worth wrote:
> > > Yes. I knew I was "cheating" by using some GNU extensions here. I'm
> > > happy to accept portability patches for these things, but it's hard for
> > > me to get excited about writing them myself.
> > > 
> > > Care to take a whack at these?
> > 
> > http://www.gnu.org/software/gnulib/
> > 
> > could be a partial answer.
> > 
> Why add yet another dependency for a couple of functions? Especially
> considering how notmuch already depends on glib which includes portability
> functions for various things.

The idea with gnulib (at least what we've done with Drizzle) is to
just copy the bits you need into the tree. It works pretty well for
those small things where you don't want to depend on a giant like
glib.
-- 
Stewart Smith


[notmuch] Mac OS X/Darwin compatibility issues

2009-11-19 Thread Stewart Smith
On Wed, Nov 18, 2009 at 11:27:20PM +0100, Carl Worth wrote:
> Yes. I knew I was "cheating" by using some GNU extensions here. I'm
> happy to accept portability patches for these things, but it's hard for
> me to get excited about writing them myself.
> 
> Care to take a whack at these?

http://www.gnu.org/software/gnulib/

could be a partial answer.

We've taken to using it where needed for Drizzle, and it seems to work fine.
-- 
Stewart Smith


[notmuch] [PATCH] count_files: sort directory in inode order before statting

2009-11-18 Thread Stewart Smith
---
 notmuch-new.c |   30 ++
 1 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 11fad8c..c5f841a 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -308,36 +308,26 @@ add_files (notmuch_database_t *notmuch,
 static void
 count_files (const char *path, int *count)
 {
-DIR *dir;
-struct dirent *e, *entry = NULL;
-int entry_length;
-int err;
+struct dirent *entry = NULL;
 char *next;
 struct stat st;
+struct dirent **namelist = NULL;

-dir = opendir (path);
+int n_entries= scandir(path, &namelist, 0, ino_cmp);

-if (dir == NULL) {
+if (n_entries == -1) {
fprintf (stderr, "Warning: failed to open directory %s: %s\n",
 path, strerror (errno));
goto DONE;
 }

-entry_length = offsetof (struct dirent, d_name) +
-   pathconf (path, _PC_NAME_MAX) + 1;
-entry = malloc (entry_length);
+int i=0;

 while (!interrupted) {
-   err = readdir_r (dir, entry, &e);
-   if (err) {
-   fprintf (stderr, "Error reading directory: %s\n",
-strerror (errno));
-   free (entry);
-   goto DONE;
-   }
+if (i == n_entries)
+break;

-   if (e == NULL)
-   break;
+entry= namelist[i++];

/* Ignore special directories to avoid infinite recursion.
 * Also ignore the .notmuch directory.
@@ -376,8 +366,8 @@ count_files (const char *path, int *count)
   DONE:
 if (entry)
free (entry);
-
-closedir (dir);
+if (namelist)
+free (namelist);
 }

 int
-- 
1.6.3.3

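scandir() allocates every entry in namelist separately as well as the array itself, so freeing only the last entry and the array leaks the rest. A sketch of count_files that frees each entry as it goes, and whose comparator returns 0 for equal inode numbers (possible with two hard links in one directory), might look like this. count_files_inode_order() is a hypothetical name, not the patched function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Ascending inode order; 0 when two entries share an inode. */
static int
ino_cmp (const struct dirent **a, const struct dirent **b)
{
    return ((*a)->d_ino > (*b)->d_ino) - ((*a)->d_ino < (*b)->d_ino);
}

/* Recursively count regular files under 'path', stat()ing each
 * directory's entries in inode order and skipping ".", ".." and
 * ".notmuch". Returns -1 if 'path' itself cannot be read. */
static long
count_files_inode_order (const char *path)
{
    struct dirent **namelist = NULL;
    long count = 0;
    int n;

    assert (path != NULL);
    n = scandir (path, &namelist, NULL, ino_cmp);
    if (n == -1) {
        fprintf (stderr, "Warning: failed to open directory %s: %s\n",
                 path, strerror (errno));
        return -1;
    }
    for (int i = 0; i < n; i++) {
        const char *name = namelist[i]->d_name;

        if (strcmp (name, ".") != 0 && strcmp (name, "..") != 0 &&
            strcmp (name, ".notmuch") != 0) {
            size_t len = strlen (path) + strlen (name) + 2;
            char *next = malloc (len);
            struct stat st;

            if (next != NULL) {
                snprintf (next, len, "%s/%s", path, name);
                if (stat (next, &st) == 0) {
                    if (S_ISREG (st.st_mode)) {
                        count++;
                    } else if (S_ISDIR (st.st_mode)) {
                        long sub = count_files_inode_order (next);
                        if (sub > 0)
                            count += sub;
                    }
                }
                free (next);
            }
        }
        free (namelist[i]);   /* each entry is malloc()ed by scandir() */
    }
    free (namelist);          /* ...and so is the array itself */
    return count;
}
```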


[notmuch] [PATCH 2/2] Read mail directory in inode number order

2009-11-18 Thread Stewart Smith
This gives a rather decent reduction in the number of seeks required when
reading a Maildir that isn't in the page cache.

Most filesystems give some locality on disk based on inode numbers.
In ext[234] this comes from the inode tables; in XFS, groups of sequential
inode numbers are together on disk, and the most significant bits indicate
the allocation group (i.e. inode 1,000,000 is always after inode 1,000).

With this patch, we read in the whole directory and sort it by inode
number before stat()ing the contents.

Ideally, the directory is laid out sequentially, and we then make one
pass through the file system stat()ing.

Since the universe is not ideal, we'll probably seek while reading the
directory, and a fair bit while reading the inodes themselves.

However... with readahead, and stat()ing in inode order, we should be
in the best place possible to hit the cache.

In a (not very good) benchmark of "how long does it take to find the first
15,000 messages in my Maildir after 'echo 3 > /proc/sys/vm/drop_caches'",
this patch consistently cut at least 8 seconds off the scan time.

Without patch: 50 seconds
With patch: 38-42 seconds.

(I did this in a previous Maildir-reading project and saw large improvements
there too.)
---
 notmuch-new.c |   32 +++-
 1 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 83a05ba..11fad8c 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -73,6 +73,11 @@ add_files_print_progress (add_files_state_t *state)
 fflush (stdout);
 }

+static int ino_cmp(const struct dirent **a, const struct dirent **b)
+{
+  return ((*a)->d_ino < (*b)->d_ino)? -1: 1;
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (path_mtime)
@@ -100,13 +105,12 @@ add_files_recursive (notmuch_database_t *notmuch,
 add_files_state_t *state)
 {
 DIR *dir = NULL;
-struct dirent *e, *entry = NULL;
-int entry_length;
-int err;
+struct dirent *entry = NULL;
 char *next = NULL;
 time_t path_mtime, path_dbtime;
 notmuch_status_t status, ret = NOTMUCH_STATUS_SUCCESS;
 notmuch_message_t *message = NULL;
+struct dirent **namelist = NULL;

 /* If we're told to, we bail out on encountering a read-only
  * directory, (with this being a clear clue from the user to
@@ -122,31 +126,23 @@ add_files_recursive (notmuch_database_t *notmuch,
 path_mtime = st->st_mtime;

 path_dbtime = notmuch_database_get_timestamp (notmuch, path);
+int n_entries= scandir(path, &namelist, 0, ino_cmp);

-dir = opendir (path);
-if (dir == NULL) {
+if (n_entries == -1) {
fprintf (stderr, "Error opening directory %s: %s\n",
 path, strerror (errno));
ret = NOTMUCH_STATUS_FILE_ERROR;
goto DONE;
 }

-entry_length = offsetof (struct dirent, d_name) +
-   pathconf (path, _PC_NAME_MAX) + 1;
-entry = malloc (entry_length);
+int i=0;

 while (!interrupted) {
-   err = readdir_r (dir, entry, &e);
-   if (err) {
-   fprintf (stderr, "Error reading directory: %s\n",
-strerror (errno));
-   ret = NOTMUCH_STATUS_FILE_ERROR;
-   goto DONE;
-   }
-
-   if (e == NULL)
+   if (i == n_entries)
break;

+entry= namelist[i++];
+
/* If this directory hasn't been modified since the last
 * add_files, then we only need to look further for
 * sub-directories. */
@@ -243,6 +239,8 @@ add_files_recursive (notmuch_database_t *notmuch,
free (entry);
 if (dir)
closedir (dir);
+if (namelist)
+   free (namelist);

 return ret;
 }
-- 
1.6.3.3

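The read-everything-then-sort idea from this patch does not depend on scandir(). This hypothetical sketch collects (d_ino, name) pairs with readdir(), sorts them with qsort(), and only then stat()s each name, so the stat() calls go out in ascending inode order. stat_dir_in_inode_order() and struct ino_entry are illustrative names only, and how many seeks this saves depends on the filesystem's inode layout, as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One directory entry, remembered until it is stat()ed. */
struct ino_entry {
    ino_t ino;
    char *name;
};

/* qsort() comparator: ascending inode number, 0 when equal. */
static int
ino_entry_cmp (const void *a, const void *b)
{
    const struct ino_entry *x = a;
    const struct ino_entry *y = b;

    return (x->ino > y->ino) - (x->ino < y->ino);
}

/* Read every entry of 'path', sort by inode number, then stat() each
 * one in that order. Returns the number of successful stat() calls,
 * or -1 if the directory cannot be opened. */
static int
stat_dir_in_inode_order (const char *path)
{
    struct ino_entry *v = NULL;
    size_t n = 0, cap = 0;
    struct dirent *e;
    int ok = 0;
    DIR *dir;

    assert (path != NULL);
    dir = opendir (path);
    if (dir == NULL)
        return -1;

    /* Pass 1: only read the directory; no stat() yet. */
    while ((e = readdir (dir)) != NULL) {
        if (n == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            struct ino_entry *nv = realloc (v, ncap * sizeof *v);

            if (nv == NULL)
                break;          /* out of memory: stat what we have */
            v = nv;
            cap = ncap;
        }
        v[n].ino = e->d_ino;
        v[n].name = strdup (e->d_name);
        if (v[n].name != NULL)
            n++;
    }
    closedir (dir);

    /* Pass 2: stat() in ascending inode order. */
    if (n > 0)
        qsort (v, n, sizeof *v, ino_entry_cmp);
    for (size_t i = 0; i < n; i++) {
        char buf[4096];
        struct stat st;
        int len = snprintf (buf, sizeof buf, "%s/%s", path, v[i].name);

        if (len > 0 && (size_t) len < sizeof buf && stat (buf, &st) == 0)
            ok++;
        free (v[i].name);
    }
    free (v);
    return ok;
}
```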


[notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs.

2009-11-18 Thread Stewart Smith
Previously, on Ubuntu 9.10 with gcc 4.4.1, linking failed with:

ccache gcc `pkg-config --libs glib-2.0 gmime-2.4 talloc` `xapian-config --libs` 
notmuch.o notmuch-config.o notmuch-dump.o notmuch-new.o notmuch-reply.o 
notmuch-restore.o notmuch-search.o notmuch-setup.o notmuch-show.o notmuch-tag.o 
notmuch-time.o gmime-filter-reply.o query-string.o show-message.o lib/notmuch.a 
-o notmuch
/usr/bin/ld: lib/notmuch.a(database.o): in function global constructors keyed 
to BOOLEAN_PREFIX_INTERNAL:database.cc(.text+0x3a): error: undefined reference 
to 'std::ios_base::Init::Init()'
---
 Makefile.local |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile.local b/Makefile.local
index f824bed..dbd3e20 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -18,7 +18,7 @@ notmuch_client_srcs = \

 notmuch_client_modules = $(notmuch_client_srcs:.c=.o)
 notmuch: $(notmuch_client_modules) lib/notmuch.a
-   $(CC) $(LDFLAGS) $^ -o $@
+   $(CXX) $(LDFLAGS) $^ -o $@

 notmuch.1.gz:
gzip --stdout notmuch.1 > notmuch.1.gz
-- 
1.6.3.3