Re: Notmuch indexing 21 million emails

2011-11-24 Thread Felipe Contreras
On Tue, Nov 22, 2011 at 5:02 AM, Tom Bulli mrbu...@yahoo.com wrote:
> I have a project where I need to search about 21 million emails - and decided to use
> "notmuch" for it.  The system is a Debian Squeeze, the notmuch version is
> "0.8-1~bpo60+1" from "kyria's" private repository.
>
> I am running the "notmuch new" for approx. 4 days now - and according to
> "notmuch count" it has indexed about 4.5 million emails.
>
> Is this expected performance?  Is there any way to speed that up?

It would be nice to run something like this under OProfile (or perf)
and see whether there are any obvious fixes.
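
As a rough sketch of the perf route, the snippet below only builds the
command line for recording a profile of "notmuch new". The helper name
perf_record_command and the output file name perf.data are illustrative
assumptions, not anything notmuch provides; the perf options used
(-g for call graphs, -o for the output file, -- to end perf's own
options) are standard "perf record" options.

```python
import shlex


def perf_record_command(workload, output="perf.data"):
    """Build an argv list that runs `workload` under `perf record`.

    Hypothetical helper for illustration only.
    """
    # -g records call graphs, -o names the output file,
    # and "--" separates perf's options from the profiled command.
    return ["perf", "record", "-g", "-o", output, "--"] + list(workload)


cmd = perf_record_command(["notmuch", "new"])
print(shlex.join(cmd))  # prints: perf record -g -o perf.data -- notmuch new
```

The resulting list can be handed to subprocess.run(cmd, check=True), and
the recorded data inspected afterwards with "perf report -i perf.data".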

-- 
Felipe Contreras
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Notmuch indexing 21 million emails

2011-11-22 Thread Austin Clements
Quoth Tom Bulli on Nov 21 at  7:02 pm:
> I have a project where I need to search about 21 million emails - and
> decided to use "notmuch" for it.  The system is a Debian Squeeze,
> the notmuch version is "0.8-1~bpo60+1" from "kyria's" private
> repository.
> 
> I am running the "notmuch new" for approx. 4 days now - and
> according to "notmuch count" it has indexed about 4.5 million
> emails.
> 
> Is this expected performance?  Is there any way to speed that up?

Currently, notmuch is much more optimized for search than it is for
indexing.  This is unfortunate for the initial indexing process and
seems to be becoming increasingly unfortunate.

There are some things you can try.  One is to use an SSD if you aren't
already, since constructing the index requires a lot of random IO.
You can also try libeatmydata to disable fsyncs, which may improve
your IO performance, with the obvious crash-safety caveats.  However,
unless you have a lot of RAM, I suspect your index has long outgrown
your buffer cache, so this may have limited impact.
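
To give a feel for what skipping fsync saves, here is a minimal,
self-contained sketch that times small appends to a temporary file with
and without an fsync after each write. It does not touch notmuch or
Xapian; the function name time_writes, the 512-byte record size, and the
record count are arbitrary assumptions chosen for illustration, and the
numbers it prints depend entirely on the disk and filesystem.

```python
import os
import tempfile
import time


def time_writes(n_records, use_fsync):
    """Append n_records small records to a temp file; return elapsed seconds."""
    payload = b"x" * 512
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(n_records):
            f.write(payload)
            if use_fsync:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to force it to stable storage
        f.flush()
        return time.perf_counter() - start


if __name__ == "__main__":
    n = 200
    print(f"with fsync:    {time_writes(n, True):.4f} s")
    print(f"without fsync: {time_writes(n, False):.4f} s")
```

libeatmydata gets a similar effect for an unmodified program by being
preloaded and turning fsync and related calls into no-ops, which is
where the crash-safety caveat comes from.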

Since you're going to the trouble of indexing 21 million emails, you
might want to try 0.10 (under freeze right now, to be released very,
very soon).  It won't improve your indexing time, but if you're doing
searches with non-trivial numbers of results, emails indexed with 0.10
will search much faster.

Sorry I don't have better news, but I hope this helps.


Notmuch indexing 21 million emails

2011-11-21 Thread Tom Bulli
I have a project where I need to search about 21 million emails - and decided to use
"notmuch" for it.  The system is a Debian Squeeze, the notmuch version is
"0.8-1~bpo60+1" from "kyria's" private repository.

I am running the "notmuch new" for approx. 4 days now - and according to
"notmuch count" it has indexed about 4.5 million emails.

Is this expected performance?  Is there any way to speed that up?
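
For scale, the figures above can be turned into a throughput and a rough
time-to-finish estimate. This is only back-of-the-envelope arithmetic:
it treats "approx. 4 days" as exactly 4 days and assumes the indexing
rate stays constant, which may be optimistic if indexing slows down as
the index grows. The function name indexing_estimate is made up for this
sketch.

```python
# Figures taken from the message above; "4 days" is approximate.
TOTAL_MESSAGES = 21_000_000
INDEXED_SO_FAR = 4_500_000
ELAPSED_DAYS = 4.0

SECONDS_PER_DAY = 24 * 60 * 60


def indexing_estimate(total, indexed, elapsed_days):
    """Return (messages per second, days remaining), assuming a constant rate."""
    rate_per_second = indexed / (elapsed_days * SECONDS_PER_DAY)
    remaining = total - indexed
    days_remaining = remaining / (rate_per_second * SECONDS_PER_DAY)
    return rate_per_second, days_remaining


rate, days_left = indexing_estimate(TOTAL_MESSAGES, INDEXED_SO_FAR, ELAPSED_DAYS)
print(f"{rate:.1f} messages/s, about {days_left:.1f} more days at this rate")
# prints: 13.0 messages/s, about 14.7 more days at this rate
```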