Alternative (raw) message store (i.e. instead of maildir)

2012-08-15 Thread Stewart Smith
Vladimir Marek  writes:
> Well, if your granularity will be one archive per year of mail, it
> should not be that bad ...

Except for someone like Keith, who has all his email since sometime in
the 80s or something insane like that :)

-- 
Stewart Smith



Alternative (raw) message store (i.e. instead of maildir)

2012-08-14 Thread Christophe-Marie Duquesne
On Tue, Aug 14, 2012 at 8:11 PM, Christophe-Marie Duquesne  
wrote:
> one could complete this work with an
> interface to couchdb for offlineimap

*I meant for notmuch


Alternative (raw) message store (i.e. instead of maildir)

2012-08-14 Thread Christophe-Marie Duquesne
On Tue, Aug 14, 2012 at 7:05 PM, Ciprian Dorin Craciun
 wrote:
> I proposed -- better said queried if possible or at least wanted
> -- to have an internal interface (SPI) that any mail store would have
> to implement in order to be indexed and used by notmuch. I guess the
> interface would be quite lightweight, and would need just the
> following:
> * open store;
> * create a cursor iterating through all the emails, yielding only the 
> keys;
> * read the envelope (as a byte blob) of a particular key; (used
> only for displaying thread lists, etc.;)
> * read the body (as a byte blob) of a particular key;
> * maybe create a cursor iterating over all those emails that have
> changed since a particular timestamp;

Someone wrote a fork of offlineimap to store mail in couchdb [1]. The
same couchdb can be mounted with fuse as a maildir [2] for mutt.
According to the author [3], the fuse interface is read only. Assuming
your proposal was implemented, one could complete this work with an
interface to couchdb for offlineimap and get all the features
previously requested.

[1]: https://github.com/theodoreb/offlineimap
[2]: https://github.com/theodoreb/couchdb-maildir-fuse
[3]: http://theodoreb.net/resume
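
The store interface quoted above (open, iterate keys, read envelope, read body, list changes since a timestamp) could be sketched roughly as follows. This is only an illustration: `MailStore`, `SqliteMailStore` and `put` are hypothetical names, not part of notmuch, and SQLite stands in for whatever embedded or remote store one might actually plug in.

```python
import sqlite3
import time
from abc import ABC, abstractmethod
from typing import Iterator, Optional


class MailStore(ABC):
    """Hypothetical store interface (illustrative names, not notmuch API)."""

    @abstractmethod
    def keys(self) -> Iterator[str]:
        """Iterate over all messages, yielding only their keys."""

    @abstractmethod
    def envelope(self, key: str) -> bytes:
        """Return the header block of one message as a byte blob."""

    @abstractmethod
    def body(self, key: str) -> bytes:
        """Return the body of one message as a byte blob."""

    @abstractmethod
    def changed_since(self, timestamp: float) -> Iterator[str]:
        """Yield keys of messages modified after `timestamp`."""


class SqliteMailStore(MailStore):
    """Toy implementation backed by a single SQLite file."""

    def __init__(self, path: str = ":memory:") -> None:
        # "open store": one table holds every message
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS mail ("
            "key TEXT PRIMARY KEY, envelope BLOB, body BLOB, mtime REAL)"
        )

    def put(self, key: str, raw: bytes, mtime: Optional[float] = None) -> None:
        # Split headers from body at the first blank line (assumes LF endings).
        head, _sep, body = raw.partition(b"\n\n")
        stamp = time.time() if mtime is None else mtime
        self.db.execute(
            "INSERT OR REPLACE INTO mail VALUES (?, ?, ?, ?)",
            (key, head, body, stamp),
        )
        self.db.commit()

    def _column(self, column: str, key: str) -> bytes:
        # `column` is only ever one of the two fixed names used below.
        row = self.db.execute(
            f"SELECT {column} FROM mail WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def keys(self) -> Iterator[str]:
        for (key,) in self.db.execute("SELECT key FROM mail ORDER BY key"):
            yield key

    def envelope(self, key: str) -> bytes:
        return self._column("envelope", key)

    def body(self, key: str) -> bytes:
        return self._column("body", key)

    def changed_since(self, timestamp: float) -> Iterator[str]:
        query = "SELECT key FROM mail WHERE mtime > ? ORDER BY key"
        for (key,) in self.db.execute(query, (timestamp,)):
            yield key
```

An indexer written against `MailStore` would never need a file path, which is the point of the proposal; whether notmuch could be restructured that way is a separate question.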


Alternative (raw) message store (i.e. instead of maildir)

2012-08-14 Thread Stewart Smith
Vladimir Marek  writes:
> Hi,
>
> I have objections against maildir too, but I tried to tackle it from
> different perspective. Store the maildir in zip file and use fuse-zip to
> manage it. It works sort of but it has two major disadvantages:

Huh... this is fairly interesting. One of the downsides of a million-odd
files for mail is that filesystem dump and restore takes a *LOT*
longer than if it's just giant files on disk. Combined with afuse (fuse
automounter) this could be a pretty elegant solution to the problem of
storing archival Maildirs.

One large archival maildir here went from 6.5GB (du -sh on XFS) to a
2.3GB ZIP archive that will never, ever change. Think about the
performance difference between creating 560,000 files for backup/restore
versus copying a single 2.3GB file.

>  - fuse zip stores all changes in memory until unmounted
>  - fuse zip (and libzip for that matter) creates new temporary file when
>    updating archive, which takes considerable time when the archive is
>    very big.

This isn't much of a hassle if you have a maildir per time period and
archive off. Maybe if you sync flags it may be...

> Of course this solution would have some disadvantages too, but for me
> the advantages would win. At the moment I'm not sure if I want to
> continue working on that. Maybe if there would be more interested guys

I'm *really* tempted to investigate making this work for archived
mail. Of course, the list of mounted file systems could get insane
depending on granularity I guess...

-- 
Stewart Smith
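
Packing a finished archival maildir into one ZIP file, as described above, might look roughly like this with Python's standard `zipfile` module. This is a sketch only: `archive_maildir` and `read_message` are hypothetical helper names, and it has nothing to do with how fuse-zip or afuse work internally.

```python
import os
import zipfile


def archive_maildir(maildir: str, archive: str) -> int:
    """Pack every file under `maildir` into one compressed ZIP.

    Returns the number of files stored. Empty directories (for example an
    empty tmp/) are not recorded by this sketch.
    """
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(maildir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the maildir root, e.g. "cur/<name>".
                zf.write(full, arcname=os.path.relpath(full, maildir))
                count += 1
    return count


def read_message(archive: str, member: str) -> bytes:
    """Read one message back out of the archive without unpacking the rest."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)
```

A backup tool then only has to copy the single archive file instead of creating hundreds of thousands of small files.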



___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: Alternative (raw) message store (i.e. instead of maildir)

2012-08-13 Thread Ciprian Dorin Craciun
On Sat, Aug 11, 2012 at 11:50 PM, Jameson Graef Rollins
jroll...@finestructure.net wrote:
> On Sat, Aug 11 2012, Ciprian Dorin Craciun ciprian.crac...@gmail.com wrote:
>> My problem with it is that it doesn't scale... And I don't mean
>> this in a theoretical sense, I mean it in the concrete one: I have
>> about 661k emails... And a single `notmuch sync` takes a few tens of
>> seconds...
>
> Hey, Ciprian.  That sounds really slow, which makes me wonder if there
> are other things going on here.  I have 155k messages, but notmuch new
> takes a fraction of a second for me.  This initial indexing certainly
> takes a long time (hours potentially), but additions after that should
> be really fast.  What version of notmuch are you using?  What version of
> xapian?


Don't think there is anything wrong here... It just drags with
the file system...

So just to give complete info:
* hardware: Core i5, 8GiB RAM (7.5GiB of which is the FS cache),
SSD (about 175MiB raw disk access);
* `notmuch --version`: 0.13 (built from sources on latest ArchLinux);
* `notmuch count`: 701820;
* `notmuch new` (after adding 5925 new emails, and touching others):

Processed 7017 total files in 3m 19s (35 files/sec.).
Added 6061 new messages to the database. Detected 1116 file renames.

* actually the entire thing took almost 5 minutes, but for the first
two it didn't display anything, just accessing the disk;
* `notmuch new` (another go, but this time I've `time`-d it):

No new mail.
real    0m40.546s
user    0m4.523s
sys     0m17.506s

* `notmuch new` (yet another go, no change):

No new mail.
real    0m39.190s
user    0m4.229s
sys     0m17.697s

* just to `du` the maildir (there are also 40k other files in
other maildirs not included in this count):

8.7G    ..
real    0m22.229s
user    0m1.023s
sys     0m7.890s

* on `new` no hooks are run;
* the file system in question is JFS;


As such I doubt the problem is with notmuch itself, and I guess
it's the file system interaction...

Now I know I have a really obscure corner case, and I'm positively
amazed at how well notmuch handles this situation. I just wondered if
I could have fixed my problem by moving to an embedded DB, thus
skipping all that syscall overhead...

Ciprian.
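
To check how much of that time is plain file system traversal, independent of notmuch, one could time a bare directory walk that stats every file. The sketch below does only that; `scan_tree` and `timed_scan` are hypothetical helper names, and the walk is not claimed to match what `notmuch new` does internally.

```python
import os
import time


def scan_tree(root: str) -> tuple:
    """Walk `root`, stat every regular file, return (file_count, total_bytes)."""
    count = 0
    total = 0
    stack = [root]
    while stack:
        # os.scandir yields directory entries without building full lists.
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
                    count += 1
    return count, total


def timed_scan(root: str) -> tuple:
    """Return (file_count, total_bytes, elapsed_seconds) for one walk."""
    start = time.perf_counter()
    count, total = scan_tree(root)
    return count, total, time.perf_counter() - start
```

If a walk like this alone takes tens of seconds on a warm cache, the cost lies in the directory layout and file system rather than in the indexer. As a cross-check on the figures above: 3m 19s is 199 seconds, and 7017 files / 199 s is about 35 files/sec, matching the reported rate.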


Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread David Bremner
Ciprian Dorin Craciun  writes:

> My question -- rather a curiosity -- is if one could easily
> implement an alternative message store instead of maildir. (I actually
> have in mind a KV store like BerkeleyDB, or even a database like
> CouchDB...)

See 

id:"1340657517-6539-6-git-send-email-ethan at betacantrips.com"

for one proposal. And yes, it touches quite a lot of code.

d


Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Jameson Graef Rollins
On Sat, Aug 11 2012, Ciprian Dorin Craciun  wrote:
> My problem with it is that it doesn't scale... And I don't mean
> this in a theoretical sense, I mean it in the concrete one: I have
> about 661k emails... And a single `notmuch sync` takes a few tens of
> seconds...

Hey, Ciprian.  That sounds really slow, which makes me wonder if there
are other things going on here.  I have 155k messages, but notmuch new
takes a fraction of a second for me.  This initial indexing certainly
takes a long time (hours potentially), but additions after that should
be really fast.  What version of notmuch are you using?  What version of
xapian?

jamie.



Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Ciprian Dorin Craciun
On Sat, Aug 11, 2012 at 12:46 PM, Vladimir Marek
 wrote:
> Hi,
>
> I have objections against maildir too,

Just for the record I have nothing against maildir (or at least
when compared to mbox format). On the contrary I find it quite easy to
fiddle with...

My problem with it is that it doesn't scale... And I don't mean
this in a theoretical sense, I mean it in the concrete one: I have
about 661k emails... And a single `notmuch sync` takes a few tens of
seconds...

(Of course my problem could be partially solved by moving to a
fanout maildir folder, i.e. multiple maildirs. But this doesn't solve
the scalability issue, it just delays the problem...)


> but I tried to tackle it from
> different perspective. Store the maildir in zip file and use fuse-zip to
> manage it. It works sort of but it has two major disadvantages:

I also thought of using either FUSE or 9p for this. Unfortunately
it doesn't quite solve my issue as seen above...


Now about other hacks to my problem:
* I'm aware that I can feed notmuch with individual file paths to
be indexed, but it still needs a path where to find an email;
* use the aforementioned fanout solution;
* others?

But regardless, having 600k emails on my disk (currently in the
same folder) is insane... Moreover I would have loved to be able to
use some Git plumbing as a store, or maybe CouchDB, etc...

Ciprian.
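
The fanout idea mentioned above could be sketched as a function that maps each message file name to one of several smaller maildirs. The layout (a hex prefix taken from a hash of the name) and the name `fanout_path` are assumptions made for illustration; nothing in notmuch or the maildir convention prescribes them.

```python
import hashlib
import os


def fanout_path(root: str, filename: str, width: int = 2) -> str:
    """Return root/<hex prefix>/cur/<filename> for a message file name.

    The prefix is derived from the part of the name before the maildir
    flag suffix (":2,..."), so a flag change keeps the same bucket.
    """
    base = filename.split(":", 1)[0]
    prefix = hashlib.sha1(base.encode("utf-8")).hexdigest()[:width]
    return os.path.join(root, prefix, "cur", filename)
```

With `width=2` there are 256 buckets, so about 661k messages would average roughly 2,600 files per directory instead of all sitting in one. As noted above, this only spreads the files out; the total number of files to scan stays the same.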


Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread m...@pels.in
How about implementing MIX [1]?

(And yes, I am totally ignorant about the format; I just know of it and have
heard some praise.)

[1] http://en.wikipedia.org/wiki/MIX_(Email)
--
mek at pels.in

(sorry about top posting, the mailclient on nokia n9 truly sucks.)

On 2012-08-11 09:35 Ciprian Dorin Craciun wrote:
> Hello all!
>
> My question -- rather a curiosity -- is if one could easily
> implement an alternative message store instead of maildir. (I actually
> have in mind a KV store like BerkeleyDB, or even a database like
> CouchDB...) (I'm not also implying the same for the index, which I'm
> aware is based on Xapian, which requires BerkeleyDB, which in turn
> needs a local file system.)
>
> After quickly looking over the code (2 minutes actually) I saw
> that currently this is not easily possible without touching a lot of
> files... Or am I wrong?
>
> Better said: is such an abstract email store interface on the
> to-do list, or even acceptable to have if someone provides it?
>
> Thanks,
> Ciprian.



Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Vladimir Marek
Hi,

I have objections against maildir too, but I tried to tackle it from
a different perspective: store the maildir in a zip file and use fuse-zip to
manage it. It works, sort of, but it has two major disadvantages:

 - fuse zip stores all changes in memory until unmounted
 - fuse zip (and libzip for that matter) creates a new temporary file when
   updating the archive, which takes considerable time when the archive is
   very big.

Looking at the zip file format, it could be made so that all
modifications would result only in appending new data to it (deleting means
writing a new directory index that does not contain the deleted file).

I even made a proof-of-concept libzip modification.

Of course this solution would have some disadvantages too, but for me
the advantages would win. At the moment I'm not sure if I want to
continue working on that. Maybe if there would be more interested guys
...

Cheers
-- 
Vlad
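
The append-only idea can be tried out with Python's standard `zipfile` module, whose append mode (`"a"`) adds members to an existing archive. This is not the libzip modification mentioned above, only a small experiment: as far as I understand it, append mode leaves the already stored member data where it is and writes a fresh directory at the end, which the offset comparison below lets one check. `append_message` and `member_offsets` are hypothetical helper names, and deletion is not covered.

```python
import zipfile


def append_message(archive: str, name: str, data: bytes) -> None:
    """Add one member to the archive using append mode.

    If the archive does not exist yet, mode "a" creates it.
    """
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)


def member_offsets(archive: str) -> dict:
    """Map each member name to the offset of its local header in the file."""
    with zipfile.ZipFile(archive) as zf:
        return {info.filename: info.header_offset for info in zf.infolist()}
```

Comparing `member_offsets` before and after an append shows whether earlier members stayed at the same position in the file, i.e. whether the update avoided rewriting them.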


Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Ciprian Dorin Craciun
Hello all!

My question -- rather a curiosity -- is if one could easily
implement an alternative message store instead of maildir. (I actually
have in mind a KV store like BerkeleyDB, or even a database like
CouchDB...) (I'm not also implying the same for the index, which I'm
aware is based on Xapian, which requires BerkeleyDB, which in turn
needs a local file system.)

After quickly looking over the code (2 minutes actually) I saw
that currently this is not easily possible without touching a lot of
files... Or am I wrong?

Better said: is such an abstract email store interface on the
to-do list, or even acceptable to have if someone provides it?

Thanks,
Ciprian.

