RFC: notmuch powered (personal) (end-to-end) e-mail system

2011-03-20 Thread Ciprian Dorin Craciun
Hello all! (Sorry for the long email.)

I've been "struggling" for some time to get rid of the current
"de-facto" email solutions (i.e. GMail, Zimbra), and I've passively
observed for some time the notmuch project and community.

Although I've forwarded all my email to a single account, and I'm
currently mirroring my GMail account locally (using `mbsync`),
indexing it with notmuch, and collecting spam mails for later filter
training, unfortunately I'm unable to "convert" because the current
notmuch-powered solutions have (some of) the following shortcomings (I
don't want to offend anyone, so please take these as observations):
* the most full-featured UI is the Emacs one -- thus limited remote
access (I mean from an arbitrary computer with only a web-browser);
(and I'm not a very big fan of Emacs;)
* most are still dependent on external IMAP systems -- this is not
a problem with notmuch itself, but for the integrating clients;
* SPAM -- as above -- is not integrated;
* filtering (tag applying) is not automatic (as in integrated in
notmuch itself or the client), but triggered through external scripts;

As such I'm thinking of implementing a custom end-to-end email
system and I would like to hear your feedback before embarking on such
a task.

I'm targeting the following features:
* (inbound) SMTP integration, thus once an email is received it is
automatically pushed through the system; (I'm primarily targeting
those users who can afford to run their own SMTP server; but the solution
could still be adapted for those that only want the other features;)
* automatic spam filtering, and tag applying;
* automatic email triggers based on tags (such as user
notifications, forwarding, etc.)
* remote RPC-like access to the whole system;
* remote Web user interface;

About the overall architecture I'm thinking of adopting the following:
* in general the whole system is decomposed into independent
components (long-lived OS daemons), each of which does a particular job
(see below);
* all the components communicate with each other through a
message queue system (for example ZeroMQ or RabbitMQ);
* all the communication is JSON based;

The components would be:
* SMTP inbound gateway -- for example I could take qmail or
Postfix and replace the delivery agent with a custom process that
pushes the email into the system; (any other solution suggestions?);
* email store -- as the name suggests it is a simple
key-value-like store that should persist raw email-messages; it should
be as robust as possible, and its contents should be the only thing
needed to reconstruct all the other derived data; (I could use here a
simple process that maintains a maildir, I could go also with a
BerkeleyDB wrapper, or even something more sophisticated;)
* spam filter -- which either classifies the email or trains the
spam filter; (for example I would use bogofilter;)
* email index -- this is where notmuch would come into play; it
would be fed emails, to which it would automatically apply tags, and it
would issue trigger notifications based on those tags; it also maintains
a set of filters and tags to automatically apply;
* (maybe) a coordinator that should delegate and monitor requests
to the above components; but if I'm using RabbitMQ and carefully
designing the above components, they could drive each other;
* RESTful web service that would mediate access to all the
above components;
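
The message flow between such components can be sketched in a few lines.
This is only an illustration under stated assumptions: Python's in-process
`queue.Queue` stands in for ZeroMQ or RabbitMQ, and the stage names, the
message fields (`id`, `subject`, `spam`, `tags`) and the keyword check are
all hypothetical placeholders, not part of the proposal.

```python
import json
import queue


def spam_stage(message):
    """Hypothetical spam component: annotate the message with a verdict."""
    # Placeholder heuristic standing in for a real filter such as bogofilter.
    message["spam"] = "lottery" in message["subject"].lower()
    return message


def index_stage(message):
    """Hypothetical index component: derive tags from the spam verdict."""
    message["tags"] = ["spam"] if message["spam"] else ["inbox", "unread"]
    return message


def run_pipeline(raw_json, stages):
    """Pass one JSON-encoded message through each stage via in-process queues."""
    inbox = queue.Queue()
    inbox.put(raw_json)
    for stage in stages:
        outbox = queue.Queue()
        # Every hop decodes JSON, does its job, and re-encodes the result.
        outbox.put(json.dumps(stage(json.loads(inbox.get()))))
        inbox = outbox
    return json.loads(inbox.get())
```

With a real broker each stage would be its own daemon consuming from one
queue and publishing to the next; the JSON encode/decode at every hop is the
part that carries over unchanged.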

For now I have the following uncertainties:
* how should I handle multiple users? I think each user should
have its own store / notmuch / bogofilter instance (at least in terms
of storage if not even in terms of separate daemon);
* should I keep the emails in a file-system, or a key-value store?
(the file-system is more bug-free, but I'm confident that a BerkeleyDB
instance would be more efficient);
* should I use libnotmuch or for starters just make a notmuch tool wrapper;
* and the most pressing one, transactions: I would like that at no
point a message gets half processed or lost; as such I need
notmuch to behave transactionally -- indexing the message and tagging
it should be atomic and durable; (is there a way with libnotmuch to
control the underlying BerkeleyDB database?)

Suggestions? Considerations?

Ciprian.


RFC: notmuch powered (personal) (end-to-end) e-mail system

2011-03-20 Thread Austin Clements
Much of the beauty of notmuch is how few assumptions it makes about
your mail system.  It plays well with others.  For example, one deep
insight of notmuch is that it *doesn't* require a custom mail store,
even though a more obvious design might; in fact, it doesn't even
require Maildir.

That said, I think I can see where you're coming from and I also think
you're targeting some of the deficiencies of notmuch, but I also think
you're overengineering the solution.  As a result of notmuch's
simplicity, a fully working mail setup requires a lot of moving parts
besides notmuch and it can take a while for a new user to set all that
up, especially if they're migrating wholesale from some external mail
setup.

On Sun, Mar 20, 2011 at 10:07 AM, Ciprian Dorin Craciun
 wrote:
> As such I'm thinking on implementing a custom end-to-end email
> system and I would like to hear your feedback before embarking on such
> a task.
>
> I'm targeting the following features:
> * (inbound) SMTP integration, thus once an email is received it is
> automatically pushed through the system; (I'm primarily targeting
> those users that afford to run their own SMTP server; but the solution
> could still be adapted for those that only want the other features;)

As others have mentioned, see notmuch-deliver.  I and others have also
suggested inotify support for notmuch before, which would make the
inbound mail mechanism (be it SMTP, IMAP fetching, or whatever)
completely unaware of notmuch, offer some other benefits (for example,
if mail is manipulated outside notmuch via IMAP), and is highly
discoverable for new users (just have notmuch setup ask if they want
notmuch to monitor for new email and then fire up an inotify daemon
the first time notmuch is called).

> * automatic spam filtering, and tag applying;
> * automatic email triggers based on tags (such as user
> notifications, forwarding, etc.)

Obviously the above two can be scripted, but I agree that it's
unsatisfying that every user needs to roll their own delivery script.
While tagging and triggering are highly personal, they're not *so*
personal that everyone needs a completely custom solution.  This
should be more approachable.  I'm not sure what the best answer here
is, but I don't think it requires integration with a
monolithic system to do right.

> * remote RPC-like access to the whole system;

This is another deep insight of notmuch.  It already has an awesome
RPC interface: the CLI.  Perhaps your actual problem is that the only
supported remote transport protocol is SSH.  This comes with a lot of
benefits (authentication, RPC pipelining), but also a lot of baggage
(a full SSH client on the client side).  I've thought about this in
the context of both an HTTP client and an Android client and in both
cases I concluded that a simple HTTPS transport wrapped around the
notmuch CLI would be the way to go.  Just put the CLI arguments in a
POST and send the JSON on stdout back.  This is trivial to prototype
as a Python CGI script, easy to build as a standalone Python server,
and not especially hard to build as a robust C server.
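
A minimal sketch of that idea as a standalone Python server, under several
assumptions: the POST body is a JSON list of CLI arguments, only the
`search` and `show` subcommands are allowed (a hypothetical whitelist), and
the server speaks plain HTTP on localhost, so TLS and authentication would
still have to be added in front of it. The names `build_command` and
`NotmuchHandler` are illustrative only.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical whitelist: read-only subcommands that accept --format=json.
ALLOWED = {"search", "show"}


def build_command(args):
    """Turn a JSON-decoded argument list into a notmuch argv, or raise ValueError."""
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        raise ValueError("expected a JSON list of strings")
    if not args or args[0] not in ALLOWED:
        raise ValueError("subcommand not allowed")
    return ["notmuch", args[0], "--format=json", *args[1:]]


class NotmuchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            argv = build_command(json.loads(self.rfile.read(length)))
        except ValueError:
            self.send_error(400, "bad request")
            return
        # No shell is involved, so the arguments reach notmuch verbatim.
        result = subprocess.run(argv, capture_output=True)
        self.send_response(200 if result.returncode == 0 else 500)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result.stdout)


if __name__ == "__main__":
    # Plain HTTP on localhost only; TLS and authentication are left out.
    HTTPServer(("127.0.0.1", 8080), NotmuchHandler).serve_forever()
```

A client would POST a body such as `["search", "tag:inbox"]` and receive the
JSON that `notmuch search --format=json tag:inbox` prints on stdout.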

> * remote Web user interface;

A good web UI would be fantastic.  Based on the rest of your email, I
get the impression this was a requirements driver for much of the
above, especially the integrated tagging/triggering and RPC access.
I've already suggested a simple solution to the RPC problem.  For
tagging/triggering, it's probably worth developing a solution that
allows for machine-editable rules (ideally retaining
user-editableness), which would make it possible to integrate filter
management into a web UI.  This could be as simple as a standard
delivery script that operates from some simple rule database.
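
One way such a rule database could look, as a sketch only: the JSON rule
format, the function names and the `tag:new` scope (which assumes newly
delivered mail carries a `new` tag) are all assumptions made for
illustration. Each rule becomes one `notmuch tag` invocation.

```python
import json
import subprocess

# Hypothetical rule file format: a JSON list of {"query": ..., "tags": [...]}.
EXAMPLE_RULES = """
[
  {"query": "to:notmuch@notmuchmail.org", "tags": ["+list", "-inbox"]},
  {"query": "from:boss@example.com", "tags": ["+important"]}
]
"""


def tag_commands(rules, scope="tag:new"):
    """Yield one `notmuch tag` argv per rule, limited to messages matching scope."""
    for rule in rules:
        changes = rule["tags"]
        # Every change must look like +tag or -tag, as the notmuch CLI expects.
        if not changes or not all(len(t) > 1 and t[0] in "+-" for t in changes):
            raise ValueError(f"bad tag change in rule: {rule!r}")
        yield ["notmuch", "tag", *changes, "--", f"{scope} and ({rule['query']})"]


def apply_rules(rules, run=subprocess.run):
    """Run every generated command; `run` is injectable so it can be faked."""
    for argv in tag_commands(rules):
        run(argv, check=True)


if __name__ == "__main__":
    apply_rules(json.loads(EXAMPLE_RULES))
```

Because the rules are plain JSON, a web UI could rewrite the file while a
user could still edit it by hand.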


RFC: notmuch powered (personal) (end-to-end) e-mail system

2011-03-20 Thread Ben Gamari
On Sun, 20 Mar 2011 16:07:50 +0200, Ciprian Dorin Craciun  wrote:
> Hello all! (Sorry for the long email.)
> 
> [snip]
> 
> * the most feature full UI is the Emacs one -- thus limited remote
> access (I mean from an arbitrary computer with only a web-browser);
> (and I'm not a very big fan of Emacs;)
> 
There have been a few attempts to put together an HTML front-end to
notmuch[1]. None have made it very far though. It would be nice to see
this space filled.

> * most are still dependent on external IMAP systems -- this is not
> a problem with notmuch itself, but for the integrating clients;
> 
Not entirely sure what you mean by this. You could easily use
e.g. notmuch-deliver as the local delivery agent with an SMTP server and
you'd have no need for IMAP.

> * SPAM -- as above -- is not integrated;
> 
Nor should it be. Mail indexing, viewing, composing, and
filtering are all orthogonal parts of a mail system. It takes all of
ten lines to invoke a spam filter in your filter script.
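
As an illustration of how small that step can be, here is a sketch in
Python that pipes one raw message through bogofilter and maps its exit code
to a verdict. The function names and the tag choices are hypothetical; the
exit codes (0 = spam, 1 = ham, 2 = unsure) are the ones bogofilter
documents.

```python
import subprocess

# bogofilter's documented exit codes: 0 = spam, 1 = ham, 2 = unsure.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def classify(raw_message, run=subprocess.run):
    """Pipe one raw message (bytes) through bogofilter and return a verdict."""
    result = run(["bogofilter"], input=raw_message)
    if result.returncode not in VERDICTS:
        raise RuntimeError(f"bogofilter failed with exit code {result.returncode}")
    return VERDICTS[result.returncode]


def tag_changes(verdict):
    """Map a verdict to hypothetical tag changes for a delivery script."""
    return {"spam": ["+spam", "-inbox"], "unsure": ["+unsure"]}.get(verdict, [])
```

The resulting tag changes could then be handed to `notmuch tag` by the same
delivery script.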

> * filtering (tag applying) is not automatic (as in integrated in
> notmuch itself or the client), but triggered through external scripts;
> 
Again, there is no reason why this should be incorporated into your
mail indexer.

> As such I'm thinking on implementing a custom end-to-end email
> system and I would like to hear your feedback before embarking on such
> a task.
> 
Notmuch works so well for its audience because it adheres to the UNIX
philosophy of "do one thing and do it well." The goal of an "integrated
end-to-end" mail system might sound nice, but IMHO it's a recipe for a
kludgey, unmaintainable nightmare which is mediocre at performing its
task, on a good day. Perhaps I'm misunderstanding your proposal but it
seems to me like you are taking an easy, already solved problem and
turning it into a difficult one.

> I'm targeting the following features:
> * (inbound) SMTP integration, thus once an email is received it is
> automatically pushed through the system; (I'm primarily targeting
> those users that afford to run their own SMTP server; but the solution
> could still be adapted for those that only want the other features;)
> 
Is there something wrong with Postfix with notmuch-deliver as an LDA?

> * automatic spam filtering, and tag applying;
>
A traditional sorting script with bogofilter/spamassassin?

> * automatic email triggers based on tags (such as user
> notifications, forwarding, etc.)
>
Again, a sorting script?

> * remote RPC-like access to the whole system;
> 
What's wrong with SSH?

> * remote Web user interface;
> 
Nothing fills this need currently. Feel free to write up something but
please don't couple it to some all-inclusive behemoth of a project.

Personally, I would think more carefully about this project before
proceeding. It sounds like you intend to reinvent various portions of
the wheel several times. Nothing you have listed is difficult to do with
a few scripts, notmuch, and an SMTP server. 

> About the overall architecture I'm thinking on adopting the following:
> * in general the whole system is decomposed in independent
> components (long-lived OS daemons) that each one does a particular job
> (see below);
> * all the components communicate between each-other through a
> message queue system (for example ZeroMQ or RabbitMQ);
> * all the communication is JSON based;
> 
> The components would be:
> * SMTP inbound gateway -- for example I could take qmail or
> Postfix and replace the delivery agent with a custom process that
> pushes the email into the system; (any other solution suggestions?);
> * email store -- as the name suggests it is a simple
> key-value-like store that should persist raw email-messages; it should
> be as robust as possible, and its contents should be the only thing
> needed to reconstruct all the other derived data; (I could use here a
> simple process that maintains a maildir, I could go also with a
> BerkeleyDB wrapper, or even something more sophisticated;)
> * spam filter -- which either classifies the email or trains the
> spam filter; (for example I would use bogofilter;)
> * email index -- this is where notmuch would come into play; it
> would be fed with emails, which it would automatically apply tags and
> issue trigger notifications based on tags; it also maintains a set of
> filters and tags to automatically apply;
> * (maybe) a coordinator that should delegate and monitor requests
> to the above components; but if I'm using RabbitMQ and carefully
> designing the above components, they could drive each other;
> * restful web service that would intermediate access to all the
> above components;
> 
> For now I have the following uncertainties:
> * how should I handle multiple users? I think each user should
> have it's own store / notmuch / bogofilter instance (at least in terms
> of storage if not even in terms of separate daemon);
> * should I keep the 

RFC: notmuch powered (personal) (end-to-end) e-mail system

2011-03-20 Thread Brett Viren
On Sun, Mar 20, 2011 at 10:07 AM, Ciprian Dorin Craciun
 wrote:

> I'm "struggling" for some time to get rid of the current
> "de-facto" email solutions (i.e. GMail, Zimbra), and I've passively
> observed for some time the notmuch project and community.

It sounds like what you want *is* GMail (I don't know Zimbra) but just
that you want it running on your own box instead of on Google's
servers.

> Suggestions? Considerations?

Based on what you wrote, I think BerkeleyDB will be too limiting.
I suggest you look into DBMail[1] for the mail store.


-Brett.

[1] http://www.dbmail.org/




