[notmuch] Git as notmuch object store (was: Potential problem using Git for mail)

2010-01-25 Thread Asheesh Laroia
On Mon, 25 Jan 2010, martin f krafft wrote:

> also sprach Asheesh Laroia [2010.01.21.1928 +1300]:
>>> I suppose that I never actually considered merges on the IMAP server 
>>> side, but obviously the IMAP server has to work off a clone, and that 
>>> means it needs to merge.
>>
>> It's not "merge" that's unsafe; that just builds a tree in the git 
>> index (assuming no conflicts). It's the ensuing process of git writing 
>> a tree to the filesystem that is problematic.
>
> There is no way to make that atomic, I am afraid. As you say.
>
>> I could probably actually write a wrapper that locks the Maildir while 
>> git is operating. It would probably be specific to each IMAP server.
>
> Ouch! I'd really rather not go there.

You say "Ouch" but you should know Dovecot *already* does this. I don't 
mind interoperating with that.

See http://wiki.dovecot.org/MailboxFormat/Maildir, section "Issues with 
the specification", subsection "Locking". I term this the famous readdir() 
race. Without this lock, Maildir is fundamentally incompatible with IMAP 
-- one Maildir-using process modifying message flags could make a 
different Maildir-using process think said message is actually deleted. In 
the case of temporary disappearing mails in Mutt locally, that's not the 
end of the world. For IMAP, it will make the IMAP daemon (one of the 
Maildir-using processes) send a note to IMAP clients saying that the 
message has been deleted and expunged.
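
To make the race concrete: Maildir keeps flags in the filename after ":2,", so changing a flag is a rename() inside cur/. Here is a minimal Python sketch of that operation; the helper names with_flags and set_flags are made up for illustration and are not part of any Maildir library. A second process scanning cur/ with readdir() while the rename happens may see neither the old nor the new name.

```python
import os

def with_flags(filename: str, flags: str) -> str:
    """Return a Maildir filename with its flag set replaced by `flags`."""
    base = filename.split(":2,", 1)[0]
    # Maildir stores flags in ASCII order, each at most once.
    return base + ":2," + "".join(sorted(set(flags)))

def set_flags(maildir: str, filename: str, flags: str) -> str:
    """Rename a message inside cur/ so that its name carries `flags`."""
    new_name = with_flags(filename, flags)
    os.rename(os.path.join(maildir, "cur", filename),
              os.path.join(maildir, "cur", new_name))
    return new_name
```

Marking a message as replied would then be set_flags(maildir, "1264.M1.host:2,S", "RS"), which is exactly the kind of rename that can confuse a concurrent scan.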

>> Note that this means git is fundamentally incompatible with Maildir, not 
>> just IMAP servers.
>
> We had an idea about using Git to replace IMAP altogether, along with 
> making notmuch use a bare Git repository as object store. The idea is 
> that notmuch uses low-level Git commands to access the .git repository 
> (from which you can still checkout a tree tying the blobs into a 
> Maildir). The benefit would be compression, lower inode count (due to 
> packs), and backups using clones/merges.

Sure, that makes sense to me.
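
One consequence of using a bare repository as the store is that every message becomes a content-addressed blob. As a rough sketch, assuming the SHA-1 object format, the ID Git assigns to a message can be computed in Python like this (git_blob_id is a hypothetical helper name; "git hash-object --stdin" computes the same value):

```python
import hashlib

def git_blob_id(message: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to `message` as a blob."""
    # A blob is hashed as "blob <length>\0" followed by the raw content.
    header = b"blob %d\0" % len(message)
    return hashlib.sha1(header + message).hexdigest()
```

Two byte-identical copies of a message map to the same ID, so they would be stored once. A tool could fetch the content back with "git cat-file blob <id>" without ever checking out a Maildir.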

> You could either have the MDA write to a Git repo on the server side and 
> use git packs to download mail to a local clone, or one could have e.g. 
> offlineimap grow a Git storage backend. The interface to notmuch would 
> be the same.

Yeah, I generally like this.

> If we used this, all the rename and delete code would be refactored into 
> Git and could be removed from notmuch. In addition, notmuch could 
> actually use Git tree objects to represent the results of searches, and 
> you could checkout these trees. However, deleting messages from search 
> results would not have any effect on the message or its existence in 
> other search results, much like what happens with mairix nowadays.

That's okay with me.

> I think we all kinda agreed that the Maildir flags should not be used by 
> notmuch and that things like Sebastian's notmuchsync should be used if 
> people wanted flags represented in Maildir filenames.

Aww, I like Maildir flags, but if there's a sync tool, I'm fine with that.

> Instead of a Maildir checkout, notmuch could provide an interface to 
> browse the store contents in a way that could make it accessible to 
> mutt. The argument is that with 'notmuch {ls,cat,rm,?}', a mutt backend 
> could be trivially written. I am not sure about that, but it's worth a 
> try.

Sure.

> But there are still good reasons why you'd want to have IMAP capability 
> too, e.g. Webmail. Given the atomicity problems that come from Git, 
> maybe an IMAP server reading from the Git store would make sense.

It wouldn't be too hard to write a FUSE filesystem that presented an 
interface to a Git repository that didn't allow the contents of files to 
be modified. Then Dovecot could think it's interacting with the 
filesystem.

> However, this all sounds like a lot of NIH and reinvention. It's
> a bit like the marriage between the hypothetical Maildir2 and Git,
> which is definitely worth pursuing. Before we embark on any of this,
> however, we'd need to define the way in which Git stores mail.

Sure. If it were me, I'd just say, "For phase 1 of notmuch, just have git 
store Maildir spools." When you need a filesystem interface for e.g. 
Dovecot, have a FUSE wrapper.

See how far that can take you, and then see if version 2 is necessary. 
(-:

> Stewart, you've worked most on this so far. Would you like to share your 
> thoughts?

I'll listen, too.

Just don't fall into the trap of thinking Maildir is compatible with IMAP. 
It's not, because as I understand things, the filesystem doesn't guarantee 
that you can actually iterate across a directory's files if another 
process is modifying the list of files.

I'm not sure, but maybe it's safe if you refuse to ever modify a 
message's flags in the filena

[notmuch] Potential problem using Git for mail (was: Idea for storing tags)

2010-01-21 Thread Asheesh Laroia
On Fri, 15 Jan 2010, martin f krafft wrote:

> also sprach Asheesh Laroia [2010.01.14.2112 +1300]:
>> Sure. But the MDA doesn't need to do the commit immediately. Since
>> (presumably) we're using Maildir, the MDA on the mail receiving
>> server is going to generate filenames that won't cause conflicts.
>> So it's okay to leave the files uncommitted.
>
> So when does the commit happen?
>
>> When I did the "git merge", git would create the Maildir files in
>> ~/Maildir/cur/... non-atomically.
>
> This might be something that the Git people could address if it was
> brought up on the mailing list. Then again, it might not be possible
> without going via a temporary file, which I doubt will fly.

A temporary file + rename() is the only way, as far as I know.
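
For illustration, here is a minimal Python sketch of that pattern. It shows what a Maildir-aware writer would do, not what git does today; deliver_atomically is a made-up name, and the sketch assumes tmp/ and cur/ are on the same filesystem.

```python
import os
import tempfile

def deliver_atomically(maildir: str, name: str, data: bytes) -> str:
    """Write `data` under tmp/, then rename it into cur/ in one step."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.join(maildir, "tmp"))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk first
    except BaseException:
        os.unlink(tmp_path)
        raise
    final_path = os.path.join(maildir, "cur", name)
    # rename() within one filesystem is atomic: a reader sees either no
    # file at all or the complete message, never a partial one.
    os.rename(tmp_path, final_path)
    return final_path
```

Because the message only appears in cur/ once it is complete, a server scanning cur/ cannot parse a half-written file.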

> I suppose that I never actually considered merges on the IMAP server 
> side, but obviously the IMAP server has to work off a clone, and that 
> means it needs to merge.

It's not "merge" that's unsafe; that just builds a tree in the git index 
(assuming no conflicts). It's the ensuing process of git writing a tree to 
the filesystem that is problematic.

I could probably actually write a wrapper that locks the Maildir while git 
is operating. It would probably be specific to each IMAP server.
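
A rough Python sketch of such a wrapper follows. It assumes the IMAP server honours an exclusive lock file inside the Maildir; the lock file name is a parameter because it differs per server, and a real dotlock protocol may also need stale-lock handling that is left out here. The names maildir_lock and run_locked are made up for illustration.

```python
import os
import subprocess
import time
from contextlib import contextmanager

@contextmanager
def maildir_lock(maildir, lock_name, timeout=30.0, poll=0.1):
    """Hold an exclusive lock file in `maildir` for the duration of the block."""
    lock_path = os.path.join(maildir, lock_name)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock " + lock_path)
            time.sleep(poll)
    try:
        os.close(fd)
        yield lock_path
    finally:
        os.unlink(lock_path)

def run_locked(maildir, lock_name, command):
    """Run `command` (an argv list) in `maildir` while holding the lock."""
    with maildir_lock(maildir, lock_name):
        return subprocess.run(command, cwd=maildir, check=True)
```

Usage would look like run_locked(os.path.expanduser("~/Maildir"), "example.lock", ["git", "merge", "origin/master"]), where "example.lock" stands in for whatever lock file the server actually uses.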

Note that this means git is fundamentally incompatible with Maildir, not 
just IMAP servers.

>> Dovecot would notice the file in ~/Maildir/cur/ and think, "This file 
>> must be ready!" So it would parse it even though git hadn't finished 
>> writing it. This caused me to only see partial headers in Alpine since 
>> Dovecot parsed it before it was a complete message.
>
> I wonder if a custom merge driver could address this to properly use 
> …/tmp/ to assemble the message and only then move it.

I don't think a merge driver can do it for the reason stated above.

-- Asheesh.

-- 
I always turn to the sports pages first, which record people's accomplishments.
The front page has nothing but man's failures.
-- Chief Justice Earl Warren


[notmuch] Potential problem using Git for mail (was: Idea for storing tags)

2010-01-14 Thread Asheesh Laroia
On Tue, 12 Jan 2010, martin f krafft wrote:

> If the MDA delivers to Git, then potentially, you might get into a 
> situation where you cannot write your own changes back to the repo. This 
> is also a DoS scenario: I'll just keep sending you e-mail, and if I 
> manage to pass your mail filters, I'll basically commit to your mail 
> repository at regular intervals. Say those are 5 seconds. In order for 
> you to write updates to the repo, e.g. to update tags, then you would 
> need to pull, rebase, and push all within 5 seconds, for otherwise you'd 
> try to push non-fast-forwards.

Sure. But the MDA doesn't need to do the commit immediately. Since 
(presumably) we're using Maildir, the MDA on the mail receiving server is 
going to generate filenames that won't cause conflicts. So it's okay to 
leave the files uncommitted.
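
As a sketch of why delivery filenames should not collide, here is one way an MDA might build a Maildir-style unique name in Python. The exact layout of the middle part (the M/P/Q fields) is one common convention and an assumption here, and unique_name is a made-up helper.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()

def unique_name(hostname=None):
    """Build a Maildir-style unique filename: time, process-unique part, host."""
    host = hostname or socket.gethostname()
    # "/" and ":" are not allowed in the host part; Maildir escapes them.
    host = host.replace("/", "\\057").replace(":", "\\072")
    now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    # M = microseconds, P = process ID, Q = per-process delivery counter.
    return "%d.M%dP%dQ%d.%s" % (seconds, micros, os.getpid(), next(_counter), host)
```

Two deliveries, even within the same microsecond and process, differ in the counter, so the files never clash and no commit is needed to keep them apart.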

If that's too scary, then have the MDA deliver to its own git branch with 
its own checkout. Then, if you can force linearity with a lock (!), your 
client can have a special "lock the repo and push" command. Your remote 
MUA could even ask the MDA to lock the Maildir while it does a merge and 
then pushes that, and then the MDA can go back to dequeuing messages from 
the MTA into the Maildir.

Not the beautiful lockless world the purists want, but I'm okay with that.

> This is a bit unrealistic, surely, but there's a real annoyance in it: 
> you'd have to pull/rebase/push until a push succeeds -- until you found a 
> time window between pull and push during which the MDA didn't write to 
> the repo. This might take a long time. If this happens in the background 
> by Cron, it's not a real concern, but if this becomes a UI issue, I 
> wouldn't know how to handle it.

It's not entirely unreasonable. Cron caused issues like that for me when I 
tracked my Maildir in git.
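
The loop being described could be sketched in Python as below. The two callables are placeholders for whatever actually runs the pull/rebase and the push; the backoff does not remove the race, it only bounds the retries so that a UI has something definite to report.

```python
import random
import time

def push_with_retry(pull_rebase, push, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Repeat pull/rebase/push until `push` succeeds or attempts run out.

    `pull_rebase` and `push` are caller-supplied callables; `push` returns
    True on success and False if the remote rejected a non-fast-forward.
    """
    for attempt in range(attempts):
        pull_rebase()
        if push():
            return attempt + 1  # number of tries it took
        # Back off with jitter so we do not keep colliding with the MDA.
        sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise RuntimeError("push still rejected after %d attempts" % attempts)
```

If the MDA really commits every few seconds, this can still exhaust its attempts, which is the point of the objection above.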

I'm just learning about notmuchmail.org, but I'll keep listening here. 
Preferably CC: me on replies to this mail.

I will say, I'm interested in an email setup with working IMAP on at 
least one side.

There's one other bad race I ran into when using git to manage my 
Maildirs. I was using Dovecot to serve my Maildir to an IMAP client, 
alpine. I separately did a "git merge" from origin/master, where the 
remote MTA had an MDA delivering messages and a layer on top of that 
committed them.

When I did the "git merge", git would create the Maildir files in 
~/Maildir/cur/... non-atomically. Dovecot would notice the file in 
~/Maildir/cur/ and think, "This file must be ready!" So it would parse it 
even though git hadn't finished writing it. This caused me to only see 
partial headers in Alpine since Dovecot parsed it before it was a complete 
message.

That kind of sucked.

-- Asheesh.

-- 
Almost anything derogatory you could say about today's software design
would be accurate.
-- K. E. Iverson