[notmuch] Idea for storing tags

2010-01-15 Thread martin f krafft
also sprach Carl Worth  [2010.01.15.1124 +1300]:
> > You might have marked a message 'read' on one machine and if the two
> > get out of sync on another machine, you might have the same message
> > unread there.
> 
> That's a different issue though. With two databases there's clearly the
> opportunity for the two databases to be out of synch.
> 
> But you talked about the database being out of synch with respect to the
> mailstore. And that's something I just don't understand, (given the
> assumption that all tags are stored in the database---which was the
> explicit description of the case of interest).

Yes, we are talking about the situation where the tagstore is
separate from the mailstore, and that they are both synchronised
with a server, or between machines, separately. If for some reason
you only synchronise the mailstore (say, because the connection
drops before the sync of the tagstore completes), then you end up
with an out-of-sync situation, because the mailstore-sync will have
pulled in a new message, but not the associated tags. So if you had
already read this message on another machine and tagged it 'done',
then it would show up on this machine as 'new' without the 'done'
tag, because the tags were not synchronised.

The only way to really solve this is by transferring a message and
its tags in a transactional way.
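
A minimal local illustration of that transactional idea, assuming (unlike the setup discussed here) that the received message and its tags land in a single SQLite database. The table layout is hypothetical, and the network transfer itself is not addressed. Used as a context manager, Python's sqlite3 connection commits when the block finishes and rolls back if anything inside raises, so an interrupted tag transfer also discards the message instead of leaving it looking 'new':

```python
import sqlite3
from typing import Iterable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per message, one row per (message, tag) pair.
    conn.execute("CREATE TABLE IF NOT EXISTS messages (msgid TEXT PRIMARY KEY, body BLOB)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags (msgid TEXT, tag TEXT, PRIMARY KEY (msgid, tag))"
    )
    conn.commit()
    return conn


def store_with_tags(conn: sqlite3.Connection, msgid: str, body: bytes,
                    tags: Iterable[str]) -> None:
    """Store a message together with its tags: either both become visible or neither does."""
    with conn:  # commit on success, roll back if anything below raises
        conn.execute("INSERT INTO messages VALUES (?, ?)", (msgid, body))
        for tag in tags:  # tags may arrive lazily, e.g. from a network stream
            conn.execute("INSERT INTO tags VALUES (?, ?)", (msgid, tag))
```

If the tag iterator raises part-way (say, a ConnectionError after yielding 'done'), neither the message row nor the already-inserted 'done' tag survives.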

> > Shouldn't this just be solved? I've had formail+procmail delete my
> > duplicates for 10+ years, and while I don't like the fact that
> > I usually get the CC before the list mail, and thus cannot filter on
> > Delivered-To, I have never looked back.
> 
> Notmuch has access to all the information it needs to allow you to
> delete the CC version once the list mail arrives. So you could do
> notmuch-based deletion now and avoid losing the Delivered-To header if
> you want.

Of course. I hadn't thought that far.

However, there are still benefits to formail, namely avoiding having
to run duplicates through potentially expensive spam filters.

> I think that synchronizing the mail store and synchronizing the
> tags information are tasks that have different requirements, and
> for which we may well want different tools.

Fair enough. Maybe I am just paranoid about the stores getting out
of sync (see above).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"we all know linux is great...
 it does infinite loops in 5 seconds."
 -- linus torvalds

spamtraps: madduck.bogus at madduck.net
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature (see http://martin-krafft.net/gpg/)
URL: 



[notmuch] Idea for storing tags

2010-01-14 Thread martin f krafft
also sprach Carl Worth  [2010.01.14.1432 +1300]:
> Yes. This approach requires some external means of synchronizing the
> tags from one system to another.
> 
> I don't understand what it would mean to have the mailstore and the
> database out of synch here. This approach doesn't have the tags in the
> mailstore by definition, right?

You might have marked a message 'read' on one machine and if the two
get out of sync on another machine, you might have the same message
unread there.

> > How about using pseudo-mails stored in Maildir and synchronised by
> > IMAP? E.g. every folder could have a subfolder .TAGS and if we find
> > a way to smartly pair messages between parent and subfolder, we'd
> > have a tag store alongside the mailstore it refers to, but without
> > the danger of leakage, and without having to rewrite messages.
> ...
> > Anyway, the idea is out now. Thoughts?
> 
> There are a couple of problems that I don't see addressed at all with
> this approach. The first is that there's not a one-to-one mapping
> between messages and files in the mail store. (I'm CCed on a lot of list
> mail meaning that I have multiple files in my mail store for a single
> message.)

Shouldn't this just be solved? I've had formail+procmail delete my
duplicates for 10+ years, and while I don't like the fact that
I usually get the CC before the list mail, and thus cannot filter on
Delivered-To, I have never looked back.

> Second, the only reason I would be interested in synchronizing mail
> between two systems is so that I could manipulate the tag data in
> multiple places, (that is, remove the "unread" tag whether on my
> network-disconnected laptop or via web-mail when away from my
> laptop). Using imap for synchronizing a file of tags within the mail
> store gives you no mechanism for doing any sort of conflict resolution,
> right? (Which I think in almost all cases is going to be quite trivial
> if there's a chance for a program to resolve it.)

I have not thought about this, but you are right. IMAP does not
really allow for conflict resolution, which may well be *the* reason
why you cannot update existing messages.

> [*] Though, I think a plain-text file with tags managed with
> something like git (and perhaps a custom merger) could save a lot
> of work. Or perhaps a plain-text journal of tag manipulations on
> either end that could be replayed on the other.

Git is good at conflict resolution if run interactively, but [0]
still makes me question whether it can ever take the place of IMAP.
However, Asheesh Laroia, who has floated the idea of Git-for-mail at
DebConf8 already, has some ideas and hopefully will soon reply to my
mail [0], which I just bounced.

0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

apt-get source --compile gentoo

spamtraps: madduck.bogus at madduck.net



[notmuch] Idea for storing tags

2010-01-14 Thread Carl Worth
On Thu, 14 Jan 2010 21:04:21 +1300, martin f krafft wrote:
> You might have marked a message 'read' on one machine and if the two
> get out of sync on another machine, you might have the same message
> unread there.

That's a different issue though. With two databases there's clearly the
opportunity for the two databases to be out of synch.

But you talked about the database being out of synch with respect to the
mailstore. And that's something I just don't understand, (given the
assumption that all tags are stored in the database---which was the
explicit description of the case of interest).

> Shouldn't this just be solved? I've had formail+procmail delete my
> duplicates for 10+ years, and while I don't like the fact that
> I usually get the CC before the list mail, and thus cannot filter on
> Delivered-To, I have never looked back.

Notmuch has access to all the information it needs to allow you to
delete the CC version once the list mail arrives. So you could do
notmuch-based deletion now and avoid losing the Delivered-To header if
you want.
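
One way that could look, sketched in plain Python rather than with notmuch itself (no notmuch API is used or implied): group files by Message-ID and, whenever a group contains a copy carrying a List-Id header (RFC 2919), mark the other copies for deletion. Treating List-Id as the marker of "the list copy" is an assumption; a real setup might look at Delivered-To or the folder instead.

```python
from collections import defaultdict
from email import message_from_bytes
from typing import Dict, List


def copies_to_delete(files: Dict[str, bytes]) -> List[str]:
    """Given {path: raw message bytes}, return paths of direct (CC) copies that are
    redundant because the mailing-list copy of the same Message-ID is also present."""
    by_id = defaultdict(list)
    for path, raw in files.items():
        msg = message_from_bytes(raw)
        mid = msg.get("Message-ID")
        if mid is None:
            continue  # without a Message-ID we cannot pair copies safely
        is_list_copy = msg.get("List-Id") is not None
        by_id[str(mid).strip()].append((path, is_list_copy))

    doomed = []
    for copies in by_id.values():
        if any(is_list for _, is_list in copies):
            # Keep the list copy (and with it the Delivered-To header); drop the rest.
            doomed.extend(path for path, is_list in copies if not is_list)
    return sorted(doomed)
```

Actually unlinking the returned paths is deliberately left out of the sketch.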

> > [*] Though, I think a plain-text file with tags managed with
> > something like git (and perhaps a custom merger) could save a lot
> > of work. Or perhaps a plain-text journal of tag manipulations on
> > either end that could be replayed on the other.
> 
> Git is good at conflict resolution if run interactively, but [0]
> still makes me question whether it can ever take the place of IMAP.
> However, Asheesh Laroia, who has floated the idea of Git-for-mail at
> DebConf8 already, has some ideas and hopefully will soon reply to my
> mail [0], which I just bounced.
> 
> 0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

Using git for mail is an interesting idea, but not what I was actually
proposing here.

I think that synchronizing the mail store and synchronizing the tags
information are tasks that have different requirements, and for which we
may well want different tools.

So I was talking about using imap (or rsync, or what have you) for
copying the mailstore, and then having something with a bit more
domain-specific awareness for doing the synchronization of the tags
data.

-Carl



___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch




[notmuch] Idea for storing tags

2010-01-13 Thread martin f krafft
also sprach Scott Morrison  [2010.01.13.1752 +1300]:
> The problem with anything that is not universally supported is
> that for a package that is to appeal to a wide userbase, most
> don't know and don't care about the particulars of this IMAP
> server vs that IMAP server. All they know is that for some reason
> it doesn't work with account X -- which leads to support
> headaches.
[...]
> Call it Google's problem if you like -- but when I have a product
> that doesn't work with GMAIL IMAP there are a lot of potential
> users that don't care about server peculiarities and rather just
> have it work.

Well, the way I see it: you cannot change all IMAP servers at once,
and you certainly cannot change Google. If it's possible to
implement tagging for email (dare say semantic e-mail) with standard
means (where standard means sub-standard, as exemplified by your
previous GMail IMAP example), then that's the best way, but if that
can't happen then we ought to try a better way. Should we find
a solution then, by the rate of standardisation on the 'Net, maybe
my grandchildren will finally be able to do proper e-mail. ;)

> I agree that conceptually duplicates should be buried but end
> users do have "peculiar" organization systems.

I think tags should help abstract e-mail away from underlying
storage and I'd love that to be a goal.

> From my reading, uidplus doesn't allow a delta modification of
> a message on a server -- just to write a portion of a message back
> -- you still have to write the whole thing back and that can mean
> real bandwidth issues for some messages.

Absolutely. It would indeed be better if you could just send
changes.

I just sent a blank mail to
imap-protocol-subscribe at mailman.u.washington.edu
and have started browsing the archives. So far, there's not really
anything relevant.

Anyway, looking back at the RFC on keywords, it's not exactly
encouraging:

  A keyword is defined by the server implementation. Keywords do not
  begin with "\". Servers MAY permit the client to define new
  keywords in the mailbox (see the description of the PERMANENTFLAGS
  response code for more information).
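
For reference, the same IMAP4rev1 specification (RFC 3501) lets a server say in its PERMANENTFLAGS response code whether clients may create new keywords: the special entry \* means they can. A client mapping tags onto keywords would at least have to check that and make sure each tag is a legal IMAP atom. The helper names below are hypothetical, and the sketch only parses a response line already received; it does not talk to a server.

```python
import re

# Characters an IMAP atom may not contain (RFC 3501 atom-specials);
# control characters, space and non-ASCII are rejected by the range check.
_ATOM_SPECIALS = set('(){ %*"\\]')
_PERMANENTFLAGS = re.compile(r"\[PERMANENTFLAGS \(([^)]*)\)\]", re.IGNORECASE)


def is_valid_keyword(tag: str) -> bool:
    """True if the tag could be stored verbatim as an IMAP keyword (an atom)."""
    return bool(tag) and all(
        0x20 < ord(c) < 0x7F and c not in _ATOM_SPECIALS for c in tag
    )


def can_create_keywords(untagged_ok: str) -> bool:
    """True if a PERMANENTFLAGS response code lists \\*, i.e. new keywords may be stored."""
    m = _PERMANENTFLAGS.search(untagged_ok)
    return bool(m) and "\\*" in m.group(1).split()
```

With a response such as `* OK [PERMANENTFLAGS (\Deleted \Seen \*)] Limited`, can_create_keywords returns True; without the `\*` entry, the server is saying that only the listed flags are permanent.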

Anyway, I'll try to untangle the various issues re:IMAP we've been
seeing, write mails for each, and hopefully get to the point where
I can enquire about IMAPv5. ;)

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

the unix philosophy basically involves
giving you enough rope to hang yourself.
and then some more, just to be sure.

spamtraps: madduck.bogus at madduck.net



[notmuch] Idea for storing tags

2010-01-13 Thread Carl Worth
On Wed, 13 Jan 2010 00:39:14 -0500, Scott Morrison  wrote:
> > Maybe a better approach would be content addressing (see below).
> 
> Content hashing -- good idea (& not something that has hit me before)
> -- better than Message-Id as I believe there are still some MUAs/MTAs
> that allow messages without message ids.  The only potential issue
> with this is that it is critical then to preserve the message source
> against encoding changes though that shouldn't be too hard to avoid.

Another problem with content-based naming for messages is that most of
the messages in my mail store that I consider duplicates don't actually
have identical content. (One is sent directly to me via CC and the other
is sent by the mailing-list software *after* appending a footer to the
message.)

That said, notmuch already does use a sha-1 sum as the message
identifier for any message that does not have a valid Message-ID
header. So there's definitely a place for this.
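
A sketch of that kind of fallback in Python, with a made-up "sha1-" prefix (notmuch's actual identifier format is not reproduced here): prefer the Message-ID, and hash the raw bytes only when the header is missing or malformed. It also demonstrates Carl's earlier point: the CC copy and the list copy with an appended footer share a Message-ID but not a content hash.

```python
import hashlib
from email import message_from_bytes


def message_key(raw: bytes) -> str:
    """Identify a message by its Message-ID, falling back to a SHA-1 of its bytes."""
    mid = message_from_bytes(raw).get("Message-ID")
    if mid is not None:
        mid = str(mid).strip()
        # Very loose validity check: a non-empty, angle-bracketed value.
        if len(mid) > 2 and mid.startswith("<") and mid.endswith(">"):
            return mid[1:-1]
    return "sha1-" + hashlib.sha1(raw).hexdigest()  # hypothetical prefix
```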

-Carl



[notmuch] Idea for storing tags

2010-01-13 Thread Carl Worth
On Tue, 12 Jan 2010 11:19:09 +1300, martin f krafft wrote:
> 1. External database, which has the downside of not being
>    synchronisable with standard IMAP, like the rest of your mail
>    (assuming you use IMAP). Also, it's possible for mailstore and
>    database to get out of sync.

Yes. This approach requires some external means of synchronizing the
tags from one system to another.

I don't understand what it would mean to have the mailstore and the
database out of synch here. This approach doesn't have the tags in the
mailstore by definition, right?

> How about using pseudo-mails stored in Maildir and synchronised by
> IMAP? E.g. every folder could have a subfolder .TAGS and if we find
> a way to smartly pair messages between parent and subfolder, we'd
> have a tag store alongside the mailstore it refers to, but without
> the danger of leakage, and without having to rewrite messages.
...
> Anyway, the idea is out now. Thoughts?

There are a couple of problems that I don't see addressed at all with
this approach. The first is that there's not a one-to-one mapping
between messages and files in the mail store. (I'm CCed on a lot of list
mail meaning that I have multiple files in my mail store for a single
message.)

Second, the only reason I would be interested in synchronizing mail
between two systems is so that I could manipulate the tag data in
multiple places, (that is, remove the "unread" tag whether on my
network-disconnected laptop or via web-mail when away from my
laptop). Using imap for synchronizing a file of tags within the mail
store gives you no mechanism for doing any sort of conflict resolution,
right? (Which I think in almost all cases is going to be quite trivial
if there's a chance for a program to resolve it.)
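
To illustrate how small such a program can be, here is a sketch of a three-way merge of one message's tags, assuming each replica remembers the tag set from its last successful sync (the "base"). Treated as sets, the two sides never truly conflict: a tag is kept if both sides still have it or if either side added it, and it is dropped if either side removed it.

```python
from typing import Set


def merge_tags(base: Set[str], ours: Set[str], theirs: Set[str]) -> Set[str]:
    """Three-way merge of a single message's tags against their common ancestor."""
    kept_by_both = ours & theirs
    added_here = ours - base
    added_there = theirs - base
    return kept_by_both | added_here | added_there


# Carl's scenario: "unread" removed on the laptop, while a hypothetical
# "todo" tag is added via web-mail at the same time.
base = {"inbox", "unread"}
laptop = {"inbox"}
webmail = {"inbox", "unread", "todo"}
print(sorted(merge_tags(base, laptop, webmail)))  # ['inbox', 'todo']
```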

So it sounds to me like we're going to need *something* custom for doing
the synchronization, (to handle modifications on both ends). At which
point there are only disadvantages to keeping the data inside the
mailstore, and there's also no disadvantage left to keeping the data
inside a database. [*]

[*] Though, I think a plain-text file with tags managed with something
like git (and perhaps a custom merger) could save a lot of work. Or
perhaps a plain-text journal of tag manipulations on either end that
could be replayed on the other.
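
The journal variant could be sketched like this. The one-line-per-change format (timestamp, machine name, +tag or -tag, Message-ID) is hypothetical, and ordering entries by timestamp assumes the machines' clocks roughly agree, so conflicts resolve as last-writer-wins rather than by comparison with a common ancestor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class TagOp:
    timestamp: float  # when the change was made on its machine
    machine: str      # tie-breaker, so every replica replays in the same order
    op: str           # "+" adds the tag, "-" removes it
    tag: str
    msgid: str


def parse_line(line: str) -> TagOp:
    # Hypothetical format: "<timestamp> <machine> <+tag|-tag> <message-id>"
    ts, machine, op_tag, msgid = line.split(" ", 3)
    return TagOp(float(ts), machine, op_tag[0], op_tag[1:], msgid.strip())


def replay(journals: Iterable[List[TagOp]],
           state: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """Apply every journal entry, oldest first, to a copy of the tag state."""
    result = {mid: set(tags) for mid, tags in (state or {}).items()}
    entries = sorted((e for j in journals for e in j),
                     key=lambda e: (e.timestamp, e.machine))
    for e in entries:
        tags = result.setdefault(e.msgid, set())
        if e.op == "+":
            tags.add(e.tag)
        elif e.op == "-":
            tags.discard(e.tag)
        else:
            raise ValueError(f"unknown operation {e.op!r}")
    return result
```

Each side would append to its own journal and, on sync, replay the other side's entries; because the merged list is sorted with a deterministic tie-breaker, both replicas should end up in the same state.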

-Carl



[notmuch] Idea for storing tags

2010-01-13 Thread martin f krafft
also sprach Scott Morrison  [2010.01.12.1711 +1300]:
> 1.  synchronization of tag data with emails -- if they are in
> a subfolder then it presents the issue of maintaining this
> subfolder when managing emails (moving, deleting, duplicating etc)
> and any .tag-folder-unaware clients are likely to cause a breakage
> in tagdata/message association.  One way of doing this is to have
> a global .tag folder.

A global .tag folder indexed by e.g. message ID, as you state later,
would probably allow for this. Or a file-per-tag design. We'd have
to think carefully about pros and cons for each.

When thinking about this, I always have to remind myself that we are
targeting this at a design that has indexed search. If that weren't
the case, searches would be incredibly expensive.

Maybe a better approach would be content addressing (see below).

> 2. what happens if that message is archived or moved to an
> exclusively local cache -- eg. Mail.app on OS X can easily move
> IMAP messages to a folder resident on the local computer?

Well, if the target can store tags, then ideally the MUA should know
how to transfer them along.

Maybe the right thing to do would be to use extended attributes
(which are stored in the inode!), even if they may not be
universally supported yet. If our solution scales, then this might
lead to a significant increase in xattr adoption.
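
A sketch of the xattr idea in Python, assuming Linux (where os.setxattr and os.getxattr exist) and a hypothetical attribute name user.tags; unprivileged processes are limited to the "user." namespace there. The writer reports, rather than hides, the case where the platform or filesystem lacks extended-attribute support, which is exactly the portability question. IMAP has no way to carry such attributes, so this only covers local copies.

```python
import errno
import os
from typing import Iterable, Set

ATTR = "user.tags"  # hypothetical attribute name


def encode_tags(tags: Iterable[str]) -> bytes:
    return "\n".join(sorted(set(tags))).encode("utf-8")


def decode_tags(raw: bytes) -> Set[str]:
    return {t for t in raw.decode("utf-8").split("\n") if t}


def write_tags(path: str, tags: Iterable[str]) -> bool:
    """Store tags in an xattr; return False if the platform or filesystem lacks support."""
    try:
        os.setxattr(path, ATTR, encode_tags(tags))
    except AttributeError:  # os.setxattr is only available on Linux
        return False
    except OSError as exc:
        if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return False
        raise
    return True


def read_tags(path: str) -> Set[str]:
    try:
        return decode_tags(os.getxattr(path, ATTR))
    except OSError as exc:
        if exc.errno == errno.ENODATA:  # attribute never set: untagged
            return set()
        raise
```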

> 3. what happens with duplicates of emails -- I would assume that
> the message id would be the key to match the tag data to the
> message.  In this system a duplicate of a message could not have
> a different set of tags from the original (not that this would
> necessarily be desirable.)

Duplicates need folders, and tags and folders are somewhat at odds
with each other. I mean, you can represent a folder hierarchy with
tags (and more), and if you have tags and folders, you are
potentially introducing a level of confusion/ambiguity that we don't
want in the first place. Maybe the ideal solution doesn't need
folders anymore (and IMAP-compatible (Maildir) subfolders have
always been a hack anyway).

There are also two types of duplicates: copies and links. The former
can diverge, the latter can't. I don't really see a reason for
either. It's not like you need to copy a mail before you edit it,
and I don't see a real reason for linking, assuming that the primary
means of browsing will be tag-searches anyway.

Duplicates always make me think of content addressing, like Git's
object cache. We could store the content hash of a message in its
filename, and also use the hash to index into the tag database.
I think that would be much cleaner than message IDs, and would make
handling true duplicates (links) much easier, while copies (diverged
ex-duplicates) would also be taken care of automatically.
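
A minimal sketch of that idea, with an invented filename scheme: the SHA-1 of the raw message goes into the middle ("unique") part of a Maildir-style name, and the same hash keys the tag database. Unlike Git, no object header is hashed, and identical deliveries deliberately get the same name, which is what lets true duplicates collapse.

```python
import hashlib
from typing import Dict, Set


def content_hash(raw: bytes) -> str:
    return hashlib.sha1(raw).hexdigest()


def maildir_name(raw: bytes, timestamp: int, host: str) -> str:
    # Hypothetical scheme "<time>.<sha1>.<host>"; the ":2,<flags>" suffix is added later.
    return f"{timestamp}.{content_hash(raw)}.{host}"


def hash_from_name(name: str) -> str:
    base = name.split(":", 1)[0]   # drop the Maildir info suffix, e.g. ":2,S"
    return base.split(".", 2)[1]   # the timestamp has no dots; the host may


def tags_for(name: str, tag_db: Dict[str, Set[str]]) -> Set[str]:
    return tag_db.get(hash_from_name(name), set())
```

A copy that has diverged, for example by gaining a mailing-list footer, hashes differently and therefore gets its own tag entry.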

> Your mention of potential leakage (aka inadvertent disclosure of
> tag data) is real -- but only if the client used to bounce/forward
> is not the one to tag the message (one would assume that if
> a client can tag, it can know to exclude the tags in a bounce.)

True, and it's probably the minority of people using multiple
clients. But those who do might also manipulate mail with sed and
use sendmail directly.

I don't think we can successfully enhance RFC 5321 to make MTAs
always ditch the Tags:-header.

> Mail.app -- which I am pluging into does not forward headers --

ew! ;) (I think one should be able to forward pristine mails)

> though it will include all headers in a bounce -- but chances are
> you aren't tagging messages you are bouncing.:)

That chance might well be very low. I bounce/forward-as-attachment
a lot of mail from the past to make it easier for others to
establish context.

> The performance issue is very real -- because it means that
> somehow messages have to be rewritten to the IMAP server -- IMAP
> doesn't have a mechanism AFAIK for updates.

Not even UIDPLUS?
http://wiki.dovecot.org/FeatUIDPLUS

> Additionally, IMAP doesn't have a mechanism for simply replacing
> one message data with another -- a new message must be written and
> the old message must be deleted and the message IMAP UID will
> change, and the client will have to deal with this, especially if
> it is caching the messages.

Yes, I am experiencing this pain regularly, since I currently use
a lot of message rewriting as part of my workflow, one of the
reasons why I'd like to find an alternative.
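
Where the server implements UIDPLUS (RFC 4315), the append-then-delete dance at least tells the client the replacement's UID: the tagged APPEND completion carries an APPENDUID response code with the mailbox's UIDVALIDITY and the new UID. Below is a small parser for that; the surrounding imaplib calls appear only as comments because they need a live server, and the variable names there are hypothetical.

```python
import re
from typing import Optional, Tuple, Union

_APPENDUID = re.compile(r"\[APPENDUID (\d+) (\d+)\]", re.IGNORECASE)


def parse_appenduid(resp: Union[bytes, str]) -> Optional[Tuple[int, int]]:
    """Return (uidvalidity, uid) from an APPEND completion, or None without UIDPLUS."""
    if isinstance(resp, bytes):
        resp = resp.decode("ascii", "replace")
    m = _APPENDUID.search(resp)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Rewrite-in-place with imaplib, untested sketch (imap, old_uid, new_bytes are hypothetical):
#   typ, data = imap.append("INBOX", None, None, new_bytes)
#   new = parse_appenduid(data[0])            # None: the server lacks UIDPLUS
#   imap.uid("STORE", old_uid, "+FLAGS", r"(\Deleted)")
#   imap.uid("EXPUNGE", old_uid)              # UID EXPUNGE is also a UIDPLUS command
```

It does not remove the underlying cost Scott points out: the whole message still has to be uploaded again.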

> Also GMAIL IMAP is an issue-

Yeah, I bet. Is there anyone who doesn't think that that's Google's
problem, not ours, though?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"there's someone in my head but it's not me."
-- pink floyd, the dark side of the moon, 1972

spamtraps: madduck.bogus at madduck.net

[notmuch] Idea for storing tags

2010-01-13 Thread Scott Morrison

On 2010-01-12, at 8:24 PM, martin f krafft wrote:

> also sprach Scott Morrison  [2010.01.12.1711 +1300]:
>> 1.  synchronization of tag data with emails -- if they are in
>> a subfolder then it presents the issue of maintaining this
>> subfolder when managing emails (moving, deleting, duplicating etc)
>> and any .tag-folder-unaware clients are likely to cause a breakage
>> in tagdata/message association.  One way of doing this is to have
>> a global .tag folder.
> 
> A global .tag folder indexed by e.g. message ID, as you state later,
> would probably allow for this. Or a file-per-tag design. We'd have
> to think carefully about pros and cons for each.
> 
> When thinking about this, I always have to remind myself that we are
> targeting this at a design that has indexed search. If that weren't
> the case, searches would be incredibly expensive.
> 
> Maybe a better approach would be content addressing (see below).


Content hashing -- good idea (& not something that has hit me before) -- better 
than Message-Id as I believe there are still some MUAs/MTAs that allow messages 
without message ids.  The only potential issue with this is that it is critical 
then to preserve the message source against encoding changes though that 
shouldn't be too hard to avoid.

> 
>> 2. what happens if that message is archived or moved to an
>> exclusively local cache -- eg. Mail.app on OS X can easily move
>> IMAP messages to a folder resident on the local computer?
> 
> Well, if the target can store tags, then ideally the MUA should know
> how to transfer them along.
> 
> Maybe the right thing to do would be to use extended attributes
> (which are stored in the inode!), even if they may not be
> universally supported yet. If our solution scales, then this might
> lead to a significant increase in xattr adoption.

The problem with anything that is not universally supported is that for a
package that is to appeal to a wide userbase, most don't know and don't care
about the particulars of this IMAP server vs that IMAP server. All they know
is that for some reason it doesn't work with account X -- which leads to
support headaches.

> 
>> 3. what happens with duplicates of emails -- I would assume that
>> the message id would be the key to match the tag data to the
>> message.  In this system a duplicate of a message could not have
>> a different set of tags from the original (not that this would
>> necessarily be desirable.)
> 
> Duplicates need folders, and tags and folders are somewhat at odds
> with each other. I mean, you can represent a folder hierarchy with
> tags (and more), and if you have tags and folders, you are
> potentially introducing a level of confusion/ambiguity that we don't
> want in the first place. Maybe the ideal solution doesn't need
> folders anymore (and IMAP-compatible (Maildir) subfolders have
> always been a hack anyway).
> 
> There are also two types of duplicates: copies and links. The former
> can diverge, the latter can't. I don't really see a reason for
> either. It's not like you need to copy a mail before you edit it,
> and I don't see a real reason for linking, assuming that the primary
> means of browsing will be tag-searches anyway.
> 
> Duplicates always make me think of content addressing, like Git's
> object cache. We could store the content hash of a message in its
> filename, and also use the hash to index into the tag database.
> I think that would be much cleaner than message IDs, and would make
> handling true duplicates (links) much easier, while copies (diverged
> ex-duplicates) would also be taken care of automatically.

I agree that conceptually duplicates should be buried but end users do have 
"peculiar" organization systems.

> 
> -snip-

>> The performance issue is very real -- because it means that
>> somehow messages have to be rewritten to the IMAP server -- IMAP
>> doesn't have a mechanism AFAIK for updates.
> 
> Not even UIDPLUS?
> http://wiki.dovecot.org/FeatUIDPLUS




[notmuch] Idea for storing tags

2010-01-12 Thread martin f krafft
also sprach Scott Robinson  [2010.01.12.1644 +1300]:
> I wrote a script to store and sync my tags.
> 
>   * One filename per message-ID.
>   * Line-feed separated tags in each file.
> 
> Then the whole structure is controlled via git.
> Conflict-resolution and sync comes for free.

How do you ensure that the external tag store and your mail store do
not go out of sync? I assume that mails without a tagfile are simply
untagged, so that's hardly the issue. However, if you delete a mail,
how do you ensure that the tag database is cleaned up?
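
A minimal sketch of the layout Scott describes, plus one possible answer to the clean-up question: prune every tag file whose message is no longer in the mail store. Naming each file after a SHA-1 of the Message-ID (because IDs can contain "/") is an assumption; Scott's script may name files differently. The resulting deletions could then be committed with git like any other change.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Set


def tag_path(root: Path, msgid: str) -> Path:
    # Hypothetical naming: hash the Message-ID so any ID is a safe filename.
    return root / hashlib.sha1(msgid.encode("utf-8")).hexdigest()


def write_tags(root: Path, msgid: str, tags: Set[str]) -> None:
    path = tag_path(root, msgid)
    if tags:
        # One tag per line, as in Scott's description.
        path.write_text("".join(t + "\n" for t in sorted(tags)), encoding="utf-8")
    elif path.exists():
        path.unlink()  # untagged messages simply have no file


def read_tags(root: Path, msgid: str) -> Set[str]:
    path = tag_path(root, msgid)
    if not path.exists():
        return set()
    return {line for line in path.read_text(encoding="utf-8").splitlines() if line}


def prune(root: Path, live_msgids: Iterable[str]) -> int:
    """Delete tag files for messages no longer in the mail store; return how many."""
    keep = {tag_path(root, m).name for m in live_msgids}
    removed = 0
    for path in root.iterdir():
        # Skip directories (e.g. .git) and dotfiles; only tag files are candidates.
        if path.is_file() and not path.name.startswith(".") and path.name not in keep:
            path.unlink()
            removed += 1
    return removed
```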

Also, do you attach tags automatically, e.g. with procmail on the
server? If so, how do you initiate git-pull locally?

Would you consider sharing your script?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"alle vorurteile kommen aus den eingeweiden."
 - friedrich nietzsche

spamtraps: madduck.bogus at madduck.net



[notmuch] Idea for storing tags

2010-01-12 Thread David A. Harding
On Tue, Jan 12, 2010 at 11:19:09AM +1300, martin f krafft wrote:
> I think [tag leakage] makes in-headers unusable. After all, I don't
> ever want anyone else to know that I tag e-mails from my boss as
> "from-idiots", 

You can cryptographically hash tags so that third-parties can't read
the contents of the in-headers. For security, a salt should be appended
to the tag name to make dictionary attacks on the tags more difficult.
For their owners' convenience, mail clients will want a mapping of hash
to tag name.
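
A sketch of that scheme in Python, with hypothetical helper names. As described, the salt is appended to the tag name before hashing, here with SHA-256; a keyed construction such as HMAC would be the more conventional choice. The protection only holds while the salt stays private: anyone holding it can hash a dictionary of likely tag names and compare.

```python
import hashlib
import secrets
from typing import Dict, Iterable


def new_salt() -> str:
    return secrets.token_hex(16)  # 128 random bits, hex-encoded


def hash_tag(tag: str, salt: str) -> str:
    # Salt appended to the tag name, then hashed.
    return hashlib.sha256((tag + salt).encode("utf-8")).hexdigest()


def header_value(tags: Iterable[str], salt: str) -> str:
    """What would go into the in-message tag header: opaque hashes only."""
    return " ".join(sorted(hash_tag(t, salt) for t in tags))


def hash_to_tag(tags: Iterable[str], salt: str) -> Dict[str, str]:
    """The client-side table that turns hashes back into readable tag names."""
    return {hash_tag(t, salt): t for t in tags}
```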

> [...] pseudo-mails stored in Maildir and synchronised by IMAP

A single RFC2822 message can store the salt and hash-to-tag database. It
could contain a clear subject and directions to the end user not to move
or delete it. This would not, I think, terribly confuse existing mail
clients or their users.
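
Such a message could be built with Python's standard `email` package. The sketch below assumes the salt and hash-to-tag map travel as JSON in a plain-text body, below a human-readable warning. The subject line, the `X-Tagdb-Version` header and the owner address are hypothetical placeholders.

```python
import json
from email import message_from_string, policy
from email.message import EmailMessage


def build_tagdb_message(salt, hash_to_tag, owner="me@example.org"):
    """Wrap the salt and hash-to-tag map in an ordinary RFC 2822 message."""
    msg = EmailMessage()
    msg["From"] = owner
    msg["To"] = owner
    msg["Subject"] = "Tag database - do not move or delete"
    msg["X-Tagdb-Version"] = "1"  # hypothetical marker header
    data = json.dumps({"salt": salt, "tags": hash_to_tag}, indent=2, sort_keys=True)
    msg.set_content(
        "This message stores your mail tag database.\n"
        "Please do not move or delete it.\n\n" + data + "\n"
    )
    return msg


def parse_tagdb_message(raw):
    """Recover (salt, hash_to_tag) from the serialised message."""
    msg = message_from_string(raw, policy=policy.default)
    text = msg.get_content()  # decodes any transfer encoding
    data = json.loads(text[text.index("{"):])
    return data["salt"], data["tags"]
```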

-Dave
-- 
David A. Harding    Website:  http://dtrt.org/
1 (609) 997-0765  Email:  dave at dtrt.org
Jabber/XMPP:  dharding at jabber.org


[notmuch] Idea for storing tags

2010-01-12 Thread martin f krafft
Folks, over in #notmuch, we just floated an idea that I'd like to
get out to you. We've been debating storing tags for messages.
Therefore I am cross-posting. Please forgive me.

So far, there are two approaches:

1. External database, which has the downside of not being
   synchronisable with standard IMAP, like the rest of your mail
   (assuming you use IMAP). Also, it's possible for mailstore and
   database to get out of sync.

2. In-headers, which has the downside of leaking (e.g. when
   bouncing), and incurs the risks associated with message rewrites
   (which I think is pretty much ignorable, but it's still there).
   Also, there's a performance issue, but in the context of an
   indexer like notmuch, this is negligible.

   The leakage is real, though, and I think it makes in-headers
   unusable. After all, I don't ever want anyone else to know that
   I tag e-mails from my boss as "from-idiots", and I forward and
   bounce mail on a regular basis. I could tell my MTA to remove
   those headers, but I might forget to do that on a new system.

We also previously determined that IMAP keywords are pretty much
useless as they are stored per mailbox, not per message, not
standardised, and limited in their length anyway [0]. This also
means that we don't really need to investigate sensibly storing tags
in Maildir (e.g. with xattrs), because IMAP cannot transport them.

0. http://lists.madduck.net/pipermail/mailtags/2007-August/msg00016.html

Seriously, who implemented IMAPv4rev1 and what sort of crack were
they smoking??

I remember there was some KDE groupware contacts manager that used
IMAP to synchronise contacts. At first, this sounds horrible, but
when you detach IMAP from RFC822, it becomes a generic synchronising
protocol. The next step is then straightforward, and I want to
share this idea with you:

How about using pseudo-mails stored in Maildir and synchronised by
IMAP? E.g. every folder could have a subfolder .TAGS and if we find
a way to smartly pair messages between parent and subfolder, we'd
have a tag store alongside the mailstore it refers to, but without
the danger of leakage, and without having to rewrite messages.
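
A minimal Python sketch of the .TAGS idea follows, using the standard `mailbox` module. Its `add_folder` creates Maildir++ subfolders, so "TAGS" ends up on disk as ".TAGS". Pairing is done by Message-ID via a hypothetical `X-Tags-For` header, which is only one possible answer to the "smart pairing" question. The body holds one tag per line.

```python
import mailbox
from email.message import EmailMessage

TAGS_FOLDER = "TAGS"  # Maildir++ stores this on disk as ".TAGS"


def tags_folder(md):
    """Return the .TAGS subfolder of a Maildir, creating it if needed."""
    if TAGS_FOLDER in md.list_folders():
        return md.get_folder(TAGS_FOLDER)
    return md.add_folder(TAGS_FOLDER)


def set_pseudo_tags(md, message_id, tags):
    """Store tags for message_id as one pseudo-mail, replacing any old one."""
    folder = tags_folder(md)
    for key, msg in list(folder.iteritems()):
        if msg["X-Tags-For"] == message_id:
            folder.remove(key)
    pseudo = EmailMessage()
    pseudo["Subject"] = "tags for " + message_id
    pseudo["X-Tags-For"] = message_id  # hypothetical pairing header
    pseudo.set_content("\n".join(sorted(tags)) + "\n")
    folder.add(pseudo)


def get_pseudo_tags(md, message_id):
    """Return the tag set stored for message_id (empty if none)."""
    for msg in tags_folder(md):
        if msg["X-Tags-For"] == message_id:
            return set(msg.get_payload().split())
    return set()
```

Since the pseudo-mails are ordinary messages in an ordinary folder, any IMAP sync tool would carry them along. That is also why tag-unaware clients would display them.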

The major problem with this is when clients don't understand this
"protocol", for then they will display all .TAGS folders as regular
IMAP folders, and try to treat the messages therein as regular
mails. Somewhere sometime this is bound to blow up and I don't
really know how to prevent that.

Anyway, the idea is out now. Thoughts?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

echo Prpv a\'rfg cnf har cvcr | tr Pacfghnrvp Cnpstuaeic

spamtraps: madduck.bogus at madduck.net



[notmuch] Idea for storing tags

2010-01-11 Thread Scott Morrison

Thought you would be interested in my experiences and thoughts from actually 
doing this kind of stuff.  

With my software MailTags (www.indev.ca/MailTags.html), I have looked at all 
these options and decided to go with storing tags in headers (as JSON-formatted 
data in the X-MailTags header).
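
To make "JSON in a header" concrete, here is a hedged Python sketch. MailTags' actual JSON schema is not described here, so the `{"tags": [...]}` shape is an assumption made purely for illustration. Rewriting the header means re-serialising the whole message, which is the rewrite cost discussed below.

```python
import json
from email import message_from_string
from email.policy import default

HEADER = "X-MailTags"


def set_header_tags(raw, tags):
    """Return the message text with its X-MailTags header replaced."""
    msg = message_from_string(raw, policy=default)
    del msg[HEADER]  # removes any existing occurrences
    msg[HEADER] = json.dumps({"tags": sorted(tags)})  # assumed schema
    return msg.as_string()


def get_header_tags(raw):
    """Read the tag set back from the X-MailTags header (empty if absent)."""
    msg = message_from_string(raw, policy=default)
    value = msg[HEADER]
    if value is None:
        return set()
    return set(json.loads(str(value))["tags"])
```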

I have thought seriously about using pseudo-emails stored in a specially named 
directory, but I feel there are a couple of issues with this.

1. Synchronization of tag data with emails -- if they are in a subfolder, 
then it presents the issue of maintaining this subfolder when managing emails 
(moving, deleting, duplicating, etc.), and any client unaware of the .tag folder 
is likely to cause a breakage in the tagdata/message association. One way 
around this is to have a global .tag folder.

2. What happens if that message is archived or moved to an exclusively local 
cache? E.g. Mail.app on OS X can easily move IMAP messages to a folder 
resident on the local computer.

3. What happens with duplicates of emails? I would assume that the message ID 
would be the key to match the tag data to the message. In this system a 
duplicate of a message could not have a different set of tags from the 
original (not that this would necessarily be desirable).
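
Points 1 and 3 both follow from keying tags on the Message-ID alone. The in-memory Python sketch below illustrates this (the class name is hypothetical). Because the key ignores folder and IMAP UID, moving a message cannot break the association. Every duplicate with the same Message-ID necessarily shares one tag set.

```python
from email import message_from_string


class GlobalTagIndex:
    """Tags keyed only by Message-ID, independent of folder or IMAP UID."""

    def __init__(self):
        self._tags = {}

    @staticmethod
    def key_for(raw):
        mid = message_from_string(raw).get("Message-ID")
        if mid is None:
            raise ValueError("message has no Message-ID; cannot key its tags")
        return mid.strip()

    def add(self, raw, *tags):
        self._tags.setdefault(self.key_for(raw), set()).update(tags)

    def tags(self, raw):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._tags.get(self.key_for(raw), set()))
```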


As I mentioned, I went with tags in headers, though this has its own 
drawbacks.
Your mention of potential leakage (aka inadvertent disclosure of tag 
data) is real -- but only if the client used to bounce/forward is not the one 
that tagged the message (one would assume that if a client can tag, it can know to 
exclude the tags in a bounce). Mail.app, which I am plugging into, does not 
forward headers, though it will include all headers in a bounce -- but chances 
are you aren't tagging messages you are bouncing. :)
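
Excluding tags before a bounce or forward is a small filtering step, whether it is done by the tagging client or by the MTA. A minimal Python sketch follows. The default header list (only `X-MailTags`) is an assumption and would be extended with whatever headers a given setup uses.

```python
from email import message_from_bytes
from email.policy import default

TAG_HEADERS = ("X-MailTags",)  # assumed; extend for other tag headers


def strip_tag_headers(raw, names=TAG_HEADERS):
    """Return a copy of the raw message with all tag headers removed."""
    msg = message_from_bytes(raw, policy=default)
    for name in names:
        del msg[name]  # deletes every occurrence; no error if absent
    return msg.as_bytes()
```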

The performance issue is very real, because it means that messages somehow 
have to be rewritten to the IMAP server; AFAIK IMAP doesn't have a mechanism 
for updates. Additionally, IMAP doesn't have a mechanism for simply 
replacing one message's data with another's: a new message must be written and 
the old message must be deleted, so the message's IMAP UID will change, and the 
client will have to deal with this, especially if it caches the messages.
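
The append-then-delete cycle can be sketched against Python's `imaplib`. `conn` is assumed to be a logged-in `imaplib.IMAP4` or `IMAP4_SSL` connection, whose `append`, `select`, `uid` and `expunge` methods are used here. The function name `replace_message` is hypothetical.

```python
def replace_message(conn, mailbox, old_uid, new_message):
    """Replace one message by APPENDing a rewritten copy, then deleting the old.

    IMAP has no in-place update, so the new copy receives a fresh UID and any
    client cache keyed by the old UID has to be updated.
    """
    # 1. Write the rewritten copy first, so a failure leaves the original intact.
    typ, _ = conn.append(mailbox, None, None, new_message)
    if typ != "OK":
        raise RuntimeError("APPEND failed; original message left untouched")
    # 2. Flag the old copy for deletion by UID (requires a selected mailbox).
    conn.select(mailbox)
    typ, _ = conn.uid("STORE", old_uid, "+FLAGS", "(\\Deleted)")
    if typ != "OK":
        raise RuntimeError("STORE failed; mailbox now holds both copies")
    # 3. Plain EXPUNGE removes every \Deleted message in the selected mailbox,
    #    not just this one; servers with UIDPLUS (RFC 4315) also offer
    #    UID EXPUNGE to target a single UID.
    conn.expunge()
```

Doing the APPEND before the delete means a dropped connection leaves a duplicate rather than a lost message. A duplicate is usually the cheaper failure.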

Also, Gmail IMAP is an issue: Gmail IMAP is not really IMAP. It simply 
doesn't work like a true IMAP server. Writes to folders in Gmail IMAP are 
translated into database updates that associate a single record of the 
message with the folder it was "written" to. Changing headers on a Gmail IMAP 
message simply will not work, because it will reject the message as an update 
of the single record (and not actually write the new data).

Still, tags in headers meant that I didn't have to worry about making sure that 
the .tags folder is maintained appropriately (throughout moves and deletions), and 
that the data is stored much closer to the message, both for data recovery if it is 
ever needed and for archiving tags. In any case, this is what I have 
working, though I am open to considering new approaches.

Scott

P.S. Also see my post to the mailtags list from a few years back:
http://lists.madduck.net/pipermail/mailtags/2007-August/msg00017.html

On 2010-01-11, at 5:19 PM, martin f krafft wrote:

> Folks, over in #notmuch, we just floated an idea that I'd like to
> get out to you. We've been debating storing tags for messages.
> Therefore I am cross-posting. Please forgive me.
> 
> So far, there are two approaches:
> 
> 1. External database, which has the downside of not being
>  synchronisable with standard IMAP, like the rest of your mail
>  (assuming you use IMAP). Also, it's possible for mailstore and
>  database to get out of sync.
> 
> 2. In-headers, which has the downside of leaking (e.g. when
>  bouncing), and incurs the risks associated with message rewrites
>  (which I think is pretty much ignorable, but it's still there).
>  Also, there's a performance issue, but in the context of an
>  indexer like notmuch, this is negligible.
> 
>  The leakage is real, though, and I think it makes in-headers
>  unusable. After all, I don't ever want anyone else to know that
>  I tag e-mails from my boss as "from-idiots", and I forward and
>  bounce mail on a regular basis. I could tell my MTA to remove
>  those headers, but I might forget to do that on a new system.
> 
> We also previously determined that IMAP keywords are pretty much
> useless as they are stored per mailbox, not per message, not
> standardised, and limited in their length anyway [0]. This also
> means that we don't really need to investigate sensibly storing tags
> in Maildir (e.g. with xattrs), because IMAP cannot transport them.
> 
> 0. http://lists.madduck.net/pipermail/mailtags/2007-August/msg00016.html
> 
> Seriously, who implemented IMAPv4rev1 and what sort of crack were
> they smoking??
> 
> I remember there was some KDE groupware contacts manager that used
> IMAP to synchronise 

[notmuch] Idea for storing tags

2010-01-11 Thread Scott Robinson
I wrote a script to store and sync my tags.

  * One filename per message-ID.
  * Line-feed separated tags in each file.

Then the whole structure is controlled via git. Conflict resolution and sync
come for free.
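
The layout described above (one file per Message-ID, one tag per line, the directory under git) might look like the following Python sketch. It is not the author's script, which was not posted. Percent-encoding the Message-ID into a filename is an assumption: Message-IDs may contain "/", which is illegal in filenames. `commit_all` simply shells out to git.

```python
import os
import subprocess
from urllib.parse import quote


def tagfile_path(tagdir, message_id):
    """One file per Message-ID; the ID is percent-encoded (an assumption)."""
    return os.path.join(tagdir, quote(message_id, safe=""))


def write_tagfile(tagdir, message_id, tags):
    """Write the tags as line-feed separated lines, sorted for stable diffs."""
    with open(tagfile_path(tagdir, message_id), "w", encoding="utf-8") as f:
        f.write("".join(t + "\n" for t in sorted(tags)))


def read_tagfile(tagdir, message_id):
    """Return the tag set; a missing file means the message is untagged."""
    try:
        with open(tagfile_path(tagdir, message_id), encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def commit_all(tagdir, message="update tags"):
    """Stage everything and commit, skipping the commit if nothing changed."""
    subprocess.run(["git", "-C", tagdir, "add", "-A"], check=True)
    staged = subprocess.run(["git", "-C", tagdir, "diff", "--cached", "--quiet"])
    if staged.returncode == 1:  # 1 means there are staged changes
        subprocess.run(["git", "-C", tagdir, "commit", "-q", "-m", message], check=True)
```

Sorting the lines keeps each file canonical. Two machines that add the same tags therefore produce identical files, which avoids spurious merge conflicts.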

It isn't clear what use case the earlier e-mail is aiming to satisfy. This is
how I solved my tag sync issues, though.