[notmuch] Idea for storing tags

2010-01-14 Thread martin f krafft
also sprach Carl Worth  [2010.01.14.1432 +1300]:
> Yes. This approach requires some external means of synchronizing the
> tags from one system to another.
> 
> I don't understand what it would mean to have the mailstore and the
> database out of synch here. This approach doesn't have the tags in the
> mailstore by definition, right?

You might have marked a message 'read' on one machine and if the two
get out of sync on another machine, you might have the same message
unread there.

> > How about using pseudo-mails stored in Maildir and synchronised by
> > IMAP? E.g. every folder could have a subfolder .TAGS and if we find
> > a way to smartly pair messages between parent and subfolder, we'd
> > have a tag store alongside the mailstore it refers to, but without
> > the danger of leakage, and without having to rewrite messages.
> ...
> > Anyway, the idea is out now. Thoughts?
> 
> There are a couple of problems that I don't see addressed at all with
> this approach. The first is that there's not a one-to-one mapping
> between messages and files in the mail store. (I'm CCed on a lot of list
> mail meaning that I have multiple files in my mail store for a single
> message.)

Shouldn't this just be solved? I've had formail+procmail delete my
duplicates for 10+ years, and while I don't like the fact that
I usually get the CC before the list mail, and thus cannot filter on
Delivered-To, I have never looked back.

> Second, the only reason I would be interested in synchronizing mail
> between two systems is so that I could manipulate the tag data in
> multiple places, (that is, remove the "unread" tag whether on my
> network-disconnected laptop or via web-mail when away from my
> laptop). Using imap for synchronizing a file of tags within the mail
> store gives you no mechanism for doing any sort of conflict resolution,
> right? (Which I think in almost all cases is going to be quite trivial
> if there's a chance for a program to resolve it.)

I have not thought about this, but you are right. IMAP does not
really allow for conflict resolution, which may well be *the* reason
why you cannot update existing messages.

> [*] Though, I think a plain-text file with tags managed with
> something like git (and perhaps a custom merger) could save a lot
> of work. Or perhaps a plain-text journal of tag manipulations on
> either end that could be replayed on the other.

Git is good at conflict resolution if run interactively, but [0]
still makes me question whether it can ever take the place of IMAP.
However, Asheesh Laroia, who has floated the idea of Git-for-mail at
DebConf8 already, has some ideas and hopefully will soon reply to my
mail [0], which I just bounced.

0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

apt-get source --compile gentoo

spamtraps: madduck.bogus at madduck.net


[notmuch] indexing mail?

2010-01-14 Thread Adrian Perez de Castro
On Thu, 14 Jan 2010 18:13:53 +0100, Arvid wrote:

> On Thu, 14 Jan 2010 09:38:00 +0100, Arvid Picciani  wrote:
> 
> > on the first run (when no .notmuch is there yet), it finds some 
> > messages, but doesn't index them either.

Yuk! I logged in via Gmail's web interface and found that I have some new
messages which are not being picked up by Notmuch.

> the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e
> 
> from readdir (3):
> 
>  "Currently, only some file systems (among them: Btrfs, ext2, ext3,
>  and ext4) have full support returning the file type in d_type.
>  All applications must properly handle a return of DT_UNKNOWN."

I am using XFS, which always returns DT_UNKNOWN. Taking into account that
there is a good deal of people using filesystems other than the ones you
mention, and that other non-linux filesystems may also return DT_UNKNOWN,
in my opinion there should be a fall-back. I will try to post a patch
Anytime Soon™.

Also, I have the feeling that the "d_type" field from "struct dirent" may
not be available in some OSes because it is a BSD extension.
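
The fall-back discussed above can be sketched roughly as follows. This is only an illustration in Python, not notmuch's actual C code: Python does not expose d_type, so the d_type argument and the DT_* constants here are stand-ins modelling what readdir() would hand back, and entry_is_dir is a hypothetical helper name.

```python
import os
import stat

# Stand-ins for the C constants from <dirent.h> (values as on Linux).
DT_UNKNOWN, DT_DIR, DT_REG = 0, 4, 8


def entry_is_dir(path, d_type=DT_UNKNOWN):
    """Return True if path is a directory.

    d_type models the hint readdir() may supply. When the filesystem
    cannot provide it (DT_UNKNOWN), fall back to an explicit stat().
    """
    if d_type == DT_DIR:
        return True
    if d_type != DT_UNKNOWN:
        return False
    # Fall-back path: the hint is missing, so ask the filesystem directly.
    return stat.S_ISDIR(os.stat(path).st_mode)
```

The point of the design is that the hint is treated purely as an optimisation: a filesystem such as XFS that always reports DT_UNKNOWN costs one extra stat() per entry but gives the same answer.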

Cheers,

-- 
Adrian Perez de Castro 
Igalia - Free Software Engineering


[notmuch] indexing mail?

2010-01-14 Thread Arvid Picciani
On Thu, 14 Jan 2010 09:38:00 +0100, Arvid Picciani  wrote:

> on the first run (when no .notmuch is there yet), it finds some 
> messages, but doesn't index them either.

the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e

from readdir (3):

 "Currently, only some file systems (among them: Btrfs, ext2, ext3,
 and ext4) have full support returning the file type in d_type.
 All applications must properly handle a return of DT_UNKNOWN."


thanks "kanru" for helping on irc.



[notmuch] Notmuch performance problems on OSX

2010-01-14 Thread Oliver Charles
Actually, significant performance problems. Ho ho ho. (sorry)

I've installed the latest notmuch from Git at this time of writing,
along with Xapian from SVN head. However, just tagging a single thread
with only one message seems to take too long:

$ time notmuch tag +dissertation thread:7dc536441e6deade4256a46d46451221

real    0m0.812s
user    0m0.022s
sys     0m0.037s

And tagging all my messages is really horrible:

$ time notmuch tag +foobar tag:inbox

real    0m5.076s
user    0m3.688s
sys     0m0.105s

Here is what my notmuch binary links with:

$ otool -L /usr/local/bin/notmuch
/usr/local/bin/notmuch:
    /usr/local/Cellar/gmime/2.4.0/lib/libgmime-2.4.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
    /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /usr/local/Cellar/glib/2.20.5/lib/libgobject-2.0.0.dylib (compatibility version 2001.0.0, current version 2001.5.0)
    /usr/local/Cellar/glib/2.20.5/lib/libglib-2.0.0.dylib (compatibility version 2001.0.0, current version 2001.5.0)
    /usr/local/Cellar/gettext/0.17/lib/libintl.8.dylib (compatibility version 9.0.0, current version 9.2.0)
    /usr/local/Cellar/xapian-svn/HEAD/lib/libxapian-1.1.3.dylib (compatibility version 4.0.0, current version 4.0.0)
    /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.4.0)
    /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.4)

That xapian-svn was built from svn HEAD right now, so I'm assuming it
contains the #250 fix (http://trac.xapian.org/changeset/13808)

Any ideas?

-- 
Oliver Charles / aCiD2


[notmuch] Notmuch performance problems on OSX

2010-01-14 Thread Carl Worth
Hi Oliver, welcome to notmuch!

On Thu, 14 Jan 2010 15:30:48 +0000, Oliver Charles wrote:
> I've installed the latest notmuch from Git at this time of writing,
> along with Xapian from SVN head. However, just tagging a single thread
> with only one message seems to take too long:
> 
> $ time notmuch tag +dissertation thread:7dc536441e6deade4256a46d46451221
> 
> real  0m0.812s
> user  0m0.022s
> sys   0m0.037s

Things work quite a bit faster than that on my machine:

$ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d314@mail.gmail.com

real    0m0.024s
user    0m0.012s
sys     0m0.004s

But that could just be system differences.

> And tagging all my messages is really horrible:
> 
> $ time notmuch tag +foobar tag:inbox
> 
> real  0m5.076s
> user  0m3.688s
> sys   0m0.105s

For this operation, I can't really compare. How many messages are you
tagging? Here's that operation for me with 525 messages in my inbox:

$ time notmuch tag +foobar tag:inbox

real    0m1.551s
user    0m1.504s
sys     0m0.016s

> That xapian-svn was built from svn HEAD right now, so I'm assuming it
> contains the #250 fix (http://trac.xapian.org/changeset/13808)

Which I think means that things could have been even *much* slower
before. ;-)

The Xapian defect #250 was just one, initial (and obvious) performance
problem. [Though, as I mentioned in a previous thread, if you're using a
Xapian flint database, (look for .notmuch/xapian/iamflint), then you
won't get the benefit of the Xapian fix until you rebuild your notmuch
database from scratch with a current notmuch.]

Once you've verified that you've got the #250 fix functional, there
could still be lots of performance bugs. And it would be time to start
profiling.

Perhaps the "notmuch daemon" idea (which we've proposed earlier for
other reasons) could help reduce overhead from reading the database and
writing it back out again. So that might be one avenue to explore for
fixing things.

I have no idea what OS X does, but Linux keeps my notmuch database in
its buffer cache so I can do these operations without even touching
"disk" (which is actually an SSD anyway, which also helps). I just
tried, and was able to get the single-message tag operation to be 3
times slower by dropping the cache:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d314@mail.gmail.com

real    0m0.062s
user    0m0.000s
sys     0m0.020s

But again, whatever the performance problem might be, the first step
would be to examine some profiles. (And I'm clueless myself as to what
profiling tools might be available for OS X.)

-Carl


[notmuch] Thoughts on notmuch and Lua

2010-01-14 Thread Carl Worth
On Thu, 14 Jan 2010 10:47:13 +0200, Ali Polatel  wrote:
> Before trying to implement anything I decided to send a mail to the list
> to ask people's opinion.

Hi Ali, welcome to notmuch!

I appreciate you soliciting opinions, but I hope that my answer won't
discourage you. By all means, please feel free to experiment!

> What's the problem?
> ===================
> Notmuch isn't very configurable.

I'll grant that. And as can be seen in TODO and in code comments, we
definitely want to fix that.

> 1. Configuration file:
> The configuration file can be a Lua script that allows more dynamic
> configuration. Here's an example:
> 
> # notmuch configuration file:
> config = {}
> config.dbpath = "/path/to/maildir"
> config.exclude = function (maildir)
>     return not string.match(maildir, ".*Trash.*")
> end
> ...

That doesn't look very compelling to me.

I'd much rather have:

[database]
path=/home/cworth/mail
maildir_exclude=.*Trash.*

with the exact same functionality.
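
To make the comparison concrete, here is a minimal sketch of how a program could consume the INI-style form. It is written in Python for brevity and is not notmuch's implementation; the maildir_exclude key is only the hypothetical option named above, and load_exclude_filter is an invented helper.

```python
import configparser
import re

# The example configuration from the message above.
EXAMPLE_CONFIG = """
[database]
path=/home/cworth/mail
maildir_exclude=.*Trash.*
"""


def load_exclude_filter(config_text):
    """Parse the [database] section and return (path, is_excluded)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    database = parser["database"]
    pattern = re.compile(database["maildir_exclude"])

    def is_excluded(maildir):
        # search() looks for the pattern anywhere in the string,
        # mirroring the Lua string.match() example.
        return pattern.search(maildir) is not None

    return database["path"], is_excluded
```

For instance, `path, is_excluded = load_exclude_filter(EXAMPLE_CONFIG)` yields a predicate that rejects any maildir whose path contains "Trash", without the user having to write a function.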

Granted, having a full programming language in the configuration file
makes things much more dynamic, but it also makes it much harder for the
user to read, edit, and ensure the syntax is correct.

> 2. Hooks:
> This is a feature I really miss having switched from sup.
> There can be many hooks, a hook that formats search output,
> a hook that is called before adding messages to the database which may
> be used to add initial tags depending on headers etc.

I understand that some people really like their hooks. They let users
invent all kinds of interesting, custom functionality.

But I think hooks also have problems. Sometimes the most interesting
functionality has to be pieced together by every user going to a wiki
page and finding the "standard" hooks. I'd much rather avoid that by
getting the most useful functionality into the program in the first
place.

Hooks also impose a certain amount of maintenance burden on the
software. And they are often implemented in a way that makes them very
hard to discover.

I wrote a message to the sup mailing list describing some of these
issues. The context there was a patch I wrote adding a configuration
option, (and the sup maintainer preferring it be added as a hook
instead):

id:1254417826-sup-6584@yoom.home.cworth.org
Subject: Re: [sup-talk] [PATCH] Add new :crypto_default configuration option.

I did find out later that the sup hooks were more self-documenting than
I had understood. (There was a sup command-line option that printed
documentation for all available hooks.) Something like that is
definitely a requirement for providing hooks.

So I'm not entirely opposed to the idea of adding hooks to notmuch, but
I'll definitely need to be convinced that any particular functionality
can't be better integrated without the hook.

> Why Lua?
> ========
> Lua has many advantages over other scripting languages when it comes to
> integration with a C program. It has a very clean and easy C API, and the
> overhead of running Lua scripts is not noticeable, among other things.

I've definitely heard lots of good things about "lua embedability". So
if we do decide to provide hooks, then lua would seem like a logical
option to look at first. I've never looked at it closely myself though.

-Carl


[notmuch] indexing mail?

2010-01-14 Thread Carl Worth
On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro  wrote:
> > the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e
> > 
> > from readdir (3):
> > 
> >  "Currently, only some file systems (among them: Btrfs, ext2, ext3,
> >  and ext4) have full support returning the file type in d_type.
> >  All applications must properly handle a return of DT_UNKNOWN."

Yes. The broken code was my mistake. I clearly didn't read the above
warning closely enough. Sorry about that!

> I am using XFS, which always returns DT_UNKNOWN. Taking into account that
> there is a good deal of people using filesystems other than the ones you
> mention, and that other non-linux filesystems may also return DT_UNKNOWN,
> in my opinion there should be a fall-back. I will try to post a patch
> Anytime Soon™.

We definitely want the fallback. I can attempt to code it, but I don't
have ready access to an afflicted filesystem, so I'd need help testing
anyway.

I'd love to see a patch for this bug soon. Be sure to CC me when the
patch is sent and that will help me commit it sooner.

> Also, I have the feeling that the "d_type" field from "struct dirent" may
> not be available in some OSes because it is a BSD extension.

I'm generally quite bad at determining whether functionality I'm using
in my software is non-portable. As proven in this case, even when the
man page tells me something is not portable I don't always notice, (and
often, the man pages aren't even that useful).

Beyond that, even if something is *known* to be theoretically
non-portable, it can be a waste of time to code compatibility paths that
nobody will be running in practice.

So I've basically gotten to the point where I just code for what works
on my system, (not out of disregard for what other people run---just
that it's impossible for me to know what subset of functionality is
actually relevant). Then, at the same time, I'm quite happy to accept
code to improve the portability when people note that things are broken
on other systems.

See the git history and email archives for examples of how we fixed
strndup and getline portability problems.

I know that "wait for people to notice it's broken" isn't the nicest
thing we could do with our code. But I don't really know a much better
way. I'm happy to entertain suggestions here.

-Carl


[notmuch] Idea for storing tags

2010-01-14 Thread Carl Worth
On Thu, 14 Jan 2010 21:04:21 +1300, martin f krafft wrote:
> You might have marked a message 'read' on one machine and if the two
> get out of sync on another machine, you might have the same message
> unread there.

That's a different issue though. With two databases there's clearly the
opportunity for the two databases to be out of synch.

But you talked about the database being out of synch with respect to the
mailstore. And that's something I just don't understand, (given the
assumption that all tags are stored in the database---which was the
explicit description of the case of interest).

> Shouldn't this just be solved? I've had formail+procmail delete my
> duplicates for 10+ years, and while I don't like the fact that
> I usually get the CC before the list mail, and thus cannot filter on
> Delivered-To, I have never looked back.

Notmuch has access to all the information it needs to allow you to
delete the CC version once the list mail arrives. So you could do
notmuch-based deletion now and avoid losing the Delivered-To header if
you want.

> > [*] Though, I think a plain-text file with tags managed with
> > something like git (and perhaps a custom merger) could save a lot
> > of work. Or perhaps a plain-text journal of tag manipulations on
> > either end that could be replayed on the other.
> 
> Git is good at conflict resolution if run interactively, but [0]
> still makes me question whether it can ever take the place of IMAP.
> However, Asheesh Laroia, who has floated the idea of Git-for-mail at
> DebConf8 already, has some ideas and hopefully will soon reply to my
> mail [0], which I just bounced.
> 
> 0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

Using git for mail is an interesting idea, but not what I was actually
proposing here.

I think that synchronizing the mail store and synchronizing the tags
information are tasks that have different requirements, and for which we
may well want different tools.

So I was talking about using imap (or rsync, or what have you) for
copying the mailstore, and then having something with a bit more
domain-specific awareness for doing the synchronization of the tags
data.
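
As an illustration of the "plain-text journal of tag manipulations" idea quoted earlier in this message, replaying such a journal could look something like the sketch below. The journal format (one "+tag message-id" or "-tag message-id" entry per line) and the replay_journal name are assumptions made up for this example; nothing like this exists in notmuch as described in the thread.

```python
def replay_journal(tags, journal_lines):
    """Apply '+tag id' / '-tag id' journal entries to a tag store.

    tags maps message-id -> set of tag names and is updated in place.
    """
    for line in journal_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        op_tag, msg_id = line.split(None, 1)
        op, tag = op_tag[0], op_tag[1:]
        if op == "+":
            tags.setdefault(msg_id, set()).add(tag)
        elif op == "-":
            # discard() keeps replay idempotent: removing an absent
            # tag is not an error.
            tags.setdefault(msg_id, set()).discard(tag)
        else:
            raise ValueError(f"bad journal entry: {line!r}")
    return tags
```

Because adding a tag that is already present, or removing one that is already gone, changes nothing, the same journal can be replayed twice without harm, which is what makes most conflicts trivial to resolve.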

-Carl


[notmuch] [PATCH] Use libgcrypt for hashing.

2010-01-14 Thread Carl Worth
On Fri, 08 Jan 2010 15:43:52 -0500, micah anderson  wrote:
> It's good that this is not a burden to maintain for the notmuch project,
> even better that Mikhail, the libsha1 maintainer, is currently active in
> this project and has volunteered to maintain the in-tree copy. 
> 
> However, the problem that has been raised is about the code-maintenance
> burden that distributions face. In fact, this is not a unique problem
> to notmuch; if it were, it wouldn't be such a big deal. The reality is
> that the more projects which cargo-cult around 'convenience copies' of
> code, the more of a burden is placed on the distributors.
> 
> In some ways, the notmuch project and the role of distributors are at
> cross-purposes on this issue; each side has an argument that makes sense
> from their individual perspectives.

Well, I think it's important for notmuch to ease the burden on the
distribution as well. That's just a matter of being a good citizen.

If notmuch were including code that existed as a library package in
Debian, say, then that would definitely be problematic, and notmuch
should be fixed to link with the library.

We could get to that point if someone wanted to package libsha1, say.

> > What might make more sense is an option to compile against an existing
> > library (if present) but not to introduce an error in the build if the
> > library is not present, (in which case just build the builtin libsha1.c
> > code).
> 
> This makes the most sense, and resolves the issue in a way that both
> sides of the issue benefit!

I'd be glad to see a patch that does that.

-Carl


[notmuch] Threading

2010-01-14 Thread Carl Worth
On Fri, 8 Jan 2010 16:12:38 +1300, martin f krafft wrote:
> Reading is one thing. Information storage and organisation is
> another. After a message is delivered (and read) to my mailbox, it's
> really mine and I can (and should be able to) affix it and integrate
> it into my organisational scheme any way I want, don't you think?

A fair point.

I don't see this being something I'm going to spend any time
implementing. I just wouldn't use the functionality myself. But I would
be happy to integrate patches if someone came up with some.

-Carl


[notmuch] Thoughts on notmuch and Lua

2010-01-14 Thread Ali Polatel
Before trying to implement anything I decided to send a mail to the list
to ask people's opinion.

What's the problem?
===================
Notmuch isn't very configurable.

How can Lua integration solve this?
===================================
Here are initial thoughts on how to integrate Lua with notmuch.
Any comments appreciated.

1. Configuration file:
The configuration file can be a Lua script that allows more dynamic
configuration. Here's an example:

# notmuch configuration file:
config = {}
config.dbpath = "/path/to/maildir"
config.exclude = function (maildir)
    return not string.match(maildir, ".*Trash.*")
end
...

2. Hooks:
This is a feature I really miss having switched from sup.
There can be many hooks, a hook that formats search output,
a hook that is called before adding messages to the database which may
be used to add initial tags depending on headers etc.

Why Lua?
========
Lua has many advantages over other scripting languages when it comes to
integration with a C program. It has a very clean and easy C API, and the
overhead of running Lua scripts is not noticeable, among other things.

-- 
Regards,
Ali Polatel


[notmuch] indexing encrypted messages (was: OpenPGP support)

2010-01-14 Thread Olly Betts
On 2010-01-08, James Westby wrote:
> That would leave an open question over whether future notmuch show
> invocations would return the plaintext or ciphertext. If it is the
> latter then it requires decrypting every time you want to view it, but
> it does mean that there is less information leakage (you could find out
> whether an encrypted message contained a particular term, but not read
> the whole message directly).

You can actually use the term position information to reconstruct the
original message text pretty well.  It misses capitalisation, punctuation,
and distinctions between whitespace, but is generally enough to allow
the message to be understood:

http://article.gmane.org/gmane.comp.search.xapian.general/2187
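
A toy illustration of the reconstruction described above: given a mapping from each term to the positions at which it occurs, sorting by position yields the word sequence back. The plain dict used here is only a stand-in for a positional index; it does not use the Xapian API, and reconstruct_text is a made-up name.

```python
def reconstruct_text(positions_by_term):
    """Rebuild a word sequence from a term -> [positions] mapping."""
    slots = {}
    for term, positions in positions_by_term.items():
        for pos in positions:
            slots[pos] = term
    # Reading the slots in position order recovers the original word order.
    return " ".join(slots[pos] for pos in sorted(slots))


index = {"the": [1, 4], "cat": [2], "saw": [3], "dog": [5]}
print(reconstruct_text(index))  # the cat saw the dog
```

As in the description above, capitalisation, punctuation and whitespace distinctions are not recovered, only the sequence of terms.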

Cheers,
Olly



[notmuch] indexing mail?

2010-01-14 Thread Arvid Picciani
Hi,
how do you add new mails to the index?
manual says "notmuch new" should be enough, but it simply says
"No new mail."

on the first run (when no .notmuch is there yet), it finds some 
messages, but doesn't index them either.

$ notmuch search tag:inbox
$

$ notmuch search s
$


-- 
Arvid
Asgaard Technologies


[notmuch] [RFC/PATCH v2] Add search-files command

2010-01-14 Thread Ali Polatel
Jameson Rollins wrote:
> On Wed, Jan 13, 2010 at 03:17:41PM +0200, Ali Polatel wrote:
> > This command can be used to integrate notmuch with other MUAs as a
> > searching client. The idea is simple, a simple script could get
> > search-terms as argument and create a "virtual" maildir which has
> > symbolic links to files output by search-files command. This is similar
> > to nmzmail.
> 
> Hi, Ali.  I was also recently asking about a way to output just the
> file names of messages resulting from searches.  This is an important
> feature for handling deleting and moving in mail clients as well.  I
> believe that Carl said this would be easier once he applied the JSON
> output patches that are in the queue right now.  Hopefully we'll see
> those soon.
> 
> Personally I think the right way to implement this from a UI
> perspective would be to just have an output filter for the 'search'
> subcommand, something like:
> 
> notmuch search --output=filename ...
> 
> If output formatting was well enough supported one could even imagine
> getting rid of the 'show' subcommand in favor of just the 'search'
> subcommand with output formatting options.

That's even better! I think I'll be using my patch until these patches
are merged :)

> 
> jamie.

-- 
Regards,
Ali Polatel


[notmuch] Potential problem using Git for mail (was: Idea for storing tags)

2010-01-14 Thread Asheesh Laroia
On Tue, 12 Jan 2010, martin f krafft wrote:

> If the MDA delivers to Git, then potentially, you might get into a 
> situation where you cannot write your own changes back to the repo. This 
> is also a DoS scenario: I'll just keep sending you e-mail, and if I 
> manage to pass your mail filters, I'll basically commit to your mail 
> repository at regular intervals. Say those are 5 seconds. In order for 
> you to write updates to the repo, e.g. to update tags, then you would 
> need to pull, rebase, and push all within 5 seconds, for otherwise you'd 
> try to push non-fast-forwards.

Sure. But the MDA doesn't need to do the commit immediately. Since 
(presumably) we're using Maildir, the MDA on the mail receiving server is 
going to generate filenames that won't cause conflicts. So it's okay to 
leave the files uncommitted.

If that's too scary, then have the MDA deliver to its own git branch with 
its own checkout. Then, if you can force linearity with a lock (!), your 
client can have a special "lock the repo and push" command. Your remote 
MUA could even ask the MDA to lock the Maildir while it does a merge and 
then pushes that, and then the MDA can go back to dequeuing messages from 
the MTA into the Maildir.

Not the beautiful lockless world the purists want, but I'm okay with that.

> This is a bit unrealistic, surely, but there's a real annoyance in it: 
> you'd have to pull/rebase/push until a push succeeds, i.e. until you found a 
> time window between pull and push during which the MDA didn't write to 
> the repo. This might take a long time. If this happens in the background 
> by Cron, it's not a real concern, but if this becomes a UI issue, I 
> wouldn't know how to handle it.

It's not entirely unreasonable. Cron caused issues like that for me when I 
tracked my Maildir in git.

I'm just learning about notmuchmail.org, but I'll keep listening here. 
Preferably CC: me on replies to this mail.

I will say, I'm interested in an email setup with working IMAP on at 
least one side.

There's one other bad race I ran into when using git to manage my 
Maildirs. I was using Dovecot to serve my Maildir to an IMAP client, 
alpine. I separately did a "git merge" from origin/master, where the 
remote MTA had an MDA delivering messages and a layer on top of that 
committed them.

When I did the "git merge", git would create the Maildir files in 
~/Maildir/cur/... non-atomically. Dovecot would notice the file in 
~/Maildir/cur/ and think, "This file must be ready!" So it would parse it 
even though git hadn't finished writing it. This caused me to only see 
partial headers in Alpine since Dovecot parsed it before it was a complete 
message.

That kind of sucked.
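
For comparison, the usual Maildir way to avoid readers seeing a half-written file is to write the message under tmp/ first and only then rename it into place. A minimal sketch of that convention follows, in Python; deliver_atomically is a hypothetical helper, and this says nothing about how git or Dovecot behave beyond what is described above.

```python
import os


def deliver_atomically(maildir, filename, data, subdir="cur"):
    """Write data under maildir/tmp, then rename it into maildir/<subdir>."""
    tmp_path = os.path.join(maildir, "tmp", filename)
    final_path = os.path.join(maildir, subdir, filename)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk first
    # rename() within one filesystem is atomic: a reader scanning
    # <subdir> sees either no file or the complete file.
    os.rename(tmp_path, final_path)
    return final_path
```

The sketch assumes tmp/ and the target directory live on the same filesystem, which is what makes the rename atomic.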

-- Asheesh.

-- 
Almost anything derogatory you could say about today's software design
would be accurate.
-- K. E. Iverson


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] indexing encrypted messages (was: OpenPGP support)

2010-01-14 Thread Olly Betts
On 2010-01-08, James Westby wrote:
 That would leave an open question over whether future notmuch show
 invocations would return the plaintext or ciphertext. If it is the
 latter then it requires decrypting every time you want to view it, but
 it does mean that there is less information leakage (you could find out
 whether an encrypted message contained a particular term, but not read
 the whole message directly).

You can actually use the term position information to reconstruct the
original message text pretty well.  It misses capitalisation, punctuation,
and distinctions between whitespace, but is generally enough to allow
the message to be understood:

http://article.gmane.org/gmane.comp.search.xapian.general/2187

Cheers,
Olly

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] Idea for storing tags

2010-01-14 Thread Carl Worth
On Thu, 14 Jan 2010 21:04:21 +1300, martin f krafft madd...@madduck.net wrote:
 You might have marked a message 'read' on one machine and if the two
 get out of sync on another machine, you might have the same message
 unread there.

That's a different issue though. With two databases there's clearly the
opportunity for the two databases to be out of synch.

But you talked about the database being out of synch with respect to the
mailstore. And that's something I just don't understand, (given the
assumption that all tags are stored in the database---which was the
explicit description of the case of interest).

 Shouldn't this just be solved? I've had formail+procmail delete my
 duplicates for 10+ years, and while I don't like the fact that
 I usually get the CC before the list mail, and thus cannot filter on
 Delivered-To, I have never looked back.

Notmuch has access to all the information it needs to allow you to
delete the CC version once the list mail arrives. So you could do
notmuch-based deletion now and avoid losing the Delivered-To header if
you want.

  [*] Though, I think a plain-text file with tags managed with
  something like git (and perhaps a custom merger) could save a lot
  of work. Or perhaps a plain-text journal of tag manipulations on
  either end that could be replayed on the other.
 
 Git is good at conflict resolution if run interactively, but [0]
 still makes me question whether it can ever take the place of IMAP.
 However, Asheesh Laroia, who has floated the idea of Git-for-mail at
 DebConf8 already, has some ideas and hopefully will soon reply to my
 mail [0], which I just bounced.
 
 0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html

Using git for mail is an interesting idea, but not what I was actually
proposing here.

I think that synchronizing the mail store and synchronizing the tags
information are tasks that have different requirements, and for which we
may well want different tools.

So I was talking about using imap (or rsync, or what have you) for
copying the mailstore, and then having something with a bit more
domain-specific awareness for doing the synchronization of the tags
data.
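
The "plain-text journal of tag manipulations" idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything notmuch provides: the "+tag" / "-tag" entry format, the `struct tagset` type, and the `tagset_find` and `tagset_apply` names are all hypothetical, and the sketch tracks the tags of a single message only.

```c
#include <string.h>

#define MAX_TAGS 32
#define TAG_LEN  64

/* Hypothetical in-memory tag set for a single message. */
struct tagset {
    char tags[MAX_TAGS][TAG_LEN];
    int count;
};

/* Return the index of tag in the set, or -1 if it is not present. */
static int tagset_find(const struct tagset *s, const char *tag)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->tags[i], tag) == 0)
            return i;
    return -1;
}

/* Apply one journal entry of the assumed form "+tag" or "-tag".
 * Adding a tag that is already set, or removing one that is not,
 * is a no-op, so replaying the same journal twice is harmless.
 * Returns 0 on success, -1 on a malformed entry or a full set. */
static int tagset_apply(struct tagset *s, const char *entry)
{
    const char *tag = entry + 1;

    if ((entry[0] != '+' && entry[0] != '-') ||
        tag[0] == '\0' || strlen(tag) >= TAG_LEN)
        return -1;

    int i = tagset_find(s, tag);
    if (entry[0] == '+') {
        if (i >= 0)
            return 0;
        if (s->count == MAX_TAGS)
            return -1;
        strcpy(s->tags[s->count++], tag);
    } else if (i >= 0) {
        /* Remove by moving the last tag into the freed slot. */
        if (i != s->count - 1)
            strcpy(s->tags[i], s->tags[s->count - 1]);
        s->count--;
    }
    return 0;
}
```

Because every entry is idempotent, two machines could exchange journals and replay them in any number of passes; resolving a "+unread" on one side against a "-unread" on the other would still need a policy (for instance, latest entry wins), which this sketch does not attempt.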

-Carl


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] Idea for storing tags

2010-01-14 Thread martin f krafft
also sprach Carl Worth <cwo...@cworth.org> [2010.01.15.1124 +1300]:
> > You might have marked a message 'read' on one machine and if the two
> > get out of sync on another machine, you might have the same message
> > unread there.
>
> That's a different issue though. With two databases there's clearly the
> opportunity for the two databases to be out of synch.
>
> But you talked about the database being out of synch with respect to the
> mailstore. And that's something I just don't understand, (given the
> assumption that all tags are stored in the database---which was the
> explicit description of the case of interest).

Yes, we are talking about the situation where the tagstore is
separate from the mailstore, and both are synchronised
with a server, or between machines, separately. If for some reason
you only synchronise the mailstore — say because the connection
drops before the sync of the tagstore completes — then you end up
with an out-of-sync situation, because the mailstore-sync will have
pulled in a new message, but not the associated tags. So if you had
already read this message on another machine and tagged it 'done',
then it would show up on this machine as 'new' without the 'done'
tag, because the tags were not synchronised.

The only way to really solve this is by transferring a message and
its tags in a transactional way.

> > Shouldn't this just be solved? I've had formail+procmail delete my
> > duplicates for 10+ years, and while I don't like the fact that
> > I usually get the CC before the list mail, and thus cannot filter on
> > Delivered-To, I have never looked back.
>
> Notmuch has access to all the information it needs to allow you to
> delete the CC version once the list mail arrives. So you could do
> notmuch-based deletion now and avoid losing the Delivered-To header if
> you want.

Of course. I hadn't thought that far.

However, there are still benefits to formail, namely avoiding having
to run duplicates through potentially expensive spamfilters.

> I think that synchronizing the mail store and synchronizing the
> tags information are tasks that have different requirements, and
> for which we may well want different tools.

Fair enough. Maybe I am just paranoid about the stores getting out
of sync (see above).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
we all know linux is great...
 it does infinite loops in 5 seconds.
 -- linus torvalds
 
spamtraps: madduck.bo...@madduck.net




Re: [notmuch] Threading

2010-01-14 Thread martin f krafft
also sprach Carl Worth <cwo...@cworth.org> [2010.01.15.1108 +1300]:
> > Reading is one thing. Information storage and organisation is
> > another. After a message is delivered (and read) to my mailbox,
> > it's really mine and I can (and should be able) to affix it and
> > integrate it into my organisational scheme any way I want, don't
> > you think?
>
> A fair point.
>
> I don't see this being something I'm going to spend any time
> implementing. I just wouldn't use the functionality myself. But
> I would be happy to integrate patches if someone came up with
> some.

Maybe I should try to persuade you in person.

Just today I referenced a discussion I had with a client's ISP,
which was done via a web-based support system (custhelp.com). They
send you e-mail for every post you or they make to the thread, but
those e-mails do not reference each other. Fortunately, I stitched
them together and when I searched for the correspondence in my
mailstore, I had the entire thread available to me, which was handy
(thanks to mutt's useful thread handling abilities).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
this week dragged past me so slowly;
 the days fell on their knees...
-- david bowie
 
spamtraps: madduck.bo...@madduck.net




Re: [notmuch] indexing mail?

2010-01-14 Thread Carl Worth
On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro <ape...@igalia.com> wrote:
> > the offending commit is 2c4555f1a56602ff1dd55a63699810522ba4d91e
> >
> > from readdir (3):
> >
> >   Currently, only some file systems (among them: Btrfs, ext2, ext3,
> >   and ext4) have full support returning the file type in d_type.
> >   All applications must properly handle a return of DT_UNKNOWN.

Yes. The broken code was my mistake. I clearly didn't read the above
warning closely enough. Sorry about that!

> I am using XFS, which always returns DT_UNKNOWN. Taking into account that
> there is a good deal of people using filesystems other than the ones you
> mention, and that other non-linux filesystems may also return DT_UNKNOWN,
> in my opinion there should be a fall-back. I will try to post a patch
> Anytime Soon™.

We definitely want the fallback. I can attempt to code it, but I don't
have ready access to an afflicted filesystem, so I'd need help testing
anyway.

I'd love to see a patch for this bug soon. Be sure to CC me when the
patch is sent and that will help me commit it sooner.

> Also, I have the feeling that the d_type field from struct dirent may
> not be available in some OSes because it is a BSD extension.

I'm generally quite bad at determining whether functionality I'm using
in my software is non-portable. As proven in this case, even when the
man page tells me something is not portable I don't always notice, (and
often, the man pages aren't even that useful).

Beyond that, even if something is *known* to be theoretically
non-portable, it can be a waste of time to code compatibility paths that
nobody will be running in practice.

So I've basically gotten to the point where I just code for what works
on my system, (not out of disregard for what other people run---just
that it's impossible for me to know what subset of functionality is
actually relevant). Then, at the same time, I'm quite happy to accept
code to improve the portability when people note that things are broken
on other systems.

See the git history and email archives for examples of how we fixed
strndup and getline portability problems.

I know that waiting for people to notice it's broken isn't the nicest
thing we could do with our code. But I don't really know a much better
way. I'm happy to entertain suggestions here.

-Carl




Re: [notmuch] Notmuch performance problems on OSX

2010-01-14 Thread Carl Worth
Hi Oliver, welcome to notmuch!

On Thu, 14 Jan 2010 15:30:48 +0000, Oliver Charles <oliver.g.char...@googlemail.com> wrote:
> I've installed the latest notmuch from Git at this time of writing,
> along with Xapian from SVN head. However, just tagging a single thread
> with only one message seems to take too long:
>
> $ time notmuch tag +dissertation thread:7dc536441e6deade4256a46d46451221
>
> real  0m0.812s
> user  0m0.022s
> sys   0m0.037s

Things work quite a bit faster than that on my machine:

$ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d...@mail.gmail.com

real  0m0.024s
user  0m0.012s
sys   0m0.004s

But that could just be system differences.

> And tagging all my messages is really horrible:
>
> $ time notmuch tag +foobar tag:inbox
>
> real  0m5.076s
> user  0m3.688s
> sys   0m0.105s

For this operation, I can't really compare. How many messages are you
tagging? Here's that operation for me with 525 messages in my inbox:

$ time notmuch tag +foobar tag:inbox

real  0m1.551s
user  0m1.504s
sys   0m0.016s

> That xapian-svn was built from svn HEAD right now, so I'm assuming it
> contains the #250 fix (http://trac.xapian.org/changeset/13808)

Which I think means that things could have been even *much* slower
before. ;-)

The Xapian defect #250 was just one, initial (and obvious) performance
problem. [Though, as I mentioned in a previous thread, if you're using a
Xapian flint database, (look for .notmuch/xapian/iamflint), then you
won't get the benefit of the Xapian fix until you rebuild your notmuch
database from scratch with a current notmuch.]

Once you've verified that you've got the #250 fix functional, there
could still be lots of performance bugs. And it would be time to start
profiling.

Perhaps the notmuch daemon idea (which we've proposed earlier for
other reasons) could help reduce overhead from reading the database and
writing it back out again. So that might be one avenue to explore for
fixing things.

I have no idea what OS X does, but Linux keeps my notmuch database in
its buffer cache so I can do these operations without even touching
disk (which is actually an SSD anyway, which also helps). I just
tried, and was able to get the single-message tag operation to be 3
times slower by dropping the cache:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ time notmuch tag +foo id:5641883d1001140730l22832715ld6bdc95c9938d...@mail.gmail.com

real  0m0.062s
user  0m0.000s
sys   0m0.020s

But again, whatever the performance problem might be, the first step
would be to examine some profiles. (And I'm clueless myself as to what
profiling tools might be available for OS X.)

-Carl




Re: [notmuch] Thoughts on notmuch and Lua

2010-01-14 Thread martin f krafft
also sprach Carl Worth <cwo...@cworth.org> [2010.01.15.1200 +1300]:
> > Lua has many advantages over other scripting languages when it
> > comes to integration with a C program. It has a very clean and
> > easy C API, the overhead of running Lua scripts is not noticeable
> > among other things.
>
> I've definitely heard lots of good things about lua
> embeddability. So if we do decide to provide hooks, then lua would
> seem like a logical option to look at first. I've never looked at
> it closely myself though.

Lua for hooks has the advantage that the hooks can be executed in
the context of manipulable objects. On the other hand, hooks in
the style of run-parts directories are more flexible and accessible,
and could always be invoked as filters for the manipulable data.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
imagine if every thursday your shoes exploded if you
 tied them the usual way. this happens to us all the time
 with computers, and nobody thinks of complaining.
-- jeff raskin
 
spamtraps: madduck.bo...@madduck.net




Re: [notmuch] indexing mail?

2010-01-14 Thread Olly Betts
On 2010-01-14, Carl Worth wrote:
> On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro <ape...@igalia.com> wrote:
> > I am using XFS, which always returns DT_UNKNOWN. Taking into account that
> > there is a good deal of people using filesystems other than the ones you
> > mention, and that other non-linux filesystems may also return DT_UNKNOWN,
> > in my opinion there should be a fall-back. I will try to post a patch
> > Anytime Soon™.
>
> We definitely want the fallback. I can attempt to code it, but I don't
> have ready access to an afflicted filesystem, so I'd need help testing
> anyway.
>
> I'd love to see a patch for this bug soon. Be sure to CC me when the
> patch is sent and that will help me commit it sooner.

Not a full patch, but I already posted what this code should look like
to handle both systems without d_type, and those which return DT_UNKNOWN:

http://article.gmane.org/gmane.mail.notmuch.general/1044

Cheers,
Olly



Re: [notmuch] Notmuch performance problems on OSX

2010-01-14 Thread Olly Betts
On 2010-01-14, Oliver Charles wrote:
> I've installed the latest notmuch from Git at this time of writing,
> along with Xapian from SVN head. However, just tagging a single thread
> with only one message seems to take too long:

One difference between OS X and other systems is that OS X supports the
F_FULLFSYNC fcntl, and other systems don't (currently, at least AFAIK),
and Xapian uses that if it is available to ensure that changes have
actually made it to disk:

http://trac.xapian.org/ticket/288

On other systems, it uses fdatasync() or fsync(), which typically just
ensure that the data has left the OS - it can sit in disk controller or
drive caches for potentially seconds longer.  This call happens once
per table for every (explicit or implicit) flush on a database.

I can see an issue here which is that currently Xapian writes the base
file for the table, then syncs it, then does the next table.  I bet it
would be more efficient to write them all and then sync them all,
especially with F_FULLFSYNC.
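
To make the "write them all and then sync them all" ordering concrete, here is a minimal sketch. It is not Xapian's code: `full_sync`, `write_then_sync_all` and `MAX_FILES` are hypothetical names chosen for illustration, and the only platform facts it relies on are that OS X exposes the full flush as the `fcntl()` command `F_FULLFSYNC` and that `fsync()` is available elsewhere. Whether batching actually saves time would need measuring.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FILES 16

/* Hypothetical helper: flush one descriptor as durably as the platform
 * allows. Where F_FULLFSYNC exists (OS X), it also asks the drive to
 * flush its buffers to permanent storage; otherwise use fsync(). */
static int full_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Some filesystems reject F_FULLFSYNC; fall back to fsync(). */
#endif
    return fsync(fd);
}

/* Hypothetical helper: write every file first, then sync them all in a
 * second pass, instead of write+sync one file at a time.
 * Returns 0 if every step succeeded, -1 otherwise. */
static int write_then_sync_all(const char *const *paths,
                               const char *const *data, int n)
{
    int fds[MAX_FILES];
    int rc = 0;

    if (n < 0 || n > MAX_FILES)
        return -1;

    /* Pass 1: write every file without syncing. */
    for (int i = 0; i < n; i++) {
        size_t len = strlen(data[i]);
        fds[i] = open(paths[i], O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fds[i] < 0 || write(fds[i], data[i], len) != (ssize_t)len)
            rc = -1;
    }

    /* Pass 2: sync and close every file that was opened. */
    for (int i = 0; i < n; i++) {
        if (fds[i] < 0)
            continue;
        if (full_sync(fds[i]) != 0)
            rc = -1;
        if (close(fds[i]) != 0)
            rc = -1;
    }
    return rc;
}
```

A caller would pass parallel arrays of file names and contents, one pair per table.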

I'll take a look at doing that, and have created a ticket for it:

http://trac.xapian.org/ticket/426

If after that this is still causing problems, it should probably be made
configurable what (if any) flushing is done.  If you're on a UPS-backed
server, you probably don't need such paranoia.

Cheers,
Olly



Re: [notmuch] indexing mail?

2010-01-14 Thread Dirk-Jan C . Binnema
>>>>> "Olly" == Olly Betts <o...@survex.com> writes:

    Olly> On 2010-01-14, Carl Worth wrote:
    >> On Thu, 14 Jan 2010 18:38:54 +0100, Adrian Perez de Castro <ape...@igalia.com> wrote:
    >> > I am using XFS, which always returns DT_UNKNOWN. Taking into account that
    >> > there is a good deal of people using filesystems other than the ones you
    >> > mention, and that other non-linux filesystems may also return DT_UNKNOWN,
    >> > in my opinion there should be a fall-back. I will try to post a patch
    >> > Anytime Soon™.
    >>
    >> We definitely want the fallback. I can attempt to code it, but I don't
    >> have ready access to an afflicted filesystem, so I'd need help testing
    >> anyway.
    >>
    >> I'd love to see a patch for this bug soon. Be sure to CC me when the
    >> patch is sent and that will help me commit it sooner.

    Olly> Not a full patch, but I already posted what this code should look like
    Olly> to handle both systems without d_type, and those which return DT_UNKNOWN:

    Olly> http://article.gmane.org/gmane.mail.notmuch.general/1044

I take a slightly different approach in mu:

/* if the file system does not support entry->d_type, we add it ourselves
 * this is slower (extra stat) but at least it works
 */
static gboolean
_set_dtype (const char* path, struct dirent *entry)
{
        struct stat statbuf;
        char fullpath[4096];

        snprintf (fullpath, sizeof(fullpath), "%s%c%s",
                  path, G_DIR_SEPARATOR, entry->d_name);

        if (stat (fullpath, &statbuf) != 0) {
                g_warning ("stat failed on %s: %s", fullpath,
                           strerror(errno));
                return FALSE;
        }

        /* we only care about dirs, regular files and links */
        if (S_ISREG (statbuf.st_mode))
                entry->d_type = DT_REG;
        else if (S_ISDIR (statbuf.st_mode))
                entry->d_type = DT_DIR;
        /* note: stat() follows symlinks, so this branch can only
         * match if lstat() is used above instead */
        else if (S_ISLNK (statbuf.st_mode))
                entry->d_type = DT_LNK;

        return TRUE;
}


and then in some other places:

/* handle FSs that don't support entry->d_type */
if (entry->d_type == DT_UNKNOWN)
        _set_dtype (dirname, entry);


Note that this is untested as of yet.

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema  Helsinki, Finland
e:d...@djcbsoftware.nl   w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C


Re: [notmuch] indexing mail?

2010-01-14 Thread Olly Betts
On 2010-01-15, Dirk-Jan C. Binnema wrote:
>>>>>> "Olly" == Olly Betts <o...@survex.com> writes:
>     Olly> Not a full patch, but I already posted what this code should look like
>     Olly> to handle both systems without d_type, and those which return DT_UNKNOWN:
>
>     Olly> http://article.gmane.org/gmane.mail.notmuch.general/1044
>
> static gboolean
> _set_dtype (const char* path, struct dirent *entry)

Underscore prefixed identifiers are reserved by ISO C at file-scope; using them
yourself is undefined behaviour...

>       /* we only care about dirs, regular files and links */
>       if (S_ISREG (statbuf.st_mode))
>               entry->d_type = DT_REG;
>       else if (S_ISDIR (statbuf.st_mode))
>               entry->d_type = DT_DIR;
>       else if (S_ISLNK (statbuf.st_mode))
>               entry->d_type = DT_LNK;

This addresses the case where the FS returns DT_UNKNOWN for d_type, but doesn't
deal with the case of platforms where struct dirent has no d_type member - from
the Linux readdir man page:

  The only fields in the dirent structure that are mandated by POSIX.1 are:
  d_name[], of unspecified size, with at most NAME_MAX characters preceding
  the terminating null byte; and (as an XSI extension) d_ino.  The other fields
  are unstandardized, and not present on all systems; see NOTES below for some
  further details.

And in NOTES:

  Other than Linux, the d_type field is available mainly only on BSD systems.
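
For illustration, both cases (no d_type member at all, and d_type present but DT_UNKNOWN) can be covered along these lines. This is a sketch under assumptions, not the code from the gmane article linked above: it relies on the `_DIRENT_HAVE_D_TYPE` macro that glibc defines when struct dirent has a d_type member, and `enum entry_kind`, `kind_from_lstat` and `classify_entry` are hypothetical names. On a platform that has d_type but does not define that macro, the sketch simply always takes the lstat() path.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes DT_* and lstat() */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch. */
enum entry_kind { ENTRY_OTHER, ENTRY_FILE, ENTRY_DIR, ENTRY_LINK };

/* Slow path: one extra lstat() per entry. lstat() does not follow
 * symlinks, so a link is reported as a link, as d_type would do. */
static enum entry_kind kind_from_lstat(const char *dir, const char *name)
{
    char path[4096];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return ENTRY_OTHER; /* path did not fit in the buffer */
    if (lstat(path, &st) != 0)
        return ENTRY_OTHER;
    if (S_ISREG(st.st_mode))
        return ENTRY_FILE;
    if (S_ISDIR(st.st_mode))
        return ENTRY_DIR;
    if (S_ISLNK(st.st_mode))
        return ENTRY_LINK;
    return ENTRY_OTHER;
}

/* Use d_type when the member exists and the filesystem filled it in;
 * otherwise fall back to lstat(). */
static enum entry_kind classify_entry(const char *dir,
                                      const struct dirent *entry)
{
#ifdef _DIRENT_HAVE_D_TYPE
    switch (entry->d_type) {
    case DT_REG: return ENTRY_FILE;
    case DT_DIR: return ENTRY_DIR;
    case DT_LNK: return ENTRY_LINK;
    case DT_UNKNOWN: break; /* filesystem did not say; ask lstat() */
    default: return ENTRY_OTHER;
    }
#endif
    return kind_from_lstat(dir, entry->d_name);
}
```

Returning a separate value rather than writing into entry->d_type keeps the code compilable where the member does not exist.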

Cheers,
Olly
