[PATCH 1/2] python: add classes to wrap all notmuch_*_t types

2011-12-02 Thread James Westby
On Fri, 02 Dec 2011 09:20:35 -0500, James Westby <jw+debian at jameswestby.net> 
wrote:
> I'll test again to make sure that I have this correct, but my tests
> yesterday certainly suggested that your patches fixed this.

Yep, segfaults a plenty dropping your second patch that go away again
when it is applied once more.

Thanks,

James


[PATCH 1/2] python: add classes to wrap all notmuch_*_t types

2011-12-02 Thread James Westby
On Fri, 02 Dec 2011 13:35:11 +0100, Justus Winter <4winter at 
informatik.uni-hamburg.de> wrote:
> Huh, strange. My patch isn't supposed to change anything, it just
> enables the ctypes package to check whether the functions from
> libnotmuch are called with the right parameters, thus preventing
> mistakes when changing the python bindings in the future.

Where .restype is set to c_void_p ctypes spots this and returns it as
int32. Where it's set to another callable (e.g. your custom types) it
actually wraps the value by passing it to the callable and returning the
result.

Then, because your custom types are pointers, it stores them in an
appropriate value, and also stops the storage being reused.

I'll test again to make sure that I have this correct, but my tests
yesterday certainly suggested that your patches fixed this.

Thanks,

James


Re: [PATCH 1/2] python: add classes to wrap all notmuch_*_t types

2011-12-02 Thread James Westby
On Fri, 02 Dec 2011 13:35:11 +0100, Justus Winter 
4win...@informatik.uni-hamburg.de wrote:
 Huh, strange. My patch isn't supposed to change anything, it just
 enables the ctypes package to check whether the functions from
 libnotmuch are called with the right parameters, thus preventing
 mistakes when changing the python bindings in the future.

Where .restype is set to c_void_p ctypes spots this and returns it as
int32. Where it's set to another callable (e.g. your custom types) it
actually wraps the value by passing it to the callable and returning the
result.

Then, because your custom types are pointers, it stores them in an
appropriate value, and also stops the storage being reused.

I'll test again to make sure that I have this correct, but my tests
yesterday certainly suggested that your patches fixed this.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 1/2] python: add classes to wrap all notmuch_*_t types

2011-12-02 Thread James Westby
On Fri, 02 Dec 2011 09:20:35 -0500, James Westby jw+deb...@jameswestby.net 
wrote:
 I'll test again to make sure that I have this correct, but my tests
 yesterday certainly suggested that your patches fixed this.

Yep, segfaults a plenty dropping your second patch that go away again
when it is applied once more.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH] python: Store pointers as c_void_p to keep references

2011-12-01 Thread James Westby
From: James Westby <james.wes...@linaro.org>

ctypes doesn't return c_void_p return values as that, it returns them as
32-bit integers instead. This has two problems:

  1 - On 64-bit machines anything higher than the max 32-bit integer
  will overflow when passed back in to another function expecting,
  a pointer giving the wrong value.

  2 - If the value isn't stored as a pointer then the memory can be
  re-used and so the object will be corrupted.

The fix for both of these is to store the values as pointers.
---
 bindings/python/notmuch/database.py |   18 +-
 bindings/python/notmuch/message.py  |   10 +-
 bindings/python/notmuch/thread.py   |6 +++---
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/bindings/python/notmuch/database.py 
b/bindings/python/notmuch/database.py
index f4bc53e..e5188fb 100644
--- a/bindings/python/notmuch/database.py
+++ b/bindings/python/notmuch/database.py
@@ -152,7 +152,7 @@ class Database(object):
 if res is None:
 raise NotmuchError(
 message="Could not create the specified database")
-self._db = res
+self._db = c_void_p(res)

 def open(self, path, mode=0):
 """Opens an existing database
@@ -171,7 +171,7 @@ class Database(object):

 if res is None:
 raise NotmuchError(message="Could not open the specified database")
-self._db = res
+self._db = c_void_p(res)

 def get_path(self):
 """Returns the file path of an open database"""
@@ -297,7 +297,7 @@ class Database(object):
 dir_p = Database._get_directory(self._db, _str(path))

 # return the Directory, init it with the absolute path
-return Directory(_str(abs_dirpath), dir_p, self)
+return Directory(_str(abs_dirpath), c_void_p(dir_p), self)

 def add_message(self, filename, sync_maildir_flags=False):
 """Adds a new message to the database
@@ -466,7 +466,7 @@ class Database(object):
 tags_p = Database._get_all_tags(self._db)
 if tags_p == None:
 raise NotmuchError(STATUS.NULL_POINTER)
-return Tags(tags_p, self)
+return Tags(c_void_p(tags_p), self)

 def create_query(self, querystring):
 """Returns a :class:`Query` derived from this database
@@ -600,7 +600,7 @@ class Query(object):
 query_p = Query._create(db.db_p, _str(querystr))
 if query_p is None:
 raise NullPointerError
-self._query = query_p
+self._query = c_void_p(query_p)

 def set_sort(self, sort):
 """Set the sort order future results will be delivered in
@@ -630,7 +630,7 @@ class Query(object):

 if threads_p is None:
 raise NullPointerError
-return Threads(threads_p, self)
+return Threads(c_void_p(threads_p), self)

 def search_messages(self):
 """Filter messages according to the query and return
@@ -644,7 +644,7 @@ class Query(object):

 if msgs_p is None:
 raise NullPointerError
-return Messages(msgs_p, self)
+return Messages(c_void_p(msgs_p), self)

 def count_messages(self):
 """Estimate the number of messages matching the query
@@ -793,7 +793,7 @@ class Directory(object):
 """
 self._assert_dir_is_initialized()
 files_p = Directory._get_child_files(self._dir_p)
-return Filenames(files_p, self)
+return Filenames(c_void_p(files_p), self)

 def get_child_directories(self):
 """Gets a :class:`Filenames` iterator listing all the filenames of
@@ -804,7 +804,7 @@ class Directory(object):
 """
 self._assert_dir_is_initialized()
 files_p = Directory._get_child_directories(self._dir_p)
-return Filenames(files_p, self)
+return Filenames(c_void_p(files_p), self)

 @property
 def path(self):
diff --git a/bindings/python/notmuch/message.py 
b/bindings/python/notmuch/message.py
index 4bf90c2..672a169 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -140,7 +140,7 @@ class Messages(object):

 if tags_p == None:
 raise NotmuchError(STATUS.NULL_POINTER)
-return Tags(tags_p, self)
+return Tags(c_void_p(tags_p), self)

 def __iter__(self):
 """ Make Messages an iterator """
@@ -154,7 +154,7 @@ class Messages(object):
 self._msgs = None
 raise StopIteration

-msg = Message(Messages._get(self._msgs), self)
+msg = Message(c_void_p(Messages._get(self._msgs)), self)
 nmlib.notmuch_messages_move_to_next(self._msgs)
 return msg

@@ -350,7 +350,7 @@ class Message(object):
 if msgs_p is None:
  

python-notmuch crash with threads

2011-12-01 Thread James Westby
Hi,

I've been seeing a race with python-notmuch, where it will crash due to
pointers being invalidated when threads are used.

I've attached a script which shows the problem some of the time. It's
about the smallest script I can make, but it's hampered by the fact that
making it simpler seems to make the race less likely, so it's hard to
know when it is gone.

The typical backtrace is:

Program terminated with signal 11, Segmentation fault.
#0  0x7f7b19c34b59 in talloc_named_const () from 
/usr/lib/x86_64-linux-gnu/libtalloc.so.2
(gdb) up
#1  0x7f7b1a5f78dc in notmuch_query_search_threads (query=0x14001c70) at 
lib/query.cc:322
322 lib/query.cc: No such file or directory.
in lib/query.cc
(gdb) p *query
Cannot access memory at address 0x14001c70

Where something is invalidating the pointer between creation in
db.create_query() and calling it in query.search_threads()

I've seen other similar things when using other code in the thread.

http://talloc.samba.org/talloc/doc/html/index.html talks about the
thread-safety of talloc, and I don't think it's any of those issues
here.

Any suggestions for how to debug this further would be most welcome.

Thanks,

James

-- next part --
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 1170 bytes
Desc: not available
URL: 



python-notmuch crash with threads

2011-12-01 Thread James Westby
Hi,

I've been seeing a race with python-notmuch, where it will crash due to
pointers being invalidated when threads are used.

I've attached a script which shows the problem some of the time. It's
about the smallest script I can make, but it's hampered by the fact that
making it simpler seems to make the race less likely, so it's hard to
know when it is gone.

The typical backtrace is:

Program terminated with signal 11, Segmentation fault.
#0  0x7f7b19c34b59 in talloc_named_const () from 
/usr/lib/x86_64-linux-gnu/libtalloc.so.2
(gdb) up
#1  0x7f7b1a5f78dc in notmuch_query_search_threads (query=0x14001c70) at 
lib/query.cc:322
322 lib/query.cc: No such file or directory.
in lib/query.cc
(gdb) p *query
Cannot access memory at address 0x14001c70

Where something is invalidating the pointer between creation in
db.create_query() and calling it in query.search_threads()

I've seen other similar things when using other code in the thread.

http://talloc.samba.org/talloc/doc/html/index.html talks about the
thread-safety of talloc, and I don't think it's any of those issues
here.

Any suggestions for how to debug this further would be most welcome.

Thanks,

James

import threading


class NotmuchThread(threading.Thread):

def __init__(self):
super(NotmuchThread, self).__init__()
self.job_waiting = threading.Condition()
self.job_queue = []

def run(self):
from notmuch.database import Database
self.db = Database()
job = None
print Aquiring lock
with self.job_waiting:
if len(self.job_queue)  1:
print Job queue empty so waiting
while True:
ready = self.job_waiting.wait(1)
if ready:
break
if len(self.job_queue)  0:
break
print Got job, releasing lock
self.search_threads(tag:inbox)

def search_threads(self, query_string):
query = self.db.create_query(query_string)
print(%X % query._query)
threads = query.search_threads()
return threads

test_thread = NotmuchThread()
test_thread.start()
with test_thread.job_waiting:
test_thread.job_queue.append()
test_thread.job_waiting.notify()
import time; time.sleep(1)
test_thread.join()
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] python: Store pointers as c_void_p to keep references

2011-12-01 Thread James Westby
On Thu,  1 Dec 2011 18:13:05 -0500, James Westby jw+deb...@jameswestby.net 
wrote:
 From: James Westby james.wes...@linaro.org
 
 ctypes doesn't return c_void_p return values as that, it returns them as
 32-bit integers instead. This has two problems:
 
   1 - On 64-bit machines anything higher than the max 32-bit integer
   will overflow when passed back in to another function expecting,
   a pointer giving the wrong value.
 
   2 - If the value isn't stored as a pointer then the memory can be
   re-used and so the object will be corrupted.
 
 The fix for both of these is to store the values as pointers.

http://osdir.com/ml/python.ctypes/2006-12/msg00048.html states that this
is expected behaviour of ctypes.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 1/2] python: add classes to wrap all notmuch_*_t types

2011-12-01 Thread James Westby
On Thu, 01 Dec 2011 22:25:41 +0100, Sebastian Spaeth sebast...@sspaeth.de 
wrote:
 This strikes me as a rather good thing, so the patches went in.

Hah, I've just seen this, and I'm going to guess that it fixes my
problems too.

...

I've tested and it seems to work, so my patch is unneeded witht his one.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


See only unread message in a thread ?

2010-04-14 Thread James Westby
On Wed, 14 Apr 2010 00:43:16 +0200, Xavier Maillard  wrote:
> Is it done automatically ? Or do I need to do something special
> in order to unset the unread tag ?
> 
> I see there is 'a' and 'x' when in notmuch-show but I am not sure
> I have to explicitely press on of these keys.
> 
> Currently, when in a notmuch-search buffer, I press RET to visit
> the thread and then I play with 'n' to go next message till I
> read the whole thread. Then, I press 'q' to go back to the
> notmuch-search buffer. Is this the way to do ?

That should be removing the 'unread' tag.

Therefore it sounds like you want to be using a search including
'tag:unread' to get the behaviour that you want. Obviously that might
not show you all the threads that you want.

There is a notmuch-show-next-unread-message, but it's not bound to
anything by default. That might also be useful to you.

Thanks,

James


See only unread message in a thread ?

2010-04-14 Thread James Westby
On Tue, 13 Apr 2010 19:29:29 -0400, Jameson Rollins  wrote:
> On Tue, 13 Apr 2010 23:19:37 +0100, James Westby <jw+debian at 
> jameswestby.net> wrote:
> > On Wed, 14 Apr 2010 00:11:50 +0200, Xavier Maillard  wrote:
> > > Say I have a thread with A-B-C. I visit the thread and read the
> > > whole thread. Let's say after 'notmuch new' a post has entered
> > > the thread: A-B-C-D. Is there an easy way to just have one of
> > > these behaviour:
> > > 
> > > - only show the new message (with an option to toggle display of
> > >   the old messages)
> > > - display the whole thread with the 3 read messages 'collapsed'
> > >   and only the unread message 'expanded'
> > 
> > This is the default behaviour in my experience.
> 
> I agree this is the behavior I see, but there is a caveat that may be
> the cause of confusion:
> 
> > Reading a message unsets the 'unread' tag.
> > 
> > The default search if for 'inbox' + 'unread'.
> 
> Actually, the default search is for just "tag:inbox", which will also
> display any messages in the inbox that have "tag:unread" as well.  But
> remember that if you are coming from a search for "tag:inbox", when you
> view a thread all the message with "tag:inbox" will be visible
> (uncollapsed), whether or not they have "tag:unread" not.  Just reading
> a message will not cause it to collapse when coming from a "tag:inbox"
> search.  I think this might be where the confusion is coming from.  If
> you want to view just the messages in your inbox that are unread, then
> you need to search for "tag:inbox and tag:unread", or type 'f' in
> notmuch mode to filter your inbox search with the "tag:unread" search.
> Make sense?

Yes, thanks for clarifying.

I realise now that I lied about the behaviour that I was seeing.

I use the default tag:inbox search, but archive all the mails 'a' as I
read them, using other tags to keep track of things I want to come back
to.

That means that 'inbox' behaves the way I described 'unread' to behave
for me. I see the new messages as the old ones had their 'inbox' tag
removed.

Thanks,

James


See only unread message in a thread ?

2010-04-14 Thread James Westby
On Wed, 14 Apr 2010 00:11:50 +0200, Xavier Maillard  wrote:
> Hi,
> 
> maybe I missread the "manual" but I can't find an easy way to do
> something simple in notmuch.el.
> 
> Say I have a thread with A-B-C. I visit the thread and read the
> whole thread. Let's say after 'notmuch new' a post has entered
> the thread: A-B-C-D. Is there an easy way to just have one of
> these behaviour:
> 
> - only show the new message (with an option to toggle display of
>   the old messages)
> - display the whole thread with the 3 read messages 'collapsed'
>   and only the unread message 'expanded'

This is the default behaviour in my experience.

Reading a message unsets the 'unread' tag.

The default search if for 'inbox' + 'unread'.

This lists the threads where any message within matches that criterion.

When viewing a thread from there the messages that don't match are
collapsed.

Therefore if you are remove the 'unread' tag when reading mail you will
only read the newest messages when 'notmuch new' adds a new message to
the thread if the search that took you there included the 'unread' tag.

Thanks,

James


[notmuch] [PATCH] Store thread ids for messages that we haven't seen yet

2010-04-13 Thread James Westby
On Tue, 13 Apr 2010 08:20:48 -0700, Carl Worth  wrote:
> Thanks for this patch, James! It's especially nice to have the fix come
> in with additions to the test suite as well.

Thanks for including a test suite I could add to!

> I did split up the commit so the addition to the test suite happens
> first. That way it's easy to test the test itself, (verifying that it
> fails before the fix, and then passes after the fix).

Your choice. I prefer putting them in the same commit to be more
self-documenting, and then using the capabilities of my VCS to verify
the change if i desire.

> I also added a few documentation and other cleanups as follow-on
> commits. Hopefully, they don't change the logic at all, but make things
> easier to understand.
> 
> So that's all pushed.

Great, thanks.

> Then, I started implementing support for retroactively storing
> thread_ids for non-existing messages references in already-existing
> messages. It took me perhaps too long that a change like that, (while
> useful), is too invasive for the current 0.2 release, and not essential
> for this particular feature.

This would fix up threads for all existing messages? Probably a good
thing to have, but not that important to me. In my case I can always
open the bug in my browser if I want to see the full conversation.

> So I've postponed that part at least. I hope to make a database-schema
> upgrade a key part of a release in a couple of cycles, (for this
> feature and for "list:" and "folder:").

Cool, I look forward to it.

Thanks,

James


Re: See only unread message in a thread ?

2010-04-13 Thread James Westby
On Wed, 14 Apr 2010 00:11:50 +0200, Xavier Maillard x...@gnu.org wrote:
 Hi,
 
 maybe I missread the manual but I can't find an easy way to do
 something simple in notmuch.el.
 
 Say I have a thread with A-B-C. I visit the thread and read the
 whole thread. Let's say after 'notmuch new' a post has entered
 the thread: A-B-C-D. Is there an easy way to just have one of
 these behaviour:
 
 - only show the new message (with an option to toggle display of
   the old messages)
 - display the whole thread with the 3 read messages 'collapsed'
   and only the unread message 'expanded'

This is the default behaviour in my experience.

Reading a message unsets the 'unread' tag.

The default search if for 'inbox' + 'unread'.

This lists the threads where any message within matches that criterion.

When viewing a thread from there the messages that don't match are
collapsed.

Therefore if you are remove the 'unread' tag when reading mail you will
only read the newest messages when 'notmuch new' adds a new message to
the thread if the search that took you there included the 'unread' tag.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Plans for the 0.2 release (this week)

2010-04-08 Thread James Westby
On Wed, 07 Apr 2010 15:12:44 -0700, Carl Worth  wrote:
>   * Anything else that people want, (especially things that already
> exist and that you're already using). Support for maildir flags on
> import would be great here. I'm still waiting to see a complete
> solution I think.

id:1268515677-12692-1-git-send-email-jw+debian at jameswestby.net

I've been running with this locally since then with no apparent issues.

It will make anyone who suffers from this problem regularly pretty
happy, as they can return to the world of a thread-based mail client.

Thanks,

James


Re: Plans for the 0.2 release (this week)

2010-04-07 Thread James Westby
On Wed, 07 Apr 2010 15:12:44 -0700, Carl Worth cwo...@cworth.org wrote:
   * Anything else that people want, (especially things that already
 exist and that you're already using). Support for maildir flags on
 import would be great here. I'm still waiting to see a complete
 solution I think.

id:1268515677-12692-1-git-send-email-jw+deb...@jameswestby.net

I've been running with this locally since then with no apparent issues.

It will make anyone who suffers from this problem regularly pretty
happy, as they can return to the world of a thread-based mail client.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] Produce valid JSON output even if mail body is empty

2010-03-14 Thread James Westby
On Sun, 14 Mar 2010 19:19:11 +0100, Michal Sojka  wrote:
> Mails with empty body produced the following output:
>   "body": [{"id": 1, "content-type": "text/plain", "content": (null)}]
> The (null) is not valid JSON syntax.

Is this just something that can happen with the body?

I've see (null) in the emacs interface when I've done something silly
such as opening a newer notmuch db with an old client.

Should all the attributes be guarded in a similar manner to ensure valid
JSON?

Thanks,

James


[notmuch] [PATCH] Store thread ids for messages that we haven't seen yet

2010-03-13 Thread James Westby
This allows us to thread messages even when we receive them out of
order, or never receive the root.

The thread ids for messages that aren't present but are referred to are
stored as metadata in the database and then retrieved if we ever get
that message.

When determining the thread id for a message we also check for this
metadata so that we can thread descendants of a message together before
we receive it.
---
 lib/database.cc   |   78 ++--
 test/notmuch-test |   32 +++--
 2 files changed, 86 insertions(+), 24 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index c91e97c..92234ff 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -,6 +,31 @@ notmuch_database_get_directory (notmuch_database_t 
*notmuch,
 return _notmuch_directory_create (notmuch, path, );
 }

+static const char *
+_notmuch_database_generate_thread_id (notmuch_database_t *notmuch)
+{
+/* 16 bytes (+ terminator) for hexadecimal representation of
+ * a 64-bit integer. */
+static char thread_id[17];
+Xapian::WritableDatabase *db;
+
+db = static_cast  (notmuch->xapian_db);
+
+notmuch->last_thread_id++;
+
+sprintf (thread_id, "%016" PRIx64, notmuch->last_thread_id);
+
+db->set_metadata ("last_thread_id", thread_id);
+
+return thread_id;
+}
+
+static char *
+_get_metadata_thread_id_key (void *ctx, const char *message_id)
+{
+return talloc_asprintf (ctx, "thread_id_%s", message_id);
+}
+
 /* Find the thread ID to which the message with 'message_id' belongs.
  *
  * Returns NULL if no message with message ID 'message_id' is in the
@@ -1127,8 +1152,25 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
*notmuch,
 const char *ret = NULL;

 message = notmuch_database_find_message (notmuch, message_id);
-if (message == NULL)
-   goto DONE;
+/* If we haven't seen that message yet then check if we have already
+ * generated a dummy id for it and stored it in the metadata.
+ * If not then we generate a new thread id.
+ * This ensures that we can thread messages even when we haven't received
+ * the root (yet?)
+ */
+if (message == NULL) {
+Xapian::WritableDatabase *db = static_cast  (notmuch->xapian_db);
+char * metadata_key = _get_metadata_thread_id_key (ctx, message_id);
+string thread_id = notmuch->xapian_db->get_metadata(metadata_key);
+if (thread_id.empty()) {
+ret = _notmuch_database_generate_thread_id(notmuch);
+db->set_metadata(metadata_key, ret);
+} else {
+ret = thread_id.c_str();
+}
+talloc_free (metadata_key);
+goto DONE;
+}

 ret = talloc_steal (ctx, notmuch_message_get_thread_id (message));

@@ -1295,25 +1337,6 @@ _notmuch_database_link_message_to_children 
(notmuch_database_t *notmuch,
 return ret;
 }

-static const char *
-_notmuch_database_generate_thread_id (notmuch_database_t *notmuch)
-{
-/* 16 bytes (+ terminator) for hexadecimal representation of
- * a 64-bit integer. */
-static char thread_id[17];
-Xapian::WritableDatabase *db;
-
-db = static_cast  (notmuch->xapian_db);
-
-notmuch->last_thread_id++;
-
-sprintf (thread_id, "%016" PRIx64, notmuch->last_thread_id);
-
-db->set_metadata ("last_thread_id", thread_id);
-
-return thread_id;
-}
-
 /* Given a (mostly empty) 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
@@ -1337,6 +1360,19 @@ _notmuch_database_link_message (notmuch_database_t 
*notmuch,
 {
 notmuch_status_t status;
 const char *thread_id = NULL;
+char *metadata_key = _get_metadata_thread_id_key (message,
+notmuch_message_get_message_id (message));
+/* Check if we have already seen related messages to this one.
+ * If we have then use the thread_id that we stored at that time.
+ */
+string stored_id = notmuch->xapian_db->get_metadata (metadata_key);
+if (!stored_id.empty()) {
+Xapian::WritableDatabase *db = static_cast  (notmuch->xapian_db);
+db->set_metadata (metadata_key, "");
+thread_id = stored_id.c_str();
+_notmuch_message_add_term (message, "thread", thread_id);
+}
+talloc_free (metadata_key);

 status = _notmuch_database_link_message_to_parents (notmuch, message,
message_file,
diff --git a/test/notmuch-test b/test/notmuch-test
index 7bc53ec..9264fb0 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -64,6 +64,10 @@ increment_mtime ()
 #  Additional values for email headers. If these are not provided
 #  then the relevant headers will simply not appear in the
 #  message.
+#
+#  '[id]='
+#
+#  Controls the message-id of the created message.
 gen_msg_cnt=0
 gen_msg_filename=""
 gen_msg_id=""
@@ -73,9 +77,14 @@ generate_message ()
 local -A template="($@)"
 local 

[notmuch] [PATCH] Store thread ids for messages that we haven't seen yet

2010-03-13 Thread James Westby
This allows us to thread messages even when we receive them out of
order, or never receive the root.

The thread ids for messages that aren't present but are referred to are
stored as metadata in the database and then retrieved if we ever get
that message.

When determining the thread id for a message we also check for this
metadata so that we can thread descendants of a message together before
we receive it.
---
 lib/database.cc   |   78 ++--
 test/notmuch-test |   32 +++--
 2 files changed, 86 insertions(+), 24 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index c91e97c..92234ff 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -,6 +,31 @@ notmuch_database_get_directory (notmuch_database_t 
*notmuch,
 return _notmuch_directory_create (notmuch, path, status);
 }
 
+static const char *
+_notmuch_database_generate_thread_id (notmuch_database_t *notmuch)
+{
+/* 16 bytes (+ terminator) for hexadecimal representation of
+ * a 64-bit integer. */
+static char thread_id[17];
+Xapian::WritableDatabase *db;
+
+db = static_cast Xapian::WritableDatabase * (notmuch-xapian_db);
+
+notmuch-last_thread_id++;
+
+sprintf (thread_id, %016 PRIx64, notmuch-last_thread_id);
+
+db-set_metadata (last_thread_id, thread_id);
+
+return thread_id;
+}
+
+static char *
+_get_metadata_thread_id_key (void *ctx, const char *message_id)
+{
+return talloc_asprintf (ctx, thread_id_%s, message_id);
+}
+
 /* Find the thread ID to which the message with 'message_id' belongs.
  *
  * Returns NULL if no message with message ID 'message_id' is in the
@@ -1127,8 +1152,25 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
*notmuch,
 const char *ret = NULL;
 
 message = notmuch_database_find_message (notmuch, message_id);
-if (message == NULL)
-   goto DONE;
+/* If we haven't seen that message yet then check if we have already
+ * generated a dummy id for it and stored it in the metadata.
+ * If not then we generate a new thread id.
+ * This ensures that we can thread messages even when we haven't received
+ * the root (yet?)
+ */
+if (message == NULL) {
+Xapian::WritableDatabase *db = static_cast Xapian::WritableDatabase 
* (notmuch-xapian_db);
+char * metadata_key = _get_metadata_thread_id_key (ctx, message_id);
+string thread_id = notmuch-xapian_db-get_metadata(metadata_key);
+if (thread_id.empty()) {
+ret = _notmuch_database_generate_thread_id(notmuch);
+db-set_metadata(metadata_key, ret);
+} else {
+ret = thread_id.c_str();
+}
+talloc_free (metadata_key);
+goto DONE;
+}
 
 ret = talloc_steal (ctx, notmuch_message_get_thread_id (message));
 
@@ -1295,25 +1337,6 @@ _notmuch_database_link_message_to_children 
(notmuch_database_t *notmuch,
 return ret;
 }
 
-static const char *
-_notmuch_database_generate_thread_id (notmuch_database_t *notmuch)
-{
-/* 16 bytes (+ terminator) for hexadecimal representation of
- * a 64-bit integer. */
-static char thread_id[17];
-Xapian::WritableDatabase *db;
-
-db = static_cast Xapian::WritableDatabase * (notmuch-xapian_db);
-
-notmuch-last_thread_id++;
-
-sprintf (thread_id, %016 PRIx64, notmuch-last_thread_id);
-
-db-set_metadata (last_thread_id, thread_id);
-
-return thread_id;
-}
-
 /* Given a (mostly empty) 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
@@ -1337,6 +1360,19 @@ _notmuch_database_link_message (notmuch_database_t 
*notmuch,
 {
 notmuch_status_t status;
 const char *thread_id = NULL;
+char *metadata_key = _get_metadata_thread_id_key (message,
+notmuch_message_get_message_id (message));
+/* Check if we have already seen related messages to this one.
+ * If we have then use the thread_id that we stored at that time.
+ */
+string stored_id = notmuch-xapian_db-get_metadata (metadata_key);
+if (!stored_id.empty()) {
+Xapian::WritableDatabase *db = static_cast Xapian::WritableDatabase 
* (notmuch-xapian_db);
+db-set_metadata (metadata_key, );
+thread_id = stored_id.c_str();
+_notmuch_message_add_term (message, thread, thread_id);
+}
+talloc_free (metadata_key);
 
 status = _notmuch_database_link_message_to_parents (notmuch, message,
message_file,
diff --git a/test/notmuch-test b/test/notmuch-test
index 7bc53ec..9264fb0 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -64,6 +64,10 @@ increment_mtime ()
 #  Additional values for email headers. If these are not provided
 #  then the relevant headers will simply not appear in the
 #  message.
+#
+#  '[id]=message-id'
+#
+#  Controls the message-id of the created message.
 gen_msg_cnt=0
 gen_msg_filename=
 

[notmuch] an other ready-to-use store option for notmuch : CouchDB

2010-03-02 Thread James Westby
On Tue, 2 Mar 2010 16:43:10 +0100, martin f krafft  
wrote:
> also sprach Paul R  [2010.03.02.1626 +0100]:
> > CouchDB databases can be replicated and synced in both directions.
> > Conflicts are lazily handled.
> 
> What does this mean?

Couch has deterministic conflict resolution between the two sides, so
all instances will get the same result without needing a master.

You can then ask for the current conflicts to do more in-depth
resolution if needed, including prompting the user.

Thanks,

James


Re: [notmuch] an other ready-to-use store option for notmuch : CouchDB

2010-03-02 Thread James Westby
On Tue, 2 Mar 2010 16:43:10 +0100, martin f krafft madd...@madduck.net wrote:
 also sprach Paul R paul.r...@gmail.com [2010.03.02.1626 +0100]:
  CouchDB databases can be replicated and synced in both directions.
  Conflicts are lazily handled.
 
 What does this mean?

Couch has deterministic conflict resolution between the two sides, so
all instances will get the same result without needing a master.

You can then ask for the current conflicts to do more in-depth
resolution if needed, including prompting the user.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] What's so great about notmuch?

2010-02-27 Thread James Westby
On Fri, 26 Feb 2010 12:08:49 -0800, Carl Worth  wrote:
> It seems clear that I'll have some opportunities to present notmuch to
> various audiences at varying levels of formality. Since notmuch is
> already a bigger project than me, I'd love to get some ideas from others
> about what you think are the "big ideas" in notmuch.
> 
> So here are a few different phrasings of what's basically the same
> question. And I'd love to hear some brief opinions on any one of these,
> (or similar topics):
> 
>What's your favorite thing about notmuch?

That it takes as long to download my mail from my server as it does to
then process my inbox down to zero. (Though it doesn't yet allow me to
complete all the tasks within in that time unfortunately.)

>What about notmuch makes it distinctive compared to other email
>programs?

That the sucky things about it are bugs that will be fixed soon enough,
rather than architectual problems that will never get fixed (with
possibly one exception noted below.)

>If someone were to implement a new email system from scratch, but
>capturing the "ideas" of notmuch, what would it have to have?

  * Thread based

  * Tagging

  * Pervasive search

  * Speed

  * An amusing name


The biggest un-good thing about notmuch right now for me is that it is
effectively single-machine (though not hostile about it like sup). I
want local mail for offline use (like right now,) but have more than one
machine. It would be perfect if doing this didn't require notmuch on
every client for it to work, with graceful degredation for those without
it, though I would be perfectly happy with an android port.

Thanks,

James




[notmuch] [PATCH] Look for text/html content types ignoring case

2010-02-19 Thread James Westby
On Fri, 19 Feb 2010 08:32:42 +, David Edmondson  wrote:
> How about this:

Looks reasonable to me, thanks.


James


[notmuch] [PATCH] Look for text/html content types ignoring case

2010-02-16 Thread James Westby
Some people send html parts as text/HTML or similar, so do a
case-sensitive comparison when checking for html parts.
---
 notmuch.el |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch.el b/notmuch.el
index 0f4ea10..b69c334 100644
--- a/notmuch.el
+++ b/notmuch.el
@@ -736,7 +736,7 @@ is what to put on the button."
 (setq mime-type (car (split-string (buffer-substring 
 (match-beginning 1) (match-end 
1))

-  (if (equal mime-type "text/html")
+  (if (equal (downcase mime-type) "text/html")
   (let ((filename (notmuch-show-get-filename)))
 (with-temp-buffer
   (insert-file-contents filename nil nil nil t)
-- 
1.6.6.1



[notmuch] [PATCH] Look for text/html content types ignoring case

2010-02-16 Thread James Westby
Some people send html parts as text/HTML or similar, so do a
case-sensitive comparison when checking for html parts.
---
 notmuch.el |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch.el b/notmuch.el
index 0f4ea10..b69c334 100644
--- a/notmuch.el
+++ b/notmuch.el
@@ -736,7 +736,7 @@ is what to put on the button.
 (setq mime-type (car (split-string (buffer-substring 
 (match-beginning 1) (match-end 
1))
 
-  (if (equal mime-type text/html)
+  (if (equal (downcase mime-type) text/html)
   (let ((filename (notmuch-show-get-filename)))
 (with-temp-buffer
   (insert-file-contents filename nil nil nil t)
-- 
1.6.6.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] tag dir proposal [was: Re: Git as notmuch object store]

2010-01-28 Thread James Westby
On Thu, 28 Jan 2010 18:12:52 +1300, martin f krafft  
wrote:
> also sprach Jameson Rollins  [2010.01.27.0603 
> +1300]:
> > > This is getting involved. 
> > > 
> > > Maybe I'm missing something in this thread; but, why couldn't these 
> > > complex and
> > > context-sensitive decisions be delegated to sub-processes? ala git hooks?
> > 
> > I think this idea is completely independent of anything having to do
> > with using git as a mail store.  That's why I was trying to separate it
> > into a separate thread.
> 
> I think he meant "notmuch hooks like you have hooks for Git too",
> e.g. thread:755741d13573c7642761d2a175cb146d

Are you trying to use thread: such that it could be passed to notmuch
show to see the conversation?

That's not going to work so well if so.

Thanks,

James


Re: [notmuch] tag dir proposal [was: Re: Git as notmuch object store]

2010-01-27 Thread James Westby
On Thu, 28 Jan 2010 18:12:52 +1300, martin f krafft madd...@madduck.net wrote:
 also sprach Jameson Rollins jroll...@finestructure.net [2010.01.27.0603 
 +1300]:
   This is getting involved. 
   
   Maybe I'm missing something in this thread; but, why couldn't these 
   complex and
   context-sensitive decisions be delegated to sub-processes? ala git hooks?
  
  I think this idea is completely independent of anything having to do
  with using git as a mail store.  That's why I was trying to separate it
  into a separate thread.
 
 I think he meant notmuch hooks like you have hooks for Git too,
 e.g. thread:755741d13573c7642761d2a175cb146d

Are you trying to use thread: such that it could be passed to notmuch
show to see the conversation?

That's not going to work so well if so.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] tag dir proposal [was: Re: Git as notmuch object store]

2010-01-27 Thread James Westby
On Thu, 28 Jan 2010 18:34:21 +1300, martin f krafft madd...@madduck.net wrote:
 also sprach James Westby jw+deb...@jameswestby.net [2010.01.28.1828 +1300]:
  Are you trying to use thread: such that it could be passed to
  notmuch show to see the conversation?
  
  That's not going to work so well if so.
 
 Why not? Works fine for me with the vim plugin...

lib/message.cc:560

static void   
thread_id_generate (thread_id_t *thread_id)   
{
static int seeded = 0;
FILE *dev_random;
uint32_t value;   
char *s;
int i;
  
if (! seeded) {
dev_random = fopen (/dev/random, r);  
if (dev_random == NULL) {
srand (time (NULL));  
} else {
fread ((void *) value, sizeof (value), 1, dev_random);   
srand (value);
fclose (dev_random); 
}
seeded = 1;
} 

s = thread_id-str;
for (i = 0; i  NOTMUCH_THREAD_ID_DIGITS; i += 8) {   
value = rand ();
sprintf (s, %08x, value);   
s += 8;   
}
}

so it works fine for you, however I have no idea which thread you are
talking about.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] indexing encrypted messages (was: OpenPGP support)

2010-01-08 Thread James Westby
On Fri, 8 Jan 2010 15:56:10 +1300, martin f krafft  
wrote:
> also sprach Jameson Graef Rollins  
> [2009.11.26.1901 +1300]:
> > I would really like to start using notmuch with emacs beyond just
> > testing, but I really need to be able to handle/read/send mail with
> > PGP/MIME encoded attachments.  Do folks have any suggestions on how to
> > handle this?  Is there a separate emacs mode that people use for
> > signing/verifying/{de,en}crypting mail buffers, or is this something
> > that is going to have to be integrated into the notmuch mode?  I guess
> > the notmuch-show mode at least will need to do some verifying and
> > decrypting.
> 
> How about indexing GPG-encrypted messages?

I think the difficulty will be interactivity. If notmuch-new can
potentially block watiting for a passphrase then it's not going to be
much use for non-interactive use, and whether someone can respond to a
GPG prompt is harder to determine that isatty().

Configuration may be a possible way around that, but looking at other
things such as opportunistic indexing could be good. For instance,
it could be the job of the UIs to decrypt content, and there could be a
nomuch function which takes a message id and decrypted content and
indexes it in to the DB. That means it's under the UI's control, where
the decryption UI should be, gets you indexing of encrypted content.

That would leave an open question over whether future notmuch show
invocations would return the plaintext or ciphertext. If it is the
latter then it requires decrypting every time you want to view it, but
it does mean that there is less information leakage (you could find out
whether an encrypted message contained a particular term, but not read
the whole message directly).

Thanks,

James


[notmuch] [PATCH] Store documents for message-ids we haven't seen

2009-12-20 Thread James Westby
When scanning the maildir there can be earlier messages
in the thread that we haven't seen, either due to mail
delays, or because they weren't sent. This can break the
threading.

Therefore we store stub documents for every message id
we see, so that we can link them in to the threads. If
we see the message later then we will replace the document
with the full information.
---

  Here's the third part of the patch, that actually stores
  the stubs and links the threads. It's fairly simple,
  though we may want to store a little more in the stubs.

  One case that isn't handled is when we don't find the thread
  id of the parent, but then find the message itself. I believe
  this case shouldn't happen, but you never know.

 lib/database.cc   |   25 +
 lib/message.cc|9 +
 lib/notmuch-private.h |   10 ++
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 64f29b9..6ee8068 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -782,8 +782,33 @@ _notmuch_database_link_message_to_parents 
(notmuch_database_t *notmuch,
 parent_message_id);

if (parent_thread_id == NULL) {
+   notmuch_private_status_t private_status;
+   notmuch_message_t *new_parent_message;
+   thread_id_t parent_thread_id_t;
+
+   // We have yet to see the referenced message, generate a thread id
+   // for it if needed and store a dummy message for the parent. If we
+   // find the mail later we will replace the dummy.
_notmuch_message_add_term (message, "reference",
   parent_message_id);
+   if (*thread_id == NULL) {
+   thread_id_generate (_thread_id_t);
+   *thread_id = parent_thread_id_t.str;
+   }
+   new_parent_message = _notmuch_message_create_for_message_id 
(notmuch,
+   parent_message_id, _status);
+   if (private_status) {
+   if (private_status == 
NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   // expected
+   _notmuch_message_add_term (new_parent_message, 
"thread", *thread_id);
+   _notmuch_message_sync (new_parent_message);
+   } else {
+   ret = COERCE_STATUS (private_status,
+"Cannot create parent message");
+   goto DONE;
+   }
+   }
+   _notmuch_message_add_term (message, "thread", *thread_id);
} else {
if (*thread_id == NULL) {
*thread_id = talloc_strdup (message, parent_thread_id);
diff --git a/lib/message.cc b/lib/message.cc
index 7129d59..dff02c4 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -42,13 +42,6 @@ struct _notmuch_message {
 Xapian::Document doc;
 };

-/* "128 bits of thread-id ought to be enough for anybody" */
-#define NOTMUCH_THREAD_ID_BITS  128
-#define NOTMUCH_THREAD_ID_DIGITS (NOTMUCH_THREAD_ID_BITS / 4)
-typedef struct _thread_id {
-char str[NOTMUCH_THREAD_ID_DIGITS + 1];
-} thread_id_t;
-
 /* We end up having to call the destructor explicitly because we had
  * to use "placement new" in order to initialize C++ objects within a
  * block that we allocated with talloc. So C++ is making talloc
@@ -551,7 +544,7 @@ _notmuch_message_set_date (notmuch_message_t *message,
Xapian::sortable_serialise (time_value));
 }

-static void
+void
 thread_id_generate (thread_id_t *thread_id)
 {
 static int seeded = 0;
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index cf65fd9..cf75b4a 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -141,6 +141,13 @@ typedef enum _notmuch_private_status {
  : \
  (notmuch_status_t) private_status)

+/* "128 bits of thread-id ought to be enough for anybody" */
+#define NOTMUCH_THREAD_ID_BITS  128
+#define NOTMUCH_THREAD_ID_DIGITS (NOTMUCH_THREAD_ID_BITS / 4)
+typedef struct _thread_id {
+char str[NOTMUCH_THREAD_ID_DIGITS + 1];
+} thread_id_t;
+
 /* database.cc */

 /* Lookup a prefix value by name.
@@ -347,6 +354,9 @@ void
 _notmuch_message_add_reply (notmuch_message_t *message,
notmuch_message_node_t *reply);

+void
+thread_id_generate (thread_id_t *thread_id);
+
 /* sha1.c */

 char *
-- 
1.6.3.3



[notmuch] [PATCH] Solaris doesn't have 'struct dirent::d_type'

2009-12-20 Thread James Westby
From: Tomas Carnecky <t...@dbservice.com>

Use stat(2) instead.

Signed-off-by: Tomas Carnecky 
Signed-off-by: James Westby <jw+debian at jameswestby.net>
---

  The original patch duplicated asprintf and stat calls, rearraging
  the code means we don't need to.

  I have a concern about the duplicated stats in is_maildir, but they
  are not so easy to save. I ran a quick timing test (3931 files), dropping
  caches before each set:

master:
  real  2m3.545s
  real  1m34.571s
  real  1m36.005s

original patch:
  real  2m18.114s
  real  1m34.843s
  real  1m36.317s

revised patch:
  real  2m5.890s
  real  1m36.387s
  real  1m36.453s

  This shoes there is little impact of the code, but given that it is
  around one percent we may want to make it conditional on platform
  and save the extra stat calls.

  Thanks,

  James

 notmuch-new.c |   46 ++
 1 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 9d20616..c6f4963 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -90,12 +90,18 @@ static int ino_cmp(const struct dirent **a, const struct 
dirent **b)
  * Return 1 if the directory looks like a Maildir and 0 otherwise.
  */
 static int
-is_maildir (struct dirent **entries, int count)
+is_maildir (const char *path, struct dirent **entries, int count)
 {
 int i, found = 0;

 for (i = 0; i < count; i++) {
-   if (entries[i]->d_type != DT_DIR) continue;
+   char pbuf[PATH_MAX];
+snprintf(pbuf, PATH_MAX, "%s/%s", path, entries[i]->d_name);
+
+   struct stat buf;
+   if (stat(pbuf, ) == -1 || !S_ISDIR(buf.st_mode))
+   continue;
+
if (strcmp(entries[i]->d_name, "new") == 0 ||
strcmp(entries[i]->d_name, "cur") == 0 ||
strcmp(entries[i]->d_name, "tmp") == 0)
@@ -178,24 +184,6 @@ add_files_recursive (notmuch_database_t *notmuch,
/* If this directory hasn't been modified since the last
 * add_files, then we only need to look further for
 * sub-directories. */
-   if (path_mtime <= path_dbtime && entry->d_type == DT_REG)
-   continue;
-
-   /* Ignore special directories to avoid infinite recursion.
-* Also ignore the .notmuch directory.
-*/
-   /* XXX: Eventually we'll want more sophistication to let the
-* user specify files to be ignored. */
-   if (strcmp (entry->d_name, ".") == 0 ||
-   strcmp (entry->d_name, "..") == 0 ||
-   (entry->d_type == DT_DIR &&
-(strcmp (entry->d_name, "tmp") == 0) &&
-is_maildir (namelist, num_entries)) ||
-   strcmp (entry->d_name, ".notmuch") ==0)
-   {
-   continue;
-   }
-
next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);

if (stat (next, st)) {
@@ -216,6 +204,24 @@ add_files_recursive (notmuch_database_t *notmuch,
goto DONE;
}

+   if (path_mtime <= path_dbtime && S_ISREG(st->st_mode))
+   continue;
+
+   /* Ignore special directories to avoid infinite recursion.
+* Also ignore the .notmuch directory.
+*/
+   /* XXX: Eventually we'll want more sophistication to let the
+* user specify files to be ignored. */
+   if (strcmp (entry->d_name, ".") == 0 ||
+   strcmp (entry->d_name, "..") == 0 ||
+   (S_ISDIR(st->st_mode) &&
+(strcmp (entry->d_name, "tmp") == 0) &&
+is_maildir (path, namelist, num_entries)) ||
+   strcmp (entry->d_name, ".notmuch") ==0)
+   {
+   continue;
+   }
+
if (S_ISREG (st->st_mode)) {
/* If the file hasn't been modified since the last
 * add_files, then we need not look at it. */
-- 
1.6.3.3



[notmuch] [PATCH] Store documents for message-ids we haven't seen

2009-12-20 Thread James Westby
When scanning the maildir there can be earlier messages
in the thread that we haven't seen, either due to mail
delays, or because they weren't sent. This can break the
threading.

Therefore we store stub documents for every message id
we see, so that we can link them in to the threads. If
we see the message later then we will replace the document
with the full information.
---

  Here's the third part of the patch, that actually stores
  the stubs and links the threads. It's fairly simple,
  though we may want to store a little more in the stubs.

  One case that isn't handled is when we don't find the thread
  id of the parent, but then find the message itself. I believe
  this case shouldn't happen, but you never know.

 lib/database.cc   |   25 +
 lib/message.cc|9 +
 lib/notmuch-private.h |   10 ++
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 64f29b9..6ee8068 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -782,8 +782,33 @@ _notmuch_database_link_message_to_parents 
(notmuch_database_t *notmuch,
 parent_message_id);
 
if (parent_thread_id == NULL) {
+   notmuch_private_status_t private_status;
+   notmuch_message_t *new_parent_message;
+   thread_id_t parent_thread_id_t;
+
+   // We have yet to see the referenced message, generate a thread id
+   // for it if needed and store a dummy message for the parent. If we
+   // find the mail later we will replace the dummy.
_notmuch_message_add_term (message, reference,
   parent_message_id);
+   if (*thread_id == NULL) {
+   thread_id_generate (parent_thread_id_t);
+   *thread_id = parent_thread_id_t.str;
+   }
+   new_parent_message = _notmuch_message_create_for_message_id 
(notmuch,
+   parent_message_id, private_status);
+   if (private_status) {
+   if (private_status == 
NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   // expected
+   _notmuch_message_add_term (new_parent_message, 
thread, *thread_id);
+   _notmuch_message_sync (new_parent_message);
+   } else {
+   ret = COERCE_STATUS (private_status,
+Cannot create parent message);
+   goto DONE;
+   }
+   }
+   _notmuch_message_add_term (message, thread, *thread_id);
} else {
if (*thread_id == NULL) {
*thread_id = talloc_strdup (message, parent_thread_id);
diff --git a/lib/message.cc b/lib/message.cc
index 7129d59..dff02c4 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -42,13 +42,6 @@ struct _notmuch_message {
 Xapian::Document doc;
 };
 
-/* 128 bits of thread-id ought to be enough for anybody */
-#define NOTMUCH_THREAD_ID_BITS  128
-#define NOTMUCH_THREAD_ID_DIGITS (NOTMUCH_THREAD_ID_BITS / 4)
-typedef struct _thread_id {
-char str[NOTMUCH_THREAD_ID_DIGITS + 1];
-} thread_id_t;
-
 /* We end up having to call the destructor explicitly because we had
  * to use placement new in order to initialize C++ objects within a
  * block that we allocated with talloc. So C++ is making talloc
@@ -551,7 +544,7 @@ _notmuch_message_set_date (notmuch_message_t *message,
Xapian::sortable_serialise (time_value));
 }
 
-static void
+void
 thread_id_generate (thread_id_t *thread_id)
 {
 static int seeded = 0;
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index cf65fd9..cf75b4a 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -141,6 +141,13 @@ typedef enum _notmuch_private_status {
  : \
  (notmuch_status_t) private_status)
 
+/* 128 bits of thread-id ought to be enough for anybody */
+#define NOTMUCH_THREAD_ID_BITS  128
+#define NOTMUCH_THREAD_ID_DIGITS (NOTMUCH_THREAD_ID_BITS / 4)
+typedef struct _thread_id {
+char str[NOTMUCH_THREAD_ID_DIGITS + 1];
+} thread_id_t;
+
 /* database.cc */
 
 /* Lookup a prefix value by name.
@@ -347,6 +354,9 @@ void
 _notmuch_message_add_reply (notmuch_message_t *message,
notmuch_message_node_t *reply);
 
+void
+thread_id_generate (thread_id_t *thread_id);
+
 /* sha1.c */
 
 char *
-- 
1.6.3.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] Fix-up some outdated comments.

2009-12-19 Thread James Westby
---
 lib/message.cc |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index cc32741..7129d59 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -391,7 +391,7 @@ notmuch_message_get_replies (notmuch_message_t *message)
  * multiple filenames for email messages with identical message IDs.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 void
 _notmuch_message_set_filename (notmuch_message_t *message,
   const char *filename)
@@ -622,7 +622,7 @@ _notmuch_message_close (notmuch_message_t *message)
  * names to prefix values.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 notmuch_private_status_t
 _notmuch_message_add_term (notmuch_message_t *message,
   const char *prefix_name,
@@ -679,7 +679,7 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
  * names to prefix values.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 notmuch_private_status_t
 _notmuch_message_remove_term (notmuch_message_t *message,
  const char *prefix_name,
-- 
1.6.3.3



[notmuch] [PATCH] Store the size of the file for each message

2009-12-19 Thread James Westby
On Fri, 18 Dec 2009 16:57:16 -0800, Carl Worth  wrote:
> You can, actually. Just set the NOTMUCH_CONFIG environment variable to
> your alternate configuration file. (And yes, we're missing any mention
> of this in our documentation.)

Sweet. Where would be the best place to document it? Just in the
man page?

Thanks,

James


[notmuch] [PATCH] Reindex larger files that duplicate ids we have

2009-12-19 Thread James Westby
When we see a message where we already have the file
id stored, check if the size is larger. If it is then
re-index and set the file size and name to be the
new message.
---

  Here's the (quite simple) patch to implement indexing the
  largest copy of each mail that we have.

  Does the re-indexing replace the old terms? In the case
  where you had a collision with different text this could
  make a search return mails that don't contain that text.
  I don't think it's a big issue though, even if that is the
  case.

  Thanks,

  James

 lib/database.cc   |4 +++-
 lib/index.cc  |   27 +++
 lib/message.cc|   31 ++-
 lib/notmuch-private.h |   13 +
 lib/notmuch.h |5 +++--
 5 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index d834d94..64f29b9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1000,7 +1000,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (ret)
goto DONE;
} else {
-   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   ret = _notmuch_message_possibly_reindex (message, filename, size);
+   if (!ret)
+   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
}

diff --git a/lib/index.cc b/lib/index.cc
index 125fa6c..14c3268 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -312,3 +312,30 @@ _notmuch_message_index_file (notmuch_message_t *message,

 return ret;
 }
+
+notmuch_status_t
+_notmuch_message_possibly_reindex (notmuch_message_t *message,
+const char *filename,
+const off_t size)
+{
+off_t realsize = size;
+off_t stored_size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+ret = _notmuch_message_size_on_disk (message, filename, );
+if (ret)
+goto DONE;
+stored_size = _notmuch_message_get_filesize (message);
+if (realsize > stored_size) {
+   ret = _notmuch_message_index_file (message, filename);
+   if (ret)
+   goto DONE;
+   ret = _notmuch_message_set_filesize (message, filename, realsize);
+   _notmuch_message_set_filename (message, filename);
+   _notmuch_message_sync (message);
+}
+
+  DONE:
+return ret;
+
+}
diff --git a/lib/message.cc b/lib/message.cc
index 2bfc5ed..cc32741 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -427,23 +427,38 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 }

 notmuch_status_t
-_notmuch_message_set_filesize (notmuch_message_t *message,
+_notmuch_message_size_on_disk (notmuch_message_t *message,
   const char *filename,
-  const off_t size)
+  off_t *size)
 {
 struct stat st;
-off_t realsize = size;
 notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;

-if (realsize < 0) {
+if (*size < 0) {
if (stat (filename, )) {
ret = NOTMUCH_STATUS_FILE_ERROR;
goto DONE;
} else {
-   realsize = st.st_size;
+   *size = st.st_size;
}
 }

+  DONE:
+return ret;
+}
+
+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+ret = _notmuch_message_size_on_disk (message, filename, );
+if (ret)
+goto DONE;
+
 message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
 Xapian::sortable_serialise (realsize));

@@ -451,6 +466,12 @@ _notmuch_message_set_filesize (notmuch_message_t *message,
 return ret;
 }

+off_t
+_notmuch_message_get_filesize (notmuch_message_t *message)
+{
+return Xapian::sortable_unserialise (message->doc.get_value 
(NOTMUCH_VALUE_FILESIZE));
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 1ba3055..cf65fd9 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -199,6 +199,14 @@ _notmuch_message_set_filesize (notmuch_message_t *message,
   const char *filename,
   const off_t size);

+off_t
+_notmuch_message_get_filesize (notmuch_message_t *message);
+
+notmuch_status_t
+_notmuch_message_size_on_disk (notmuch_message_t *message,
+  const char *filename,
+  off_t *size);
+
 void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);

@@ -218,6 +226,11 @@ notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
 const char *filename);

+notmuch_status_t
+_notmuch_message_possibly_reindex (notmuch_message_t *message,
+const char *filename,
+ 

[notmuch] [PATCH v2] Store the size of the file for each message

2009-12-19 Thread James Westby
When indexing a message store the filesize along with it so that
when we store all the filenames for a message-id we can know if
any of them have different content cheaply.

The value stored is defined to be the largest filesize of any
of the files for that message.

This changes the API for efficiency reasons. The size is often
known to the caller, and so we save a second stat by asking them
to provide it. If they don't know it they can pass -1 and the
stat will be done for them.

We store the filesize such that we can query a range. Thus it
would be possible to query "filesize:0..100" if you somehow
knew the raw message was less that 100 bytes.
---

  With new, improved, working, filesize:.. search.

 lib/database.cc   |7 +++
 lib/message.cc|   25 +
 lib/notmuch-private.h |8 +++-
 lib/notmuch.h |5 +
 notmuch-new.c |2 +-
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b6c4d07..d834d94 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -463,6 +463,8 @@ notmuch_database_open (const char *path,
 struct stat st;
 int err;
 unsigned int i;
+Xapian::NumberValueRangeProcessor *filesize_proc = new 
Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_FILESIZE,
+"filesize:", true);

 if (asprintf (_path, "%s/%s", path, ".notmuch") == -1) {
notmuch_path = NULL;
@@ -508,6 +510,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
notmuch->query_parser->set_stemming_strategy 
(Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor 
(notmuch->value_range_processor);
+   notmuch->query_parser->add_valuerangeprocessor (filesize_proc);

for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = _PREFIX_EXTERNAL[i];
@@ -889,6 +892,7 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
+ const off_t size,
  notmuch_message_t **message_ret)
 {
 notmuch_message_file_t *message_file;
@@ -992,6 +996,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
+   ret = _notmuch_message_set_filesize (message, filename, size);
+   if (ret)
+   goto DONE;
} else {
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..2bfc5ed 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -426,6 +426,31 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 message->doc.set_data (s);
 }

+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+struct stat st;
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+if (realsize < 0) {
+   if (stat (filename, )) {
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+   } else {
+   realsize = st.st_size;
+   }
+}
+
+message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
+Xapian::sortable_serialise (realsize));
+
+  DONE:
+return ret;
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 116f63d..1ba3055 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -100,7 +100,8 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE 
(1, 2);

 typedef enum {
 NOTMUCH_VALUE_TIMESTAMP = 0,
-NOTMUCH_VALUE_MESSAGE_ID
+NOTMUCH_VALUE_MESSAGE_ID,
+NOTMUCH_VALUE_FILESIZE
 } notmuch_value_t;

 /* Xapian (with flint backend) complains if we provide a term longer
@@ -193,6 +194,11 @@ void
 _notmuch_message_set_filename (notmuch_message_t *message,
   const char *filename);

+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size);
+
 void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);

diff --git a/lib/notmuch.h b/lib/notmuch.h
index 60834fb..5d0d224 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -32,6 +32,7 @@
 NOTMUCH_BEGIN_DECLS

 #include 
+#include 

 #ifndef FALSE
 #define FALSE 0
@@ -241,6 +242,9 @@ notmuch_database_get_timestamp (notmuch_database_t 
*database,
  * notmuch database will reference the filename, and will not copy the
 

[notmuch] [PATCH] Store the size of the file for each message

2009-12-19 Thread James Westby
On Fri, 18 Dec 2009 14:29:21 -0800, Carl Worth  wrote:
> On Fri, 18 Dec 2009 21:21:03 +0000, James Westby <jw+debian at 
> jameswestby.net> wrote:
> Yes, a value makes sense here and should make the value easy to
> retrieve.

Excellent.

> I usually use a little tool I wrote called xapian-dump. It currently
> exists only in the git history of notmuch. Look at commit:
> 
>   22691064666c03c5e76bc787395bfe586929f4cc
> 
> or so.

Thanks, I found delve, which at least showed that something was
being stored. It's in the xapian-tools package, and

   delve -V2 

prints out the filesize value for each document.

It would be great if we could specify an alternative configuration
file for testing so that I can set up a small maildir and test
against that.

> If the file size is just an integer, then you shouldn't need a custom
> ValueRangeProcessor. One of the existing processors in Xapian should
> work fine.

Correct, I hadn't read the documentation closely enough. After fixing
that and doing some testing I have this working now. Patch incoming.

Thanks,

James


[notmuch] Missing messages breaking threads

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 12:52:58 -0800, Carl Worth  wrote:
> On Fri, 18 Dec 2009 19:53:13 +0000, James Westby <jw+debian at 
> jameswestby.net> wrote:
> Oh, I was assuming you wouldn't index any text. The UI can add "missing
> message" for a document with no filename, for example.

Works for me.

> > So, to summarise, I should first look at storing filesizes, then
> > the collision code to make it index further when the filesize grows,
> > and then finally the code to add documents for missing messages?
> 
> Some of the code areas to be touched will be changing soon, (at least as
> far as when filenames appear and disappear). Hopefully I'll have
> something posted for that sooner rather than later to avoid having to
> redo too much work.

That would be great. I'm learning all the code anyway, so there's not
a whole lot of knowledge being thrown away.

I've just sent an initial cut at the fist step.

> > The only thing I am unclear on is how to handle existing databases?
> > Do we have any concept of versioning? Or should I just assume that
> > filesize: may not be in the document and act appropriately?
> 
> My current, outstanding patch is going to be the first trigger for a
> "flag day" where we'll all need to rewrite our databases.
> 
> We don't have any concept of versioning yet, but it would obviously be
> easy to have a new version document with an increasing integer.
> 
> But even with my current patch I'm considering doing a graceful upgrade
> of the database in-place rather than making the user do something like a
> dump, delete, rebuild, restore. That would give a much better experience
> than "Your database is out-of-date, please rebuild it", so we'll see if
> I pursue that in the end.

That sounds nice, I'd certainly prefer this sort of thing as it evolves.

Thanks,

James


[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
When indexing a message store the filesize along with it so that
when we store all the filenames for a message-id we can know if
any of them have different content cheaply.

The value stored is defined to be the largest filesize of any
of the files for that message.

This changes the API for efficiency reasons. The size is often
known to the caller, and so we save a second stat by asking them
to provide it. If they don't know it they can pass -1 and the
stat will be done for them.

We store the filesize such that we can query a range. Thus it
would be possible to query "filesize:0..100" if you somehow
knew the raw message was less that 100 bytes.
---

  Here's the first part, storing the filesize. I'm using
  add_value so that we can make it sortable, is that valid
  for retrieving it as well?

  The only thing I'm not sure about is if it works. Is there
  a way to inspect a document to see the values that are
  stored? Doing a search isn't working, so I imagine I made
  a mistake.

  Thanks,

  James

 lib/database.cc   |   17 +
 lib/message.cc|   25 +
 lib/notmuch-private.h |8 +++-
 lib/notmuch.h |5 +
 notmuch-new.c |2 +-
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b6c4d07..0ec77cd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -454,6 +454,17 @@ notmuch_database_create (const char *path)
 return notmuch;
 }

+struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
+FilesizeValueRangeProcessor() {}
+
+Xapian::valueno operator()(std::string , std::string &) {
+if (begin.substr(0, 9) != "filesize:")
+return Xapian::BAD_VALUENO;
+begin.erase(0, 9);
+return NOTMUCH_VALUE_FILESIZE;
+}
+};
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
   notmuch_database_mode_t mode)
@@ -463,6 +474,7 @@ notmuch_database_open (const char *path,
 struct stat st;
 int err;
 unsigned int i;
+FilesizeValueRangeProcessor filesize_proc;

 if (asprintf (_path, "%s/%s", path, ".notmuch") == -1) {
notmuch_path = NULL;
@@ -508,6 +520,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
notmuch->query_parser->set_stemming_strategy 
(Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor 
(notmuch->value_range_processor);
+   notmuch->query_parser->add_valuerangeprocessor (_proc);

for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = _PREFIX_EXTERNAL[i];
@@ -889,6 +902,7 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
+ const off_t size,
  notmuch_message_t **message_ret)
 {
 notmuch_message_file_t *message_file;
@@ -992,6 +1006,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
+   ret = _notmuch_message_set_filesize (message, filename, size);
+   if (ret)
+   goto DONE;
} else {
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..2bfc5ed 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -426,6 +426,31 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 message->doc.set_data (s);
 }

+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+struct stat st;
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+if (realsize < 0) {
+   if (stat (filename, )) {
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+   } else {
+   realsize = st.st_size;
+   }
+}
+
+message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
+Xapian::sortable_serialise (realsize));
+
+  DONE:
+return ret;
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 116f63d..1ba3055 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -100,7 +100,8 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE 
(1, 2);

 typedef enum {
 NOTMUCH_VALUE_TIMESTAMP = 0,
-NOTMUCH_VALUE_MESSAGE_ID
+NOTMUCH_VALUE_MESSAGE_ID,
+NOTMUCH_VALUE_FILESIZE
 } notmuch_value_t;

 /* Xapian (with flint backend) complains if we provide a term longer
@@ -193,6 

[notmuch] Missing messages breaking threads

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 11:41:18 -0800, Carl Worth  wrote:
> On Fri, 18 Dec 2009 19:02:21 +0000, James Westby <jw+debian at 
> jameswestby.net> wrote:
> > Therefore I'd like to fix this. The obvious way is to
> > introduce documents in to the db for each id we see, and
> > threading should then naturally work better.
> 
> That sounds like a fine idea.

Good, at least I'm not totally off the map.

> > The only issue I see with doing this is with mail delays.
> > Once we do this we will sometimes receive a message that
> > already has a dummy document. What happens currently with
> > message-id collisions?
> 
> The current message-ID collision logic is pretty brain-dead. It just
> says "Oh, I've seen a file with this message before, so I'll skip this
> additional file".
> 
> But I'm just putting the finishing touches on a patch that instead does:
> 
>   Oh, and here's an additional filename for that message ID. Add
>   that too, please.
> 
> Beyond that, all we would need to do as well is to also index the new
> content. I don't want to do useless re-indexing when files just get
> renamed. So maybe all we need to do is to save the filesize of the
> last-indexed file for a document and then when we encounter a file with
> the same message ID and a larger file size, then index it as well?

I would say different file size, but I imagine larger is the majority
of interesting cases.

> That would even take care of providing the opportunity to index
> additional mailing-list-added content for messages also sent directly
> via CC.
> 
> The file-size heuristic wouldn't be perfect for these other cases. I
> guess we save a list of sha-1 sums for indexed files or so, (assuming
> that's cheaper than just re-indexing---before the Xapian Defect 250 fix
> I'm sure it is, but after I'm not sure---we maybe should just always
> re-index---but I think I have seen the TermGenerator appear in profiles
> of indexing runs.)

I'm not sure this is needed too much, but would obviously be
correct.

On Xapian 250, I have a very slow spinning disk, and it was hitting
me hard, making processing my inbox far too slow. I built Xapian SVN
with the patch from the bug and it is now lightning fast, so
consider this another endorsement. I also tried the supplemental
patch and it showed no further improvement for notmuch tag.

> >   * When we get a message-id conflict check for dummy:True
> > and replace the document if it is there.
> > 
> > How does this sound?
> 
> That sounds fine. It's the same as what I propose above with
> "filesize:0" instead of "dummy:true".

That works. However, we would want the old content to go away
in these cases wouldn't we.

Or do we not index whatever dummy text we add? Or do we not
even put it in? Or not even show it at all? I was just thinking
of having "Missing messages..." showing up as the start of
the thread, but maybe it's no needed.

> > There could be an issue with synthesising too many threads
> > and then ending up having to try and put a message in two
> > threads? I see there is code for merging threads, would that
> > handle this?
> 
> It should, yes.
> 
> The current logic is that a message can only appear in a single
> thread. So if a message has children or parents with distinct thread IDs
> then those threads are merged.
> 
> I can imagine some strange cross-posting scenario where one could argue
> that the merging shouldn't happen, but I'm not sure we want to try to
> respect that.

Fair enough.

So, to summarise, I should first look at storing filesizes, then
the collision code to make it index further when the filesize grows,
and then finally the code to add documents for missing messages?

The only thing I am unclear on is how to handle existing databases?
Do we have any concept of versioning? Or should I just assume that
filesize: may not be in the document and act appropriately?

Thanks,

James



[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
When indexing a message store the filesize along with it so that
when we store all the filenames for a message-id we can know if
any of them have different content cheaply.

The value stored is defined to be the largest filesize of any
of the files for that message.

This changes the API for efficiency reasons. The size is often
known to the caller, and so we save a second stat by asking them
to provide it. If they don't know it they can pass -1 and the
stat will be done for them.

We store the filesize such that we can query a range. Thus it
would be possible to query filesize:0..100 if you somehow
knew the raw message was less that 100 bytes.
---

  Here's the first part, storing the filesize. I'm using
  add_value so that we can make it sortable, is that valid
  for retrieving it as well?

  The only thing I'm not sure about is if it works. Is there
  a way to inspect a document to see the values that are
  stored? Doing a search isn't working, so I imagine I made
  a mistake.

  Thanks,

  James

 lib/database.cc   |   17 +
 lib/message.cc|   25 +
 lib/notmuch-private.h |8 +++-
 lib/notmuch.h |5 +
 notmuch-new.c |2 +-
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b6c4d07..0ec77cd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -454,6 +454,17 @@ notmuch_database_create (const char *path)
 return notmuch;
 }
 
+struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
+FilesizeValueRangeProcessor() {}
+
+Xapian::valueno operator()(std::string begin, std::string ) {
+if (begin.substr(0, 9) != filesize:)
+return Xapian::BAD_VALUENO;
+begin.erase(0, 9);
+return NOTMUCH_VALUE_FILESIZE;
+}
+};
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
   notmuch_database_mode_t mode)
@@ -463,6 +474,7 @@ notmuch_database_open (const char *path,
 struct stat st;
 int err;
 unsigned int i;
+FilesizeValueRangeProcessor filesize_proc;
 
 if (asprintf (notmuch_path, %s/%s, path, .notmuch) == -1) {
notmuch_path = NULL;
@@ -508,6 +520,7 @@ notmuch_database_open (const char *path,
notmuch-query_parser-set_stemmer (Xapian::Stem (english));
notmuch-query_parser-set_stemming_strategy 
(Xapian::QueryParser::STEM_SOME);
notmuch-query_parser-add_valuerangeprocessor 
(notmuch-value_range_processor);
+   notmuch-query_parser-add_valuerangeprocessor (filesize_proc);
 
for (i = 0; i  ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = BOOLEAN_PREFIX_EXTERNAL[i];
@@ -889,6 +902,7 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
+ const off_t size,
  notmuch_message_t **message_ret)
 {
 notmuch_message_file_t *message_file;
@@ -992,6 +1006,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, type, mail);
+   ret = _notmuch_message_set_filesize (message, filename, size);
+   if (ret)
+   goto DONE;
} else {
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..2bfc5ed 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -426,6 +426,31 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 message-doc.set_data (s);
 }
 
+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+struct stat st;
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+if (realsize  0) {
+   if (stat (filename, st)) {
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+   } else {
+   realsize = st.st_size;
+   }
+}
+
+message-doc.add_value (NOTMUCH_VALUE_FILESIZE,
+Xapian::sortable_serialise (realsize));
+
+  DONE:
+return ret;
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 116f63d..1ba3055 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -100,7 +100,8 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE 
(1, 2);
 
 typedef enum {
 NOTMUCH_VALUE_TIMESTAMP = 0,
-NOTMUCH_VALUE_MESSAGE_ID
+NOTMUCH_VALUE_MESSAGE_ID,
+NOTMUCH_VALUE_FILESIZE
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ 

Re: [notmuch] Missing messages breaking threads

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 12:52:58 -0800, Carl Worth cwo...@cworth.org wrote:
 On Fri, 18 Dec 2009 19:53:13 +, James Westby jw+deb...@jameswestby.net 
 wrote:
 Oh, I was assuming you wouldn't index any text. The UI can add missing
 message for a document with no filename, for example.

Works for me.

  So, to summarise, I should first look at storing filesizes, then
  the collision code to make it index further when the filesize grows,
  and then finally the code to add documents for missing messages?
 
 Some of the code areas to be touched will be changing soon, (at least as
 far as when filenames appear and disappear). Hopefully I'll have
 something posted for that sooner rather than later to avoid having to
 redo too much work.

That would be great. I'm learning all the code anyway, so there's not
a whole lot of knowledge being thrown away.

I've just sent an initial cut at the fist step.

  The only thing I am unclear on is how to handle existing databases?
  Do we have any concept of versioning? Or should I just assume that
  filesize: may not be in the document and act appropriately?
 
 My current, outstanding patch is going to be the first trigger for a
 flag day where we'll all need to rewrite our databases.
 
 We don't have any concept of versioning yet, but it would obviously be
 easy to have a new version document with an increasing integer.
 
 But even with my current patch I'm considering doing a graceful upgrade
 of the database in-place rather than making the user do something like a
 dump, delete, rebuild, restore. That would give a much better experience
 than Your database is out-of-date, please rebuild it, so we'll see if
 I pursue that in the end.

That sounds nice, I'd certainly prefer this sort of thing as it evolves.

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] Reindex larger files that duplicate ids we have

2009-12-18 Thread James Westby
When we see a message where we already have the file
id stored, check if the size is larger. If it is then
re-index and set the file size and name to be the
new message.
---

  Here's the (quite simple) patch to implement indexing the
  largest copy of each mail that we have.

  Does the re-indexing replace the old terms? In the case
  where you had a collision with different text this could
  make a search return mails that don't contain that text.
  I don't think it's a big issue though, even if that is the
  case.

  Thanks,

  James

 lib/database.cc   |4 +++-
 lib/index.cc  |   27 +++
 lib/message.cc|   31 ++-
 lib/notmuch-private.h |   13 +
 lib/notmuch.h |5 +++--
 5 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index d834d94..64f29b9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1000,7 +1000,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (ret)
goto DONE;
} else {
-   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   ret = _notmuch_message_possibly_reindex (message, filename, size);
+   if (!ret)
+   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
}
 
diff --git a/lib/index.cc b/lib/index.cc
index 125fa6c..14c3268 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -312,3 +312,30 @@ _notmuch_message_index_file (notmuch_message_t *message,
 
 return ret;
 }
+
+notmuch_status_t
+_notmuch_message_possibly_reindex (notmuch_message_t *message,
+const char *filename,
+const off_t size)
+{
+off_t realsize = size;
+off_t stored_size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+ret = _notmuch_message_size_on_disk (message, filename, realsize);
+if (ret)
+goto DONE;
+stored_size = _notmuch_message_get_filesize (message);
+if (realsize  stored_size) {
+   ret = _notmuch_message_index_file (message, filename);
+   if (ret)
+   goto DONE;
+   ret = _notmuch_message_set_filesize (message, filename, realsize);
+   _notmuch_message_set_filename (message, filename);
+   _notmuch_message_sync (message);
+}
+
+  DONE:
+return ret;
+
+}
diff --git a/lib/message.cc b/lib/message.cc
index 2bfc5ed..cc32741 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -427,23 +427,38 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 }
 
 notmuch_status_t
-_notmuch_message_set_filesize (notmuch_message_t *message,
+_notmuch_message_size_on_disk (notmuch_message_t *message,
   const char *filename,
-  const off_t size)
+  off_t *size)
 {
 struct stat st;
-off_t realsize = size;
 notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
 
-if (realsize  0) {
+if (*size  0) {
if (stat (filename, st)) {
ret = NOTMUCH_STATUS_FILE_ERROR;
goto DONE;
} else {
-   realsize = st.st_size;
+   *size = st.st_size;
}
 }
 
+  DONE:
+return ret;
+}
+
+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+ret = _notmuch_message_size_on_disk (message, filename, realsize);
+if (ret)
+goto DONE;
+
 message-doc.add_value (NOTMUCH_VALUE_FILESIZE,
 Xapian::sortable_serialise (realsize));
 
@@ -451,6 +466,12 @@ _notmuch_message_set_filesize (notmuch_message_t *message,
 return ret;
 }
 
+off_t
+_notmuch_message_get_filesize (notmuch_message_t *message)
+{
+return Xapian::sortable_unserialise (message-doc.get_value 
(NOTMUCH_VALUE_FILESIZE));
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 1ba3055..cf65fd9 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -199,6 +199,14 @@ _notmuch_message_set_filesize (notmuch_message_t *message,
   const char *filename,
   const off_t size);
 
+off_t
+_notmuch_message_get_filesize (notmuch_message_t *message);
+
+notmuch_status_t
+_notmuch_message_size_on_disk (notmuch_message_t *message,
+  const char *filename,
+  off_t *size);
+
 void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);
 
@@ -218,6 +226,11 @@ notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
 const char *filename);
 
+notmuch_status_t
+_notmuch_message_possibly_reindex (notmuch_message_t *message,
+const 

Re: [notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 16:57:16 -0800, Carl Worth cwo...@cworth.org wrote:
 You can, actually. Just set the NOTMUCH_CONFIG environment variable to
 your alternate configuration file. (And yes, we're missing any mention
 of this in our documentation.)

Sweet. Where would be the best place to document it? Just in the
man page?

Thanks,

James
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] Fix-up some outdated comments.

2009-12-18 Thread James Westby
---
 lib/message.cc |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index cc32741..7129d59 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -391,7 +391,7 @@ notmuch_message_get_replies (notmuch_message_t *message)
  * multiple filenames for email messages with identical message IDs.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 void
 _notmuch_message_set_filename (notmuch_message_t *message,
   const char *filename)
@@ -622,7 +622,7 @@ _notmuch_message_close (notmuch_message_t *message)
  * names to prefix values.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 notmuch_private_status_t
 _notmuch_message_add_term (notmuch_message_t *message,
   const char *prefix_name,
@@ -679,7 +679,7 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
  * names to prefix values.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 notmuch_private_status_t
 _notmuch_message_remove_term (notmuch_message_t *message,
  const char *prefix_name,
-- 
1.6.3.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch