Re: [python] get all messages of a thread

2011-06-02 Thread Sebastian Spaeth
On Wed, 1 Jun 2011 15:35:35 +1000, Brian May wrote:
> Oh, I see, for your code, there is an implied call to __len__, and the
> __len__ function is completely broken for the reasons described in the
> documentation:

It seems to have been a bad idea to implement __len__ at all for the
Messages() construct in the python bindings, and I wonder if I should
remove it.

On the other hand, it seems that list(Messages()) implicitly calls
len(), or so it seems from the error that we get when trying to list() a
messages object.
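
The effect described above can be reproduced without notmuch. This is a minimal sketch, assuming CPython (where list() asks its argument for a size hint and uses __len__ when the object defines one). OneShot is a hypothetical stand-in for Messages(), not the bindings' actual class:

```python
class OneShot:
    """Hypothetical stand-in for Messages(): a one-time iterator whose
    __len__ counts by consuming the underlying results."""

    def __init__(self, items):
        self._it = iter(items)
        self.len_calls = 0  # how often __len__ was invoked

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        self.len_calls += 1
        # Counting walks the iterator, so nothing is left afterwards.
        return sum(1 for _ in self._it)


msgs = OneShot(["a", "b", "c"])
result = list(msgs)  # CPython consults msgs.__len__() before iterating
```

Under CPython, len_calls ends up non-zero and result comes out empty, because the size hint has already used up the iterator. The real bindings raise an error at that point instead of returning an empty list.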

An alternative is to implement len() as a call to count_messages() which
has for me so far always returned the correct number of messages without
using up the iterator. However, the Xapian docs explicitly state that
it does not guarantee that the count will be correct, so len() might
(potentially) return a wrong message count.

What would be the best way to solve this (besides fixing the C api to
allow to reset the iterator ;-) ?)

I could implement a custom .as_list() function that returns the
Messages() object as a list that is guaranteed to be stable, by copying
out the Message() objects into a list.
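
A rough sketch of what such an .as_list() helper could look like. The Messages class below is a hypothetical stand-in written for illustration; it is not the code in the bindings, and the strings stand in for Message() objects:

```python
class Messages:
    """Hypothetical stand-in for the bindings' Messages(): a one-time iterator."""

    def __init__(self, items):
        self._it = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def as_list(self):
        # Copy whatever has not been consumed yet into a plain list.
        # The list can then be measured and traversed repeatedly.
        return list(self._it)


msgs = Messages(["m1", "m2", "m3"]).as_list()
count = len(msgs)          # safe: msgs is an ordinary list
again = [m for m in msgs]  # a second pass still sees every element
```

The trade-off is memory and start-up time: every object is created before the first one can be used.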

Sebastian


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [python] get all messages of a thread

2011-06-02 Thread Brian May
On 2 June 2011 17:05, Sebastian Spaeth sebast...@sspaeth.de wrote:

> What would be the best way to solve this (besides fixing the C api to
> allow to reset the iterator ;-) ?)


I am not really familiar with the code. So am I correct in making the
following assumptions?

* It is not easy to fix the C api to reset the iterator (what about
repeating the search?)

* The only accurate way to get the number of messages is to iterate through
every search result and count them?

If so, then len(...) I think might be very slow if there are a large number
of elements.

Maybe it might be easier/better to implement object.__nonzero__(self)
instead of the object.__len__(self) method?

http://docs.python.org/reference/datamodel.html

object.__nonzero__(self)
Called to implement truth value testing and the built-in operation bool();
should return False or True, or their integer equivalents 0 or 1. When this
method is not defined, __len__() is called, if it is defined, and the object
is considered true if its result is nonzero. If a class defines neither
__len__() nor __nonzero__(), all its instances are considered true.

object.__len__(self)
Called to implement the built-in function len(). Should return the length of
the object, an integer >= 0. Also, an object that doesn’t define a
__nonzero__() method and whose __len__() method returns zero is considered
to be false in a Boolean context.
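
The precedence described in the quoted documentation can be checked with a small sketch. Results is a hypothetical class used only for illustration. Note that the quoted text is from the Python 2 documentation; in Python 3 the hook is named __bool__, so the sketch defines both names:

```python
class Results:
    """Hypothetical result set that records which hooks get called."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return len(self._items)

    def __bool__(self):
        # Cheap emptiness check; truth testing uses this and skips __len__.
        self.calls.append("__bool__")
        return bool(self._items)

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


r = Results(["m1"])
truthy = bool(r)
```

After the call, r.calls contains only the truth-testing hook, so an expensive or destructive __len__ is never triggered by "if r:".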

-- 
Brian May br...@microcomaustralia.com.au


Re: [python] get all messages of a thread

2011-06-02 Thread Sebastian Spaeth
On Thu, 2 Jun 2011 19:43:29 +1000, Brian May wrote:
> On 2 June 2011 17:05, Sebastian Spaeth sebast...@sspaeth.de wrote:
>
>> What would be the best way to solve this (besides fixing the C api to
>> allow to reset the iterator ;-) ?)
>
> * It is not easy to fix the C api to reset the iterator (what about
> repeating the search?)

I am not sure about the difficulty of that, I am not a C-kind of
guy. Repeating the search would be easy but potentially gives you
different results since the db could have changed since then.
 
> * The only accurate way to get the number of messages is to iterate through
> every search result and count them?

There is count_messages() which wraps notmuch_query_count_messages which
invokes some very quick Xapian function. But the Xapian docs explicitly
state that it's Xapian's best guess and not guaranteed to be the real
number of messages (although it always was in my attempts). And you
don't want len() to return an approximation of the iterator length.

> If so, then len(...) I think might be very slow if there are a large number
> of elements.
>
> Maybe it might be easier/better to implement object.__nonzero__(self)
> instead of the object.__len__(self) method?
>
> http://docs.python.org/reference/datamodel.html
>
> object.__nonzero__(self)
> Called to implement truth value testing and the built-in operation bool();
> should return False or True, or their integer equivalents 0 or 1. When this
> method is not defined, __len__() is called, if it is defined, and the object
> is considered true if its result is nonzero. If a class defines neither
> __len__() nor __nonzero__(), all its instances are considered true.

Interesting, did not know about this one. I guess that would solve the
example with the:

t = query.search_threads()

if t:
  for thread in t:
    print thread

Actually the if t: is no longer needed, I just tried a query returning
no Threads and 

for thread in threads:
  pass

works just fine with an empty Threads() object.

I made the change, and implemented __nonzero__ and removed the len()
method. It just doesn't make sense on 1-time iterators. (I documented
the change in the API docs). Sorry if this breaks existing code.

list(Threads()) works just fine too, it just took a while to create a
list of 13k Thread() objects on this laptop. (and list() will of course
not return until it is finished).


Sebastian




Re: [python] get all messages of a thread

2011-06-02 Thread Sebastian Spaeth
On Thu, 02 Jun 2011 16:20:14 +0200, Sebastian Spaeth wrote:

> I made the change, and implemented __nonzero__ and removed the len()
> method. It just doesn't make sense on 1-time iterators. (I documented
> the change in the API docs). Sorry if this breaks existing code.

FYI

OK, I just pushed a change that adds the __nonzero__ function thus
making:

t=Threads()
if t:
  len(t)

work, but I did not pull the len() function out. The explicit if t
test is, however, not needed, because

t = q.search_threads()
for thread in t:
  print thread

works just fine with empty results.

Sebastian




Re: [python] get all messages of a thread

2011-06-02 Thread Austin Clements
On Thu, Jun 2, 2011 at 10:20 AM, Sebastian Spaeth sebast...@sspaeth.de wrote:
> On Thu, 2 Jun 2011 19:43:29 +1000, Brian May wrote:
>> On 2 June 2011 17:05, Sebastian Spaeth sebast...@sspaeth.de wrote:
>>
>>> What would be the best way to solve this (besides fixing the C api to
>>> allow to reset the iterator ;-) ?)
>>
>> * It is not easy to fix the C api to reset the iterator (what about
>> repeating the search?)
>
> I am not sure about the difficulty of that, I am not a C-kind of
> guy. Repeating the search would be easy but potentially gives you
> different results since the db could have changed since then.

Not too hard.  Here's an utterly untested patch that implements
iterator resetting for notmuch_messages_t iterators.  It *should* be
much more efficient than performing the query again, but if you use
it, I'd love to know if that's actually true.

This may not be useful if __len__ is gone, unless you really want to
turn Messages/Threads into iterators rather than generators (as I've
pointed out before, there is absolutely nothing unusual or un-Pythonic
about how Messages/Threads works right now [well, except for the
presence of __len__ in a generator, I suppose]).

diff --git a/lib/messages.c b/lib/messages.c
index 7bcd1ab..085691c 100644
--- a/lib/messages.c
+++ b/lib/messages.c
@@ -80,7 +80,8 @@ _notmuch_messages_create (notmuch_message_list_t *list)
return NULL;

 messages->is_of_list_type = TRUE;
-messages->iterator = list->head;
+messages->head = list->head;
+notmuch_messages_reset (messages);

 return messages;
 }
@@ -137,6 +138,15 @@ notmuch_messages_move_to_next (notmuch_messages_t *messages)
 }

 void
+notmuch_messages_reset (notmuch_messages_t *messages)
+{
+if (! messages->is_of_list_type)
+   return _notmuch_mset_messages_reset (messages);
+
+messages->iterator = messages->head;
+}
+
+void
 notmuch_messages_destroy (notmuch_messages_t *messages)
 {
 talloc_free (messages);
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 02e24ee..805d60c 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -413,6 +413,7 @@ typedef struct _notmuch_message_list {
  */
 struct visible _notmuch_messages {
 notmuch_bool_t is_of_list_type;
+notmuch_message_node_t *head;
 notmuch_message_node_t *iterator;
 };

@@ -441,6 +442,9 @@ _notmuch_mset_messages_get (notmuch_messages_t *messages);
 void
 _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages);

+void
+_notmuch_mset_messages_reset (notmuch_messages_t *messages);
+
 notmuch_bool_t
 _notmuch_doc_id_set_contains (notmuch_doc_id_set_t *doc_ids,
   unsigned int doc_id);
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 9cdcec0..044cfaa 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -734,6 +734,15 @@ notmuch_messages_get (notmuch_messages_t *messages);
 void
 notmuch_messages_move_to_next (notmuch_messages_t *messages);

+/* Reset the 'messages' iterator back to the first message.
+ *
+ * For iterators returned from notmuch_query_search_messages, this is
+ * both more efficient than performing the query a second time and
+ * guaranteed to result in the same messages as the first iteration.
+ */
+void
+notmuch_messages_reset (notmuch_messages_t *messages);
+
 /* Destroy a notmuch_messages_t object.
  *
  * It's not strictly necessary to call this function. All memory from
diff --git a/lib/query.cc b/lib/query.cc
index 6f02b04..1e75be0 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -32,6 +32,7 @@ struct _notmuch_query {
 typedef struct _notmuch_mset_messages {
 notmuch_messages_t base;
 notmuch_database_t *notmuch;
+Xapian::MSet mset;
 Xapian::MSetIterator iterator;
 Xapian::MSetIterator iterator_end;
 } notmuch_mset_messages_t;
@@ -128,6 +129,7 @@ notmuch_query_search_messages (notmuch_query_t *query)
    messages->base.is_of_list_type = FALSE;
    messages->base.iterator = NULL;
    messages->notmuch = notmuch;
+   new (&messages->mset) Xapian::MSet ();
    new (&messages->iterator) Xapian::MSetIterator ();
    new (&messages->iterator_end) Xapian::MSetIterator ();

@@ -181,8 +183,8 @@ notmuch_query_search_messages (notmuch_query_t *query)

    mset = enquire.get_mset (0, notmuch->xapian_db->get_doccount ());

-   messages->iterator = mset.begin ();
-   messages->iterator_end = mset.end ();
+   messages->mset = mset;
+   _notmuch_mset_messages_reset (&messages->base);

    return &messages->base;

@@ -257,6 +259,17 @@ _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages)
 mset_messages-iterator++;
 }

+void
+_notmuch_mset_messages_reset (notmuch_messages_t *messages)
+{
+notmuch_mset_messages_t *mset_messages;
+
+mset_messages = (notmuch_mset_messages_t *) messages;
+
+mset_messages->iterator = mset_messages->mset.begin ();
+mset_messages->iterator_end = mset_messages->mset.end ();
+}
+
 static notmuch_bool_t
 _notmuch_doc_id_set_init (void *ctx,
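
The idea behind the patch above (remember where iteration started, and let a reset call rewind to that point) can be sketched in Python. ResettableMessages and its reset() method are hypothetical names chosen for illustration; they are not part of the bindings:

```python
class ResettableMessages:
    """Hypothetical Python analogue of the patched notmuch_messages_t:
    it keeps the full result set so iteration can be rewound."""

    def __init__(self, items):
        self._items = tuple(items)  # fixed snapshot, like the stored MSet
        self._pos = 0               # plays the role of 'iterator'

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item

    def reset(self):
        # Counterpart of notmuch_messages_reset(): back to the first message.
        self._pos = 0


msgs = ResettableMessages(["m1", "m2"])
first_pass = [m for m in msgs]
msgs.reset()
second_pass = [m for m in msgs]
```

Because the snapshot is kept, the second pass sees the same messages as the first even if the database has changed in the meantime, which is the guarantee the header comment in the patch describes.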
  

Re: [python] get all messages of a thread

2011-05-31 Thread Carl Worth
On Sat, 28 May 2011 14:18:05 +0100, Patrick Totzke 
patricktot...@googlemail.com wrote:
> It seems that nobody needed this before. Even in bindings/python/notmuch.py
> only Threads.get_toplevel_messages() gets called, and then an (undocumented)
> Messages.print_messages is used (cf line 639, in show)
>
> any suggestions?

Looks like a bug in the bindings to me.

> I would rather not call the notmuch binary and parse its output..

Of course not!

Python folks, does someone have a quick fix here?

-Carl

-- 
carl.d.wo...@intel.com




Re: [python] get all messages of a thread

2011-05-31 Thread Brian May
On 28 May 2011 23:18, Patrick Totzke patricktot...@googlemail.com wrote:

> if r: #because we cant iterate on NoneType


I don't understand why, but this line sets r._msgs to None. So it crashes,
because it has no message ids to look for.

If you change it to

if r is not None:

... then it works for me.
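
The difference between the two guards can be illustrated with a sketch. Exhausting and collect are hypothetical names; unlike the real bindings, this stand-in silently yields nothing after exhaustion instead of crashing:

```python
class Exhausting:
    """Hypothetical one-time result object whose __len__ uses up the results."""

    def __init__(self, items):
        self._it = iter(items)

    def __iter__(self):
        return self._it

    def __len__(self):
        return sum(1 for _ in self._it)  # consumes everything


def collect(results, identity_test):
    # "is not None" only compares identity; bool() falls back to __len__.
    ok = (results is not None) if identity_test else bool(results)
    found = []
    if ok:
        for msg in results:  # a plain for loop does not call __len__
            found.append(msg)
    return found


with_truth_test = collect(Exhausting(["m1", "m2"]), identity_test=False)
with_identity_test = collect(Exhausting(["m1", "m2"]), identity_test=True)
```

With the truth test the loop finds nothing left to iterate over, while the identity test leaves the results untouched.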

Oh, I see, for your code, there is an implied call to __len__, and the
__len__ function is completely broken for the reasons described in the
documentation:

  .. note:: As this iterates over the messages, we will not be able to
     iterate over them again! So this will fail::

       #THIS FAILS
       msgs = Database().create_query('').search_messages()
       if len(msgs) > 0:  #this 'exhausts' msgs
           # next line raises NotmuchError(STATUS.NOT_INITIALIZED)!!!
           for msg in msgs: print msg

     Most of the time, using the
     :meth:`Query.count_messages` is therefore more
     appropriate (and much faster). While not guaranteeing
     that it will return the exact same number as len(),
     in my tests it effectively always did so.



-- 
Brian May br...@microcomaustralia.com.au