[PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'

2011-07-13 Thread Pieter Praet
On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements wrote:
> Quoth Pieter Praet on Jul 11 at 10:43 pm:
> > TL;DR: I can haz regex pl0x?
> 
> Oof, what a pain.  I'm happy to change the output format of search; I
> hadn't realized how difficult it would be to parse.  In fact, I'm not
> sure it's even parsable by regexp, because the message ID's themselves
> could contain parens.
> 
> So what would be a good format?  One possibility would be to
> NULL-delimit the query part; as distasteful as I find that, this part
> of the search output isn't meant for user consumption.  Though I fear
> this is endemic to the dual role the search output currently plays as
> both user and computer readable.
> 
> I've also got the code to do everything using document ID's instead of
> message ID's.  As a side-effect, it makes the search output clean and
> readily parsable since document ID's are just numbers.  Hence, there
> are no quoting or escaping issues (plus the output is much more
> compact).  I haven't sent this to the list yet because I haven't had a
> chance to benchmark it and determine if the performance benefits make
> exposing document ID's worthwhile.

Jamie Zawinski once said/wrote [1]:
  'Some people, when confronted with a problem, think "I know,
  I'll use regular expressions." Now they have two problems.'

With this in mind, I set out to get rid of this whole regex mess altogether,
by populating the search buffer using Notmuch's JSON output instead of doing
brittle text matching tricks.
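
A minimal sketch of that idea, in Python rather than Emacs Lisp so it can
be run standalone. The sample record is invented, and the field names
(thread, matched, total, authors, subject, tags) are assumed to match what
`notmuch search --format=json` emits. Parens in the subject need no
special handling, because nothing is matched by regexp:

```python
import json

# Hypothetical sample; real input would come from
# `notmuch search --format=json <query>`.
SAMPLE = '''[
  {"thread": "0000000000000001", "timestamp": 1310562960,
   "matched": 1, "total": 3, "authors": "Pieter Praet",
   "subject": "emacs: bad regexp (with parens)",
   "tags": ["inbox", "unread"]}
]'''


def search_lines(json_text):
    """Turn JSON search results into one display line per thread."""
    lines = []
    for result in json.loads(json_text):
        lines.append("thread:%s [%d/%d] %s; %s (%s)" % (
            result["thread"], result["matched"], result["total"],
            result["authors"], result["subject"],
            " ".join(result["tags"])))
    return lines


for line in search_lines(SAMPLE):
    print(line)
```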

Looking for some documentation, I stumbled upon a long-forgotten gem [2].

David's already done pretty much all of the work for us!

Unfortunately, it doesn't apply cleanly to master anymore.

David, would you mind rebasing it?


Peace

-- 
Pieter

[1] http://www.jwz.org/hacks/
[2] id:"1290777202-14040-1-git-send-email-dme at dme.org"


[PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'

2011-07-13 Thread David Edmondson
* pieter at praet.org [2011-07-13 Wed 15:16]
> David, would you mind rebasing it?

I'm sorry, I'm not likely to have time to do this.


[PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'

2011-07-13 Thread Austin Clements
Quoth Pieter Praet on Jul 13 at  4:16 pm:
> On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements wrote:
> > Quoth Pieter Praet on Jul 11 at 10:43 pm:
> > > TL;DR: I can haz regex pl0x?
> > 
> > Oof, what a pain.  I'm happy to change the output format of search; I
> > hadn't realized how difficult it would be to parse.  In fact, I'm not
> > sure it's even parsable by regexp, because the message ID's themselves
> > could contain parens.
> > 
> > So what would be a good format?  One possibility would be to
> > NULL-delimit the query part; as distasteful as I find that, this part
> > of the search output isn't meant for user consumption.  Though I fear
> > this is endemic to the dual role the search output currently plays as
> > both user and computer readable.
> > 
> > I've also got the code to do everything using document ID's instead of
> > message ID's.  As a side-effect, it makes the search output clean and
> > readily parsable since document ID's are just numbers.  Hence, there
> > are no quoting or escaping issues (plus the output is much more
> > compact).  I haven't sent this to the list yet because I haven't had a
> > chance to benchmark it and determine if the performance benefits make
> > exposing document ID's worthwhile.
> 
> Jamie Zawinski once said/wrote [1]:
>   'Some people, when confronted with a problem, think "I know,
>   I'll use regular expressions." Now they have two problems.'
> 
> With this in mind, I set out to get rid of this whole regex mess altogether,
> by populating the search buffer using Notmuch's JSON output instead of doing
> brittle text matching tricks.
> 
> Looking for some documentation, I stumbled upon a long-forgotten gem [2].
> 
> David's already done pretty much all of the work for us!

Yes, similar thoughts were running through my head as I futzed with
the formatting for this.  My concern with moving to JSON for search
buffers is that parsing it is about *30 times slower* than the current
regexp-based approach (0.6 seconds versus 0.02 seconds for a mere 1413
result search buffer).  I think JSON makes a lot of sense for show
buffers because there's generally less data and it has a lot of
complicated structure.  Search results, on the other hand, have a very
simple, regular, and constrained structure, so JSON doesn't buy us
nearly as much.

JSON is hard to parse because, like the text search output, it's
designed for human consumption (of course, unlike the text search
output, it's also designed for computer consumption).  There's
something to be said for the debuggability and generality of this and
JSON is very good for exchanging small objects, but it's a remarkably
inefficient way to exchange large amounts of data between two
programs.
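
For anyone who wants to repeat the comparison outside Emacs, here is a
rough Python sketch. Everything in it is an assumption: the records are
synthetic, the line format is a simplified version of the text output
(no date field), and Python's JSON parser is implemented very differently
from json-read, so the ratio it prints says nothing about the Emacs
numbers above. It only shows the shape of the measurement:

```python
import json
import re
import timeit

N = 1413  # same result count as the measurement quoted above

# Synthetic records, invented for illustration only.
records = [{"thread": "%016x" % i, "matched": 1, "total": 2,
            "authors": "A. Author", "subject": "subject %d" % i,
            "tags": ["inbox"]} for i in range(N)]

json_text = json.dumps(records)
plain_text = "\n".join(
    "thread:%s [%d/%d] %s; %s (%s)" % (
        r["thread"], r["matched"], r["total"], r["authors"],
        r["subject"], " ".join(r["tags"]))
    for r in records)

# Simplified line pattern: thread id, [matched/total], authors, subject, tags.
LINE_RE = re.compile(
    r"^(thread:[0-9A-Fa-f]*) (\[[0-9/]*\]) ([^;]*); (.*) \(([^()]*)\)$",
    re.MULTILINE)


def parse_json():
    return json.loads(json_text)


def parse_text():
    return LINE_RE.findall(plain_text)


json_seconds = timeit.timeit(parse_json, number=20)
text_seconds = timeit.timeit(parse_text, number=20)
print("json: %.4fs  regexp: %.4fs" % (json_seconds, text_seconds))
```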

I guess what I'm getting at, though it pains me to say it, is perhaps
search needs a fast, computer-readable interchange format.  The
structure of the data is so simple and constrained that this could be
altogether trivial.
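
To make "trivial" concrete, here is one possible shape for such a format,
sketched in Python: one record per line, fields separated by NUL bytes.
The field list and the helper names are hypothetical, and the sketch
assumes that no field contains a newline or a NUL byte:

```python
# Hypothetical field order; not an existing notmuch format.
FIELDS = ("thread", "matched", "total", "authors", "subject", "tags")


def encode(results):
    """Serialize records: one per line, fields separated by NUL bytes."""
    return b"\n".join(
        b"\0".join(str(r[f]).encode("utf-8") for f in FIELDS)
        for r in results)


def decode(blob):
    """Split on newlines and NUL bytes; no quoting or escaping involved."""
    return [dict(zip(FIELDS, (v.decode("utf-8") for v in line.split(b"\0"))))
            for line in blob.split(b"\n") if line]
```

Because the delimiter cannot appear in a subject or author name, parens
and semicolons in the data stop being a parsing problem.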

Or maybe I need a faster computer.


If anyone is curious, here's how I timed the parsing.

;; Evaluate CODE once and return the CPU time it took, in seconds.
(defmacro time-it (code)
  `(let ((start-time (get-internal-run-time)))
     ,code
     (float-time (time-subtract (get-internal-run-time) start-time))))

;; Parse the whole buffer of JSON search output.
(with-current-buffer "json"
  (goto-char (point-min))
  (time-it (json-read)))

;; Match every line of the text search output.
(with-current-buffer "text"
  (goto-char (point-min))
  (time-it
   (while (re-search-forward "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" nil t))))


Encodings

2011-07-13 Thread Patrick Totzke
Hi Uwe,

On Wed, Jul 13, 2011 at 09:04:47AM +0200, Uwe Kleine-König wrote:
> > But as Carl says, we cannot guarantee that a tag is utf8 encoded anyway.
> I think it would be right to enforce that tags are utf-8 encoded.
> Otherwise the users get strange results if they change their locale.

I agree that it would be very nice indeed if it were safe to assume
that all tags are utf-8. But I also see that it takes some effort
to ensure this, as all UIs would have to explicitly recode
anything that isn't utf-8.
It seems to be a consciously made design decision to allow
other encodings for tags, which is up for discussion of course.
All I'm saying is that the bindings should conform. And if it's
not safe to assume utf-8 here, we shouldn't decode as such.

I'm unsure what happens in all the new get_part() parts of the API.
If all MIME part text there is also returned as utf-8, it would only
be consistent to bend tag encodings to utf-8 as well. But I doubt that's the case.
Can anyone clarify this?
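
One way the bindings could avoid assuming utf-8 is sketched below. It is
written for Python 3 (the traceback in this thread is from Python 2.7),
and decode_tag is a hypothetical helper, not part of the notmuch
bindings. It decodes only when the bytes are valid utf-8 and otherwise
hands them back untouched:

```python
def decode_tag(raw):
    """Return text if raw is valid UTF-8, else the original bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: do not guess an encoding, let the caller decide.
        return raw
```

The drawback is that callers then have to cope with two return types,
which is part of why enforcing a single encoding is attractive.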
/Patrick


Encodings

2011-07-13 Thread Uwe Kleine-König
Hi Patrick,

On Tue, Jul 12, 2011 at 10:29:58PM +0100, Patrick Totzke wrote:
> I noticed that commit 687366b920caa5de6ea0b66b70cf2a11e5399f7b
> breaks things with Database.get_all_tags:
> 
> -->%-
> AttributeError                            Traceback (most recent call last)
> 
> /home/pazz/projects/alot/<ipython console> in <module>()
> 
> /usr/local/lib/python2.7/dist-packages/notmuch/tag.pyc in next(self)
>      86         # No need to call nmlib.notmuch_tags_valid(self._tags);
>      87         # Tags._get safely returns None, if there is no more valid tag.
> ---> 88         tag = Tags._get(self._tags).decode('utf-8')
>      89         if tag is None:
>      90             self._tags = None
> 
> AttributeError: 'NoneType' object has no attribute 'decode'
> %<---
> 
> The reason is that Tags.next() tries to decode before it tests whether tag
> is None.
> Now, we _could_ apply a patch like this one here:
> 
> -->%-
> diff --git a/bindings/python/notmuch/tag.py b/bindings/python/notmuch/tag.py
> index 65a9118..2ae670d 100644
> --- a/bindings/python/notmuch/tag.py
> +++ b/bindings/python/notmuch/tag.py
> @@ -85,12 +85,12 @@ class Tags(object):
>              raise NotmuchError(STATUS.NOT_INITIALIZED)
>          # No need to call nmlib.notmuch_tags_valid(self._tags);
>          # Tags._get safely returns None, if there is no more valid tag.
> -        tag = Tags._get(self._tags).decode('utf-8')
> +        tag = Tags._get(self._tags)
>          if tag is None:
>              self._tags = None
>              raise StopIteration
>          nmlib.notmuch_tags_move_to_next(self._tags)
> -        return tag
> +        return tag.decode('utf-8')
>  
>      def __nonzero__(self):
>          """Implement bool(Tags) check that can be repeatedly used
> ---%<-
> 
> But as Carl says, we cannot guarantee that a tag is utf8 encoded anyway.
I think it would be right to enforce that tags are utf-8 encoded.
Otherwise the users get strange results if they change their locale.
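
Independent of the encoding question, the ordering fix in the quoted
patch can be shown in isolation. This is a Python 3 sketch with a
stand-in class; TagIterator and its fetch callable are hypothetical and
only mimic how Tags._get returns None when no tags are left:

```python
class TagIterator:
    """Stand-in for the bindings' Tags class.

    `fetch` is a hypothetical callable that returns the next raw tag
    as bytes, or None once the tags are exhausted.
    """

    def __init__(self, fetch):
        self._fetch = fetch

    def __iter__(self):
        return self

    def __next__(self):
        raw = self._fetch()
        if raw is None:
            # Test for exhaustion first ...
            raise StopIteration
        # ... and decode only values that really exist.
        return raw.decode("utf-8")
```

Decoding before the None test is what produced the AttributeError in the
traceback quoted above.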

Best regards
Uwe


Slowness (search opens every email file?)

2011-07-13 Thread Austin Clements
Quoth Istvan Marko on Jul 12 at  8:07 pm:
> Austin Clements  writes:
> 
> > I'd say this patch looks good other than coding style
> > - Tab indentation
> > - /* */ comments, starting with a capital letter
> > - Space between function name and open paren
> > - Space after comma in argument lists
> > - Spaces around assignment operator
> 
> Thanks, fixed the ones I see:

+/* Fetch header from the appropriate xapian value field if
+ * available */
+if (strcmp(header, "from") == 0)
+   value = message->doc.get_value(NOTMUCH_VALUE_FROM);
+else if (strcmp(header, "subject") == 0)
+   value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
+else if (strcmp(header, "message-id") == 0)
+   value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);

The strcmp's should have a space before the paren, as should the first
get_value.  (Yeah, it's weird.  Blame glib.)

Also, it occurred to me that these should be strcasecmp's, since
headers are case-insensitive.
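
The effect of the case-insensitive comparison, sketched in Python for
brevity (the patch itself is C++). The table below is hypothetical and
only reuses the value-slot names from the patch as strings:

```python
# Hypothetical lookup table; the slot names are taken from the patch.
VALUE_FIELDS = {
    "from": "NOTMUCH_VALUE_FROM",
    "subject": "NOTMUCH_VALUE_SUBJECT",
    "message-id": "NOTMUCH_VALUE_MESSAGE_ID",
}


def value_field_for(header):
    """Map a header name to its value slot, ignoring case."""
    return VALUE_FIELDS.get(header.lower())
```

With a case-sensitive comparison, a caller asking for "Subject" or
"Message-ID" would miss the stored value and fall back to the slower
path of opening the message file.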


Notmuch mail notifier applet for Gnome?

2011-07-13 Thread Albin Stjerna
Hi all

I've been using a simple notmuch count-based script together with xmobar in 
xmonad to notify me of new mail. However, I'm thinking of switching to a 
Gnome/xmonad-based combo, thus giving up xmobar. I've been looking for a 
replacement to my mail checker that would work with gnome-panel, but so far I 
haven't found any. I'm sure there has to be at least one Gnome user on the 
notmuch mailing list, so I thought I'd ask for help.

How are you solving this? Basically anything that would display a notification 
icon if a given shell command returned true would suffice, though I'd prefer 
something that would also display a given text (read: a short form of the 
specified query and the mail count for it, something like "personal: 10", 
possibly on mouse-over).
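
For reference, the count-based check described above could look roughly
like this in Python. It assumes notmuch is installed and configured; the
function names, the label and the query are made up for illustration,
and hooking the result into a panel applet is left open:

```python
import subprocess


def run_notmuch_count(query):
    """Run `notmuch count <query>` and return the result as an int."""
    out = subprocess.check_output(["notmuch", "count", query])
    return int(out.decode("utf-8").strip())


def notification_text(label, query, count=run_notmuch_count):
    """Return e.g. 'personal: 10', or None when nothing matches.

    `count` is injectable so the function can be tried without notmuch.
    """
    n = count(query)
    return "%s: %d" % (label, n) if n > 0 else None
```

A notifier would call notification_text("personal", "tag:personal and
tag:unread") periodically and show the icon whenever the result is not
None.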

Thanks in advance,
  Albin


Notmuch mail notifier applet for Gnome?

2011-07-13 Thread Patrick Totzke
Hi Albin,

This is not an answer to your inquiry, but since I recently looked into
something similar for my own setup, I thought I'd share: I used xmonad
before but switched to the awesome [0] tiling WM. I hacked my solution
into their wiki, including a screenshot [1].

Best,
/p



[0] http://awesome.naquadah.org/
[1] https://awesome.naquadah.org/wiki/Notmuch_mail_integration
On Wed, Jul 13, 2011 at 12:02:06AM +0200, Albin Stjerna wrote:
> Hi all
> 
> I've been using a simple notmuch count-based script together with xmobar in 
> xmonad to notify me of new mail. However, I'm thinking of switching to a 
> Gnome/xmonad-based combo, thus giving up xmobar. I've been looking for a 
> replacement to my mail checker that would work with gnome-panel, but so far I 
> haven't found any. I'm sure there has to be at least one Gnome user on the 
> notmuch mailing list, so I thought I'd ask for help.
> 
> How are you solving this? Basically anything that would display a 
> notification icon if a given shell command returned true would suffice, 
> though I'd prefer something that would also display a given text (read: a 
> short form of the specified query and the mail count for it, something like 
> "personal: 10", possibly on mouse-over).
> 
> Thanks in advance,
>   Albin
> ___
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch



[PATCH] test: Adding non-maildir tags does not move message from new to cur

2011-07-13 Thread Michal Sojka
From: Michal Sojka so...@os.inf.tu-dresden.de

This adds a test for patch submitted by Louis Rilling. Without his patch
applied this test fails.
---
 test/maildir-sync |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/test/maildir-sync b/test/maildir-sync
index a60854f..e1ad81c 100755
--- a/test/maildir-sync
+++ b/test/maildir-sync
@@ -88,6 +88,12 @@ test_expect_equal "$output" "No new mail."
 # creating new directories in the mail store, then it should be
 # creating all necessary database state for those directories.
 
+test_begin_subtest "Adding non-maildir tags does not move message from new to cur"
+add_message [subject]='Message to stay in new' [date]='Sat, 01 Jan 2000 12:00:00 -' [filename]='message-to-stay-in-new' [dir]=new
+notmuch tag +donotmove subject:"Message to stay in new"
+output=$(cd $MAIL_DIR; ls */message-to-stay-in-new*)
+test_expect_equal "$output" "new/message-to-stay-in-new"
+
 test_begin_subtest "Removing 'S' flag from existing filename adds 'unread' tag"
 add_message [subject]='Removing S flag' [filename]='removing-s-flag:2,S' [dir]=cur
 output=$(notmuch search subject:"Removing S flag" | notmuch_search_sanitize)
-- 
1.7.5.4
