Re: [patch v5 4/6] lib: regexp matching in 'subject' and 'from'

2017-02-26 Thread Jani Nikula
On Thu, 16 Feb 2017, David Bremner wrote:
> the idea is that you can run
>
> % notmuch search subject:/<regex>/
> % notmuch search from:/<regex>/
>
> or
>
> % notmuch search subject:"your usual phrase search"
> % notmuch search from:"usual phrase search"
>
> This feature is only available with recent Xapian, specifically
> support for field processors is needed.
>
> It should work with bindings, since it extends the query parser.
>
> This is easy to extend for other value slots, but currently the only
> value slots are date, message_id, from, subject, and last_mod. Date is
> already searchable;  message_id is left for a followup commit.
>
> This was originally written by Austin Clements, and ported to Xapian
> field processors (from Austin's custom query parser) by yours truly.
> ---
>  doc/man7/notmuch-search-terms.rst |  25 ++-
>  lib/Makefile.local                |   1 +
>  lib/database.cc                   |  11 +--
>  lib/regexp-fields.cc              | 144 ++
>  lib/regexp-fields.h               |  77
>  test/T630-regexp-query.sh         |  81 +
>  6 files changed, 332 insertions(+), 7 deletions(-)
>  create mode 100644 lib/regexp-fields.cc
>  create mode 100644 lib/regexp-fields.h
>  create mode 100755 test/T630-regexp-query.sh
>
> diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
> index de93d733..47cab48d 100644
> --- a/doc/man7/notmuch-search-terms.rst
> +++ b/doc/man7/notmuch-search-terms.rst
> @@ -34,10 +34,14 @@ indicate user-supplied values):
>  
>  -  from:
>  
> +-  from://
> +
>  -  to:
>  
>  -  subject:
>  
> +-  subject://
> +
>  -  attachment:
>  
>  -  mimetype:
> @@ -71,6 +75,15 @@ subject of an email. Searching for a phrase in the subject is supported
>  by including quotation marks around the phrase, immediately following
>  **subject:**.
>  
> +If notmuch is built with **Xapian Field Processors** (see below) the
> +**from:** and **subject:** prefixes can also be used to restrict the
> +results to those whose from/subject value matches a regular expression
> +(see **regex(7)**) delimited with //.
> +
> +::
> +
> +   notmuch search 'from:/bob@.*[.]example[.]com/'
> +
>  The **attachment:** prefix can be used to search for specific filenames
>  (or extensions) of attachments to email messages.
>  
> @@ -220,13 +233,18 @@ Boolean and Probabilistic Prefixes
>  --
>  
>  Xapian (and hence notmuch) prefixes are either **boolean**, supporting
> -exact matches like "tag:inbox"  or **probabilistic**, supporting a more flexible **term** based searching. The prefixes currently supported by notmuch are as follows.
> -
> +exact matches like "tag:inbox" or **probabilistic**, supporting a more
> +flexible **term** based searching. Certain **special** prefixes are
> +processed by notmuch in a way not strictly fitting either of Xapian's
> +built-in styles. The prefixes currently supported by notmuch are as
> +follows.
>  
>  Boolean
>     **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
>  Probabilistic
> -   **from:**, **to:**, **subject:**, **attachment:**, **mimetype:**
> +   **to:**, **attachment:**, **mimetype:**
> +Special
> +   **from:**, **query:**, **subject:**
>  
>  Terms and phrases
>  -
> @@ -396,6 +414,7 @@ Currently the following features require field processor support:
>  
>  - non-range date queries, e.g. "date:today"
>  - named queries e.g. "query:my_special_query"
> +- regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
>  
>  SEE ALSO
>  
> diff --git a/lib/Makefile.local b/lib/Makefile.local
> index b77e5780..cd92fc79 100644
> --- a/lib/Makefile.local
> +++ b/lib/Makefile.local
> @@ -52,6 +52,7 @@ libnotmuch_cxx_srcs =   \
>   $(dir)/query.cc \
>   $(dir)/query-fp.cc  \
>   $(dir)/config.cc\
> + $(dir)/regexp-fields.cc \
>   $(dir)/thread.cc
>  
>  libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
> diff --git a/lib/database.cc b/lib/database.cc
> index 450ee295..ee971f32 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -21,6 +21,7 @@
>  #include "database-private.h"
>  #include "parse-time-vrp.h"
>  #include "query-fp.h"
> +#include "regexp-fields.h"
>  #include "string-util.h"
>  
>  #include 
> @@ -277,7 +278,8 @@ prefix_t prefix_table[] = {
>   NOTMUCH_FIELD_PROCESSOR },
>  #endif
>  { "from","XFROM",NOTMUCH_FIELD_EXTERNAL |
> - NOTMUCH_FIELD_PROBABILISTIC },
> + NOTMUCH_FIELD_PROBABILISTIC |
> + NOTMUCH_FIELD_PROCESSOR },
>  { "to",  "XTO",  NOTMUCH_FIELD_EXTERNAL |
>   

[patch v5 4/6] lib: regexp matching in 'subject' and 'from'

2017-02-16 Thread David Bremner
the idea is that you can run

% notmuch search subject:/<regex>/
% notmuch search from:/<regex>/

or

% notmuch search subject:"your usual phrase search"
% notmuch search from:"usual phrase search"

This feature is only available with recent Xapian, specifically
support for field processors is needed.

It should work with bindings, since it extends the query parser.

This is easy to extend for other value slots, but currently the only
value slots are date, message_id, from, subject, and last_mod. Date is
already searchable;  message_id is left for a followup commit.

This was originally written by Austin Clements, and ported to Xapian
field processors (from Austin's custom query parser) by yours truly.
---
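
For readers following along, here is a rough, self-contained sketch of the
matching step described above. It is not the code in lib/regexp-fields.cc:
the function names are made up for illustration, and it assumes POSIX
extended syntax via regcomp(3) with REG_EXTENDED | REG_NOSUB, which the
description above does not confirm. It only shows how a //-delimited
argument could be told apart from the usual phrase search and then tested
against a stored from or subject value.

```cpp
// Sketch only: strip_regex_delimiters() and value_matches() are
// hypothetical helpers, not functions from notmuch or Xapian.
#include <regex.h>

#include <stdexcept>
#include <string>

// If `arg` has the form "/.../", store the text between the delimiters in
// `pattern` and return true. Otherwise return false, in which case a caller
// would fall back to the usual term or phrase search.
bool strip_regex_delimiters(const std::string &arg, std::string &pattern)
{
    if (arg.size() >= 2 && arg.front() == '/' && arg.back() == '/') {
        pattern = arg.substr(1, arg.size() - 2);
        return true;
    }
    return false;
}

// Test one header value against a POSIX regular expression.
// Assumption: extended syntax, case sensitive, unanchored search.
bool value_matches(const std::string &pattern, const std::string &value)
{
    regex_t re;
    int rc = regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        // Turn the regcomp error code into a readable message.
        char msg[256];
        regerror(rc, &re, msg, sizeof msg);
        throw std::invalid_argument(msg);
    }
    // REG_NOSUB: only a yes/no answer is needed, no match offsets.
    bool matched = (regexec(&re, value.c_str(), 0, nullptr, 0) == 0);
    regfree(&re);
    return matched;
}
```

With these helpers, an argument such as /bob@.*[.]example[.]com/ is reduced
to its pattern and then tried against each candidate value, while an
argument like "usual phrase search" is not treated as a regexp at all.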
 doc/man7/notmuch-search-terms.rst |  25 ++-
 lib/Makefile.local                |   1 +
 lib/database.cc                   |  11 +--
 lib/regexp-fields.cc              | 144 ++
 lib/regexp-fields.h               |  77
 test/T630-regexp-query.sh         |  81 +
 6 files changed, 332 insertions(+), 7 deletions(-)
 create mode 100644 lib/regexp-fields.cc
 create mode 100644 lib/regexp-fields.h
 create mode 100755 test/T630-regexp-query.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index de93d733..47cab48d 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -34,10 +34,14 @@ indicate user-supplied values):
 
 -  from:
 
+-  from://
+
 -  to:
 
 -  subject:
 
+-  subject://
+
 -  attachment:
 
 -  mimetype:
@@ -71,6 +75,15 @@ subject of an email. Searching for a phrase in the subject is supported
 by including quotation marks around the phrase, immediately following
 **subject:**.
 
+If notmuch is built with **Xapian Field Processors** (see below) the
+**from:** and **subject:** prefixes can also be used to restrict the
+results to those whose from/subject value matches a regular expression
+(see **regex(7)**) delimited with //.
+
+::
+
+   notmuch search 'from:/bob@.*[.]example[.]com/'
+
 The **attachment:** prefix can be used to search for specific filenames
 (or extensions) of attachments to email messages.
 
@@ -220,13 +233,18 @@ Boolean and Probabilistic Prefixes
 --
 
 Xapian (and hence notmuch) prefixes are either **boolean**, supporting
-exact matches like "tag:inbox"  or **probabilistic**, supporting a more flexible **term** based searching. The prefixes currently supported by notmuch are as follows.
-
+exact matches like "tag:inbox" or **probabilistic**, supporting a more
+flexible **term** based searching. Certain **special** prefixes are
+processed by notmuch in a way not strictly fitting either of Xapian's
+built-in styles. The prefixes currently supported by notmuch are as
+follows.
 
 Boolean
    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-   **from:**, **to:**, **subject:**, **attachment:**, **mimetype:**
+   **to:**, **attachment:**, **mimetype:**
+Special
+   **from:**, **query:**, **subject:**
 
 Terms and phrases
 -
@@ -396,6 +414,7 @@ Currently the following features require field processor support:
 
 - non-range date queries, e.g. "date:today"
 - named queries e.g. "query:my_special_query"
+- regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
 
 SEE ALSO
 
diff --git a/lib/Makefile.local b/lib/Makefile.local
index b77e5780..cd92fc79 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -52,6 +52,7 @@ libnotmuch_cxx_srcs = \
 	$(dir)/query.cc \
 	$(dir)/query-fp.cc \
 	$(dir)/config.cc \
+	$(dir)/regexp-fields.cc \
 	$(dir)/thread.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
diff --git a/lib/database.cc b/lib/database.cc
index 450ee295..ee971f32 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -21,6 +21,7 @@
 #include "database-private.h"
 #include "parse-time-vrp.h"
 #include "query-fp.h"
+#include "regexp-fields.h"
 #include "string-util.h"
 
 #include 
@@ -277,7 +278,8 @@ prefix_t prefix_table[] = {
                                NOTMUCH_FIELD_PROCESSOR },
 #endif
 { "from",  "XFROM",NOTMUCH_FIELD_EXTERNAL |
-   NOTMUCH_FIELD_PROBABILISTIC },
+   NOTMUCH_FIELD_PROBABILISTIC |
+   NOTMUCH_FIELD_PROCESSOR },
 { "to","XTO",  NOTMUCH_FIELD_EXTERNAL |
    NOTMUCH_FIELD_PROBABILISTIC },
 { "attachment","XATTACHMENT",  NOTMUCH_FIELD_EXTERNAL |
@@ -285,7 +287,8 @@ prefix_t prefix_table[] = {
 { "mimetype",  "XMIMETYPE",NOTMUCH_FIELD_EXTERNAL |
    NOTMUCH_FIELD_PROBABILISTIC },
 { "subject",