[PATCH] lib: add 'body:' field, stop indexing headers twice.
The new `body:` field (in Xapian terms) or prefix (in slightly sloppier
notmuch terms) allows matching terms that occur only in the body.
Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for negated 'body:' searches to work
correctly.
---
 doc/man7/notmuch-search-terms.rst | 5 +++-
 lib/database.cc | 6 +
 lib/message.cc | 10 +++
 test/T730-body.sh | 43 +++
 4 files changed, 58 insertions(+), 6 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index f7a39ceb..fd8bf634 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -44,6 +44,9 @@ results to those whose value matches a regular expression (see
 
     notmuch search 'from:"/bob@.*[.]example[.]com/"'
 
+body:
+Match terms in the body of messages.
+
 from: or from://
     The **from:** prefix is used to match the name or address of
     the sender of an email message.
@@ -249,7 +252,7 @@ follows.
 
 Boolean
   **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-  **to:**, **attachment:**, **mimetype:**
+  **body:**, **to:**, **attachment:**, **mimetype:**
 Special
   **from:**, **query:**, **subject:**
 
diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..27c2d042 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -259,6 +259,8 @@ prefix_t prefix_table[] = {
     { "directory", "XDIRECTORY", NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry", "XFDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry","XDDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
+    { "body", "", NOTMUCH_FIELD_EXTERNAL |
+      NOTMUCH_FIELD_PROBABILISTIC},
     { "thread","G",NOTMUCH_FIELD_EXTERNAL |
       NOTMUCH_FIELD_PROCESSOR },
     { "tag", "K",NOTMUCH_FIELD_EXTERNAL |
@@ -302,6 +304,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+        notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
         notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -326,6 +330,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
                          *notmuch->query_parser, notmuch))->release ();
         /* we treat all field-processor fields as boolean in order to get the
            raw input */
+        if (prefix->prefix)
+            notmuch->query_parser->add_prefix("",prefix->prefix);
         notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
         _setup_query_field_default (prefix, notmuch);
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..64349f83 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1443,13 +1443,13 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
         message->termpos = term_gen->get_termpos () + 100;
 
         _notmuch_message_invalidate_metadata (message, prefix_name);
+    } else {
+        term_gen->set_termpos (message->termpos);
+        term_gen->index_text (text);
+        /* Create a term gap, as above. */
+        message->termpos = term_gen->get_termpos () + 100;
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
-    message->termpos = term_gen->get_termpos () + 100;
-
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index ..548b30a4
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+add_message "[body]=nothing-to-see-here-1" "[subject]=thebody-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test
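The query-time expansion described in the commit message can be modelled in a few lines. What follows is only a toy sketch in Python, not the Xapian QueryParser API: the `FIELD_PREFIXES` table, the `expand_term` and `search` helpers, and the two-message index are all hypothetical, chosen to mirror the two subtests in T730-body.sh. Each header term is stored once, under its prefix, and an unprefixed query word is expanded to every prefixed form at search time.

```python
# Toy model of query-time prefix expansion. This is NOT the Xapian API;
# the table and helper names below are illustrative assumptions.
FIELD_PREFIXES = {
    "body": "",             # body terms carry the empty prefix, as in the patch
    "subject": "XSUBJECT",  # assumed term prefix for the example
    "to": "XTO",            # assumed term prefix for the example
}


def expand_term(word, field=None):
    """Return the set of index terms that a query word may match."""
    if field is not None:
        # A prefixed query such as body:word matches one prefixed form only.
        return {FIELD_PREFIXES[field] + word}
    # An unprefixed word is expanded to every known prefixed form.
    return {prefix + word for prefix in FIELD_PREFIXES.values()}


def search(index, word, field=None):
    """Return the sorted ids of messages containing any expanded term."""
    terms = expand_term(word, field)
    return sorted(doc for doc, doc_terms in index.items() if terms & doc_terms)


# Each header term is indexed once (prefixed only), never a second time
# as an unprefixed term.
index = {
    "msg-1": {"thebody", "XSUBJECTsubject"},
    "msg-2": {"nothing", "XSUBJECTthebody"},
}

body_only = search(index, "thebody", field="body")
anywhere = search(index, "thebody")
```

Here `body_only` contains just `msg-1`, while `anywhere` contains both messages, which is the behaviour the two subtests check.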
Re: WIP2: index user headers
David Bremner writes:

> I had another thought about user prefixes. I wonder if they should all
> be forcibly prefixed with something that prevents collisions, to prevent
> later pain if we add an "official" prefix with the same name. A quick
> test suggests it would work to use something like _
>
> so
>
> notmuch search --output=files _list:notmuch
>
> works. It's a bit ugly, I'll have to play with other options; the main
> question is whether we think prefixing is needed / worth-it.

I played with the query parser a bit, and the only idea I found so far
is to reserve prefixes starting with lower case ASCII for notmuch, and
allow users to use anything else.

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
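The reservation rule floated above (names starting with a lower-case ASCII letter belong to notmuch, anything else is open to users) could be checked along these lines. This is a minimal sketch in Python, not notmuch code; `is_reserved_prefix` and `validate_user_prefix` are hypothetical names. Testing against `string.ascii_lowercase` keeps the check independent of the locale.

```python
import string


def is_reserved_prefix(name):
    """True if the name starts with a lower-case ASCII letter.

    Under the proposed rule such names are reserved for notmuch itself.
    """
    return bool(name) and name[0] in string.ascii_lowercase


def validate_user_prefix(name):
    """Return the name unchanged if a user may define it, else raise."""
    if not name:
        raise ValueError("empty prefix name")
    if is_reserved_prefix(name):
        raise ValueError(
            "prefix %r starts with lower-case ASCII and is reserved" % name
        )
    return name
```

With this rule `list` would be rejected as a user prefix, while `List` and `_list` would be accepted.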
[PATCH] doc: sequentialize calls to sphinx-build
In certain conditions the parallel calls to sphinx-build could
collide, yielding a crash like

Exception occurred:
  File "/usr/lib/python3/dist-packages/sphinx/environment.py", line 1261, in get_doctree
    doctree = pickle.load(f)
EOFError: Ran out of input
---

I first observed this on Debian/ppc64le. There it occurred about 1 in
50 builds. Thanks to Olly Betts for pointing me in the right
direction.

 doc/Makefile.local | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/Makefile.local b/doc/Makefile.local
index 16459e35..eed243a0 100644
--- a/doc/Makefile.local
+++ b/doc/Makefile.local
@@ -37,6 +37,13 @@ INFO_INFO_FILES := $(INFO_TEXI_FILES:.texi=.info)
 %.gz: %
 	rm -f $@ && gzip --stdout $^ > $@
 
+# Sequentialize the calls to sphinx-build to avoid races with
+# reading/writing state.
+
+sphinx-html: | $(DOCBUILDDIR)/.roff.stamp
+sphinx-texinfo: | sphinx-html
+sphinx-info: | sphinx-texinfo
+
 sphinx-html:
 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(DOCBUILDDIR)/html
-- 
2.20.1
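The traceback is what one would expect if one sphinx-build process read a pickled doctree while another had created the file but not yet written its contents. The sketch below reproduces that failure mode in isolation, assuming this is indeed the race; the file name and the `load_state` helper are made up for the example and nothing here is Sphinx code.

```python
import os
import pickle
import tempfile


def load_state(path):
    """Read a pickled object from path, as a reader process would."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doctree.pickle")  # hypothetical file name

    # Simulate a writer that has opened (and thereby truncated) the file
    # but has not written any data yet.
    open(path, "wb").close()

    try:
        load_state(path)
        raised = False
    except EOFError:
        # Same exception class as in the traceback above.
        raised = True

    # Once the writer has finished, the same read succeeds.
    with open(path, "wb") as f:
        pickle.dump({"title": "notmuch"}, f)
    state = load_state(path)
```

The order-only prerequisites in the patch avoid this by never letting two sphinx-build invocations run at the same time.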
Re: [PATCH] Fix notmuch-describe-key
Yang Sheng writes:

> Fix notmuch-describe-key crashing for the following two cases
> 1. format-kbd-macro cannot deal with keys like [(32 . 126)], switch to
> use key-description instead.
> 2. if a function in the current keymap is not bound, it will crash
> the whole process. We check if it is bound and silently skip it to
> avoid crashing.
> ---
> emacs/notmuch-lib.el | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
> index 8cf7261e..546ab6fd 100644
> --- a/emacs/notmuch-lib.el
> +++ b/emacs/notmuch-lib.el
> @@ -298,7 +298,7 @@ This is basically just `format-kbd-macro' but we also
> convert ESC to M-."
>    "Prepend cons cells describing prefix-arg ACTUAL-KEY and ACTUAL-KEY to TAIL
>
> It does not prepend if ACTUAL-KEY is already listed in TAIL."
> -  (let ((key-string (concat prefix (format-kbd-macro actual-key
> +  (let ((key-string (concat prefix (key-description actual-key
>      ;; We don't include documentation if the key-binding is
>      ;; over-ridden. Note, over-riding a binding automatically hides the
>      ;; prefixed version too.
> @@ -313,7 +313,7 @@ It does not prepend if ACTUAL-KEY is already listed in
> TAIL."
>        ;; Documentation for command
>        (push (cons key-string
>                    (or (and (symbolp binding) (get binding 'notmuch-doc))
> -                      (notmuch-documentation-first-line binding)))
> +                      (and (functionp binding)
> (notmuch-documentation-first-line binding
>              tail)))
>    tail)
>

Thanks! Some context: this fixes an issue in spacemacs where the help
key is broken: https://github.com/syl20bnr/spacemacs/issues/10123

--
https://jb55.com
Re: WIP2: index user headers
David Bremner writes:

> This obsoletes [1]
>
> This is getting closer to mergeable, but it still needs at least to
> sanity check the names of user defined prefixes (see point (a) below).
>
> The main differences from [1] are
>
> (a) xapian prefixes are no longer defined via upper casing, as this is
> locale dependent. They do rely on a ":" separator, hence the need
> for some sanitization.
>
> (b) The caching of user header/prefix information is now done via
> string maps, and used more effectively during indexing.

I had another thought about user prefixes. I wonder if they should all
be forcibly prefixed with something that prevents collisions, to prevent
later pain if we add an "official" prefix with the same name. A quick
test suggests it would work to use something like _

so

notmuch search --output=files _list:notmuch

works. It's a bit ugly, I'll have to play with other options; the main
question is whether we think prefixing is needed / worth-it.
Re: Reply to content of "List-Post" header?
* Gregor Zattler:

> my procmail scripts recognize mailing list headers and in the end
> there is a list of all mailing lists I ever got mails from [...]

Interesting method. I don't use procmail, but I wrote a small shell
script that scans my existing Maildir storage for List-Post headers and
extracts the addresses.

This works, in a fashion, but I wonder if/how it would be possible to
make Notmuch understand List-Post natively? Or is this something that
needs to be solved in Emacs' message composition?

-Ralph
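The shell script itself is not shown in the message. As an illustration of the approach it describes, here is a rough Python equivalent using only the standard library. The function names are hypothetical, and the sketch assumes messages live in `cur/` and `new/` directories somewhere below the Maildir root and that List-Post values use the `<mailto:...>` form.

```python
import re
from email.parser import BytesHeaderParser
from pathlib import Path

# Capture the address in "<mailto:address>" and stop before any
# "?subject=..." parameters.
MAILTO = re.compile(r"<mailto:([^>?\s]+)", re.IGNORECASE)


def list_post_addresses(header_value):
    """Extract posting addresses from a single List-Post header value."""
    return MAILTO.findall(header_value or "")


def scan_maildir(root):
    """Collect List-Post addresses from all messages below a Maildir root."""
    addresses = set()
    for path in Path(root).rglob("*"):
        # Only files inside cur/ or new/ are treated as messages.
        if not path.is_file() or path.parent.name not in ("cur", "new"):
            continue
        with open(path, "rb") as f:
            headers = BytesHeaderParser().parse(f)
        for value in headers.get_all("List-Post", []):
            addresses.update(list_post_addresses(str(value)))
    return sorted(addresses)
```

Calling `scan_maildir` on the top of the mail store would return the sorted set of list addresses seen so far. Lists that announce `List-Post: NO` yield nothing.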