Re: Proof of concept for counting messages in thread
Michael J Gruber writes:

> That is really weird:
> ```
> xapian-delve -t G00021229 .
> Posting List for term 'G00021229' (termfreq 115, collfreq 0,
> wdf_max 0): 146259 ...
> ```
> with 115 record numbers, all different.
> Doing `xapian-delve -1r` for each of them and grepping for the G-lines
> gives 115 times that correct thread id.
> Grepping for the Q-lines and notmuch-searching for the message ids
> gives only 5 results (the expected ones). Apparently, there are bogus
> mail records which that thread points to.

1) Do those "bogus" records have a "Tghost" term? That would be for
messages that are known via references, but not actually in the local
database. This is a bug / feature of the current implementation: it
counts all messages known, whether or not local copies exist.

2) Do they have more than one G term? That suggests a bug somewhere. We
actually have a test in the test suite [1] for that, but of course that
is with a simple artificial database.

[1]: in T670-duplicate-mid.sh:

    db=$HOME/.local/share/notmuch/default/xapian
    for doc in $(xapian-delve -1 -t '' "$db" | grep '^[1-9]'); do
        xapian-delve -1 -r "$doc" "$db" | grep -c '^G'
    done > OUTPUT.raw
    sort -u < OUTPUT.raw > OUTPUT

___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org
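To make question 1 concrete, here is a small self-contained sketch of why a term frequency of 115 and a `notmuch count` of 5 can both be right for the same thread. It does not use Xapian at all: `ToyDb`, `term_freq` and `non_ghost_freq` are hypothetical names invented for this illustration, and the toy database is just a map from record number to the terms on that record. The only assumption taken from the message above is that ghost records carry a "Tghost" term in addition to their G (thread) term.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a Xapian database: record number -> terms.
using ToyDb = std::map<unsigned, std::set<std::string>>;

// Number of records carrying `term`. This mirrors what a term frequency
// reports: every record with the term counts, ghost or not.
std::size_t term_freq (const ToyDb &db, const std::string &term)
{
    std::size_t n = 0;
    for (const auto &rec : db)
        n += rec.second.count (term);
    return n;
}

// Number of records carrying `term` that are not marked as ghosts,
// i.e. messages for which a local copy actually exists.
std::size_t non_ghost_freq (const ToyDb &db, const std::string &term)
{
    std::size_t n = 0;
    for (const auto &rec : db)
        if (rec.second.count (term) && ! rec.second.count ("Tghost"))
            n++;
    return n;
}
```

If the two functions disagree for a thread's G term, the difference is the number of ghost records in that thread, which is the situation question 1 asks about.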
Re: Proof of concept for counting messages in thread
On Mon, 13 Feb 2023 at 21:23, David Bremner wrote:
>
> Michael J Gruber writes:
>
> > It has 5, as confirmed by the search output and that of `notmuch
> > count`. But it is matched by `count 115`.
> > `xapian-check` is happy. (There used to be some issue with additional
> > thread entries at some point.)
> >
> > Michael
>
> A simple test to try is
>
> % xapian-delve -t G00021229 \
>     ~/.local/share/notmuch/default/xapian
>
> adjusting your database path as needed.
>
> If that says "termfreq 115", then something is broken (or at least
> confusing) about your database (possibly related to the previous issues
> with threading). In that case I'm curious if there are 115 distinct
> record numbers. You can find all of the thread-ids attached to a given
> message with
>
> % xapian-delve -1r 267585 ~/.local/share/notmuch/default/xapian | grep ^G
>
> where 267585 is an example record number in my database.

That is really weird:
```
xapian-delve -t G00021229 .
Posting List for term 'G00021229' (termfreq 115, collfreq 0,
wdf_max 0): 146259 ...
```
with 115 record numbers, all different.
Doing `xapian-delve -1r` for each of them and grepping for the G-lines
gives 115 times that correct thread id.
Grepping for the Q-lines and notmuch-searching for the message ids
gives only 5 results (the expected ones). Apparently, there are bogus
mail records which that thread points to.

I guess I should recreate the db, if I only knew how lieer deals with
a reindexed mail store ...
(The thread and the 5 messages sit in an mbsynced folder, but lieer
syncs other folders with that same db.)

Michael
Re: Proof of concept for counting messages in thread
Michael J Gruber writes:

>
> It has 5, as confirmed by the search output and that of `notmuch
> count`. But it is matched by `count 115`.
> `xapian-check` is happy. (There used to be some issue with additional
> thread entries at some point.)
>
> Michael

A simple test to try is

% xapian-delve -t G00021229 \
    ~/.local/share/notmuch/default/xapian

adjusting your database path as needed.

If that says "termfreq 115", then something is broken (or at least
confusing) about your database (possibly related to the previous issues
with threading). In that case I'm curious if there are 115 distinct
record numbers. You can find all of the thread-ids attached to a given
message with

% xapian-delve -1r 267585 ~/.local/share/notmuch/default/xapian | grep ^G

where 267585 is an example record number in my database.
Re: Proof of concept for counting messages in thread
On Mon, 13 Feb 2023 at 17:32, David Bremner wrote:
>
> Michael J Gruber writes:
>
> > I am getting a few surprising matches, e.g.
> > ```
> > notmuch search --query=sexp '(thread (count 115))'
> > thread:00021229 2021-05-17 [5/5] Michael J Gruber ... redacted
> > notmuch count --exclude=false thread:00021229
> > 5
> > ```
> > It could be some database issues, of course. Or me misunderstanding
> > something :)
>
> Hmm. I don't see any strange matches for that particular query, just a
> thread that actually has 115 messages. But there could also be bugs of
> course. Does xapian-check complain about your database?

It has 5, as confirmed by the search output and that of `notmuch
count`. But it is matched by `count 115`.
`xapian-check` is happy. (There used to be some issue with additional
thread entries at some point.)

Michael
Re: Proof of concept for counting messages in thread
Michael J Gruber writes:

> I am getting a few surprising matches, e.g.
> ```
> notmuch search --query=sexp '(thread (count 115))'
> thread:00021229 2021-05-17 [5/5] Michael J Gruber ... redacted
> notmuch count --exclude=false thread:00021229
> 5
> ```
> It could be some database issues, of course. Or me misunderstanding something
> :)

Hmm. I don't see any strange matches for that particular query, just a
thread that actually has 115 messages. But there could also be bugs of
course. Does xapian-check complain about your database?

>
> Patch 1/2 is crlf garbled, by the way. Applies cleanly after removing
> the extra ^Ms.

Hmm. Probably because of

Content-Transfer-Encoding: 8bit

I have a direct mailed copy that didn't go through mailman, and that
looks OK.

>
> Michael
Re: Proof of concept for counting messages in thread
On Mon, 13 Feb 2023 at 13:26, David Bremner wrote:
>
> So far this only supports counting messages in threads, and the sexp
> based query parser. It seems useful to expand it to other fields
> (from, e.g.). I'm not sure how motivated I am to shim this into the
> infix query parser, but we will see how it goes.

This certainly looks interesting, and not easy to get by scripting
around the existing commands. It is kinda special, so having it in
sexp only seems okay.

I am getting a few surprising matches, e.g.
```
notmuch search --query=sexp '(thread (count 115))'
thread:00021229 2021-05-17 [5/5] Michael J Gruber ... redacted
notmuch count --exclude=false thread:00021229
5
```
It could be some database issues, of course. Or me misunderstanding
something :)

Patch 1/2 is crlf garbled, by the way. Applies cleanly after removing
the extra ^Ms.

Michael
[PATCH 1/2] WIP/lib: add count query backend
---
 lib/Makefile.local     |  3 +-
 lib/count-query.cc     | 62 ++
 lib/database-private.h |  6
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 lib/count-query.cc

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 4e766305..cc646946 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -66,7 +66,8 @@ libnotmuch_cxx_srcs = \
 	$(dir)/init.cc \
 	$(dir)/parse-sexp.cc \
 	$(dir)/sexp-fp.cc \
-	$(dir)/lastmod-fp.cc
+	$(dir)/lastmod-fp.cc \
+	$(dir)/count-query.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
 
diff --git a/lib/count-query.cc b/lib/count-query.cc
new file mode 100644
index ..5d258880
--- /dev/null
+++ b/lib/count-query.cc
@@ -0,0 +1,62 @@
+/* count-query.cc - generate queries for terms on few / many messages.
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2023 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner
+ */
+
+#include "database-private.h"
+
+notmuch_status_t
+_notmuch_count_strings_to_query (notmuch_database_t *notmuch, std::string field,
+				 const std::string &from, const std::string &to,
+				 Xapian::Query &output, std::string &msg)
+{
+
+    long from_idx = 0, to_idx = LONG_MAX;
+    std::string term_prefix = _find_prefix (field.c_str ());
+    std::vector<std::string> terms;
+
+    if (! from.empty ()) {
+	try {
+	    from_idx = std::stol (from);
+	} catch (std::logic_error &e) {
+	    msg = "bad 'from' count: '" + from + "'";
+	    return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+	}
+    }
+
+    if (! to.empty ()) {
+	try {
+	    to_idx = std::stol (to);
+	} catch (std::logic_error &e) {
+	    msg = "bad 'to' count: '" + to + "'";
+	    return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+	}
+    }
+
+    for (Xapian::TermIterator it = notmuch->xapian_db->allterms_begin (term_prefix);
+	 it != notmuch->xapian_db->allterms_end (); ++it) {
+	Xapian::doccount freq = it.get_termfreq ();
+	if (from_idx <= freq && freq <= to_idx)
+	    terms.push_back (*it);
+    }
+
+    output = Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
+    return NOTMUCH_STATUS_SUCCESS;
+}
diff --git a/lib/database-private.h b/lib/database-private.h
index b9be4e22..ba96a93c 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -387,5 +387,11 @@ notmuch_status_t
 _notmuch_lastmod_strings_to_query (notmuch_database_t *notmuch,
 				   const std::string &from, const std::string &to,
 				   Xapian::Query &output, std::string &msg);
+
+/* count-query.cc */
+notmuch_status_t
+_notmuch_count_strings_to_query (notmuch_database_t *notmuch, std::string field,
+				 const std::string &from, const std::string &to,
+				 Xapian::Query &output, std::string &msg);
 #endif
 #endif
-- 
2.39.1
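The core of the patch above is "parse two optional bounds, then keep every term whose frequency falls between them". A minimal sketch of that logic, runnable without Xapian or notmuch, might look like the following. The names `parse_bound` and `terms_in_range` are hypothetical, and the `std::map` of frequencies is only a stand-in for walking the database's term list. Unlike the patch, this sketch also rejects trailing garbage such as "5x"; that is an extra check added here, not something the patch does.

```cpp
#include <climits>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Parse an optional bound. An empty string keeps `fallback`.
// Returns false if `text` is not entirely a valid integer.
bool parse_bound (const std::string &text, long fallback, long &out)
{
    out = fallback;
    if (text.empty ())
        return true;
    try {
        std::size_t used = 0;
        out = std::stol (text, &used);
        return used == text.size ();
    } catch (const std::logic_error &) {
        // std::invalid_argument and std::out_of_range both derive
        // from std::logic_error.
        return false;
    }
}

// Append to `terms` every key of `freqs` whose frequency lies in
// [from, to]. `freqs` is a hypothetical stand-in for the term list
// of a real database. Returns false on a malformed bound.
bool terms_in_range (const std::map<std::string, long> &freqs,
                     const std::string &from, const std::string &to,
                     std::vector<std::string> &terms)
{
    long lo, hi;
    if (! parse_bound (from, 0, lo) || ! parse_bound (to, LONG_MAX, hi))
        return false;
    for (const auto &kv : freqs)
        if (lo <= kv.second && kv.second <= hi)
            terms.push_back (kv.first);
    return true;
}
```

In the patch the collected terms are then OR-ed together into a single query, so a thread matches when its G term is one of the terms whose frequency is in range.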
Proof of concept for counting messages in thread
So far this only supports counting messages in threads, and the sexp
based query parser. It seems useful to expand it to other fields
(from, e.g.). I'm not sure how motivated I am to shim this into the
infix query parser, but we will see how it goes.
[PATCH 2/2] WIP: support thread count queries
---
 lib/parse-sexp.cc         | 35 ---
 test/T081-sexpr-search.sh |  6 ++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 9cadbc13..1faa9023 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -34,6 +34,8 @@ typedef enum {
     SEXP_FLAG_ORPHAN = 1 << 8,
     SEXP_FLAG_RANGE = 1 << 9,
     SEXP_FLAG_PATHNAME = 1 << 10,
+    SEXP_FLAG_COUNT = 1 << 11,
+    SEXP_FLAG_MODIFIER = 1 << 12,
 } _sexp_flag_t;
 
 /*
@@ -70,6 +72,8 @@ static _sexp_prefix_t prefixes[] =
       SEXP_FLAG_FIELD },
     { "date", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll,
       SEXP_FLAG_RANGE },
+    { "count", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll,
+      SEXP_FLAG_RANGE | SEXP_FLAG_MODIFIER },
     { "from", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
       SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND },
     { "folder", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
@@ -113,7 +117,8 @@ static _sexp_prefix_t prefixes[] =
     { "tag", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
       SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND },
     { "thread", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX |
+      SEXP_FLAG_EXPAND | SEXP_FLAG_COUNT },
     { "to", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
       SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_EXPAND },
     { }
@@ -513,6 +518,7 @@ _sexp_expand_param (notmuch_database_t *notmuch, const _sexp_prefix_t *parent,
 
 static notmuch_status_t
 _sexp_parse_range (notmuch_database_t *notmuch, const _sexp_prefix_t *prefix,
+		   const _sexp_prefix_t *parent,
 		   const sexp_t *sx, Xapian::Query &output)
 {
     const char *from, *to;
@@ -552,6 +558,27 @@ _sexp_parse_range (notmuch_database_t *notmuch, const _sexp_prefix_t *prefix,
 	to = "";
     }
 
+    if (strcmp (prefix->name, "count") == 0) {
+	notmuch_status_t status;
+	if (! parent) {
+	    _notmuch_database_log (notmuch, "illegal '%s' outside field\n",
+				   prefix->name);
+	    return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+	}
+	if (! (parent->flags & SEXP_FLAG_COUNT)) {
+	    _notmuch_database_log (notmuch, "'%s' not supported in field '%s'\n",
+				   prefix->name, parent->name);
+	    return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+	}
+
+	status = _notmuch_count_strings_to_query (notmuch, parent->name, from, to, output, msg);
+	if (status) {
+	    if (! msg.empty ())
+		_notmuch_database_log (notmuch, "%s\n", msg.c_str ());
+	}
+	return status;
+    }
+
     if (strcmp (prefix->name, "date") == 0) {
 	notmuch_status_t status;
 	status = _notmuch_date_strings_to_query (NOTMUCH_VALUE_TIMESTAMP, from, to, output, msg);
@@ -654,7 +681,9 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent
 
     for (_sexp_prefix_t *prefix = prefixes; prefix && prefix->name; prefix++) {
 	if (strcmp (prefix->name, sx->list->val) == 0) {
-	    if (prefix->flags & (SEXP_FLAG_FIELD | SEXP_FLAG_RANGE)) {
+	    if ((prefix->flags & (SEXP_FLAG_FIELD)) ||
+		((prefix->flags & SEXP_FLAG_RANGE) &&
+		 ! (prefix->flags & SEXP_FLAG_MODIFIER))) {
 		if (parent) {
 		    _notmuch_database_log (notmuch, "nested field: '%s' inside '%s'\n",
 					   prefix->name, parent->name);
@@ -677,7 +706,7 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent
 	    }
 
 	    if (prefix->flags & SEXP_FLAG_RANGE)
-		return _sexp_parse_range (notmuch, prefix, sx->list->next, output);
+		return _sexp_parse_range (notmuch, prefix, parent, sx->list->next, output);
 
 	    if (strcmp (prefix->name, "infix") == 0) {
 		return _sexp_parse_infix (notmuch, sx->list->next, output);
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
index 0c7db9c2..2013fa5c 100755
--- a/test/T081-sexpr-search.sh
+++ b/test/T081-sexpr-search.sh
@@ -1318,5 +1318,11 @@ notmuch search subject:notmuch or List:notmuch | notmuch_search_sanitize > EXPEC
 notmuch search --query=sexp '(About notmuch)' | notmuch_search_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest "threads with one message"
+notmuch search --query=sexp '(and (from gusarov) (thread (count 1)))' | notmuch_search_sanitize > OUTPUT
+cat