Re: Proof of concept for counting messages in thread

2023-02-13 Thread David Bremner
Michael J Gruber  writes:

> That is really weird:
> ```
> xapian-delve -t G00021229 .
> Posting List for term 'G00021229' (termfreq 115, collfreq 0,
> wdf_max 0): 146259 ...
> ```
> with 115 record numbers, all different.
> Doing `xapian-delve -1r` for each of them and grepping for the G-lines
> gives 115 times that correct thread id.
> Grepping for the Q-lines and notmuch-searching for the message ids
> gives only 5 results (the expected ones). Apparently, there are bogus
> mail records which that thread points to.

1) Do those "bogus" records have a "Tghost" term? That would be for
messages that are known via references, but not actually in the local
database. This is a bug / feature of the current implementation: it
counts all known messages, whether or not local copies exist.

2) Do they have more than one G term? That suggests a bug somewhere. We
actually have a test in the test suite [1] for that, but of course that is
with a simple artificial database. 

[1]: in T670-duplicate-mid.sh:

db=$HOME/.local/share/notmuch/default/xapian
for doc in $(xapian-delve -1 -t '' "$db" | grep '^[1-9]'); do
xapian-delve -1 -r "$doc" "$db" | grep -c '^G'
done > OUTPUT.raw
sort -u < OUTPUT.raw > OUTPUT
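
A minimal sketch (not part of the original message) of how both questions above could be answered for a single record. The helper name `classify_record` is hypothetical; it only assumes that the term list arrives on stdin one term per line, as `xapian-delve -1 -r` prints it.

```shell
# classify_record: read one record's term list on stdin (one term per
# line) and report whether it carries the Tghost term and how many
# thread-id (G) terms it has. The function name is illustrative only.
classify_record() {
    awk '
        /^Tghost$/ { ghost = 1 }     # exact match, so the header line is ignored
        /^G/       { gterms++ }      # every thread-id term starts with G
        END {
            printf "ghost=%s gterms=%d\n", (ghost ? "yes" : "no"), gterms
        }'
}

# Intended use, with $db and the record number adjusted as needed:
#   xapian-delve -1 -r 146259 "$db" | classify_record
```

A record reporting `ghost=yes` would fall under case 1), and one reporting more than one G term under case 2).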
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


Re: Proof of concept for counting messages in thread

2023-02-13 Thread Michael J Gruber
Am Mo., 13. Feb. 2023 um 21:23 Uhr schrieb David Bremner :
>
> Michael J Gruber  writes:
> >
> > It has 5, as confirmed by the search output and that of `notmuch
> > count`. But it is matched by `count 115`.
> > `xapian-check` is happy. (There used to be some issue with additional
> > thread entries at some point.)
> >
> > Michael
>
> A simple test to try is
>
> % xapian-delve -t G00021229 \
>   ~/.local/share/notmuch/default/xapian
>
> adjusting your database path as needed.
>
> If that says "termfreq 115", then something is broken (or at least
> confusing) about your database (possibly related to the previous issues
> with threading). In that case I'm curious if there are 115 distinct
> record numbers.  You can find all of the thread-ids attached to a given
> message with
>
> % xapian-delve -1r 267585 ~/.local/share/notmuch/default/xapian | grep ^G
>
> where 267585 is an example record number in my database.

That is really weird:
```
xapian-delve -t G00021229 .
Posting List for term 'G00021229' (termfreq 115, collfreq 0,
wdf_max 0): 146259 ...
```
with 115 record numbers, all different.
Doing `xapian-delve -1r` for each of them and grepping for the G-lines
gives 115 times that correct thread id.
Grepping for the Q-lines and notmuch-searching for the message ids
gives only 5 results (the expected ones). Apparently, there are bogus
mail records which that thread points to.
I guess I should recreate the db, if I only knew how lieer deals with
a reindexed mail store ... (The thread and the 5 messages sit in an
mbsynced folder, but lieer syncs other folders with that same db).

Michael


Re: Proof of concept for counting messages in thread

2023-02-13 Thread David Bremner
Michael J Gruber  writes:
>
> It has 5, as confirmed by the search output and that of `notmuch
> count`. But it is matched by `count 115`.
> `xapian-check` is happy. (There used to be some issue with additional
> thread entries at some point.)
>
> Michael

A simple test to try is

% xapian-delve -t G00021229 \
  ~/.local/share/notmuch/default/xapian

adjusting your database path as needed.

If that says "termfreq 115", then something is broken (or at least
confusing) about your database (possibly related to the previous issues
with threading). In that case I'm curious if there are 115 distinct
record numbers.  You can find all of the thread-ids attached to a given
message with

% xapian-delve -1r 267585 ~/.local/share/notmuch/default/xapian | grep ^G

where 267585 is an example record number in my database.
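
To repeat the second command for every record in the posting list, the record numbers first have to be pulled out of the first command's output. The following is a rough sketch, not from the original message: the helper name `postlist_ids` is made up, and it assumes the output has the shape quoted elsewhere in this thread, i.e. a header ending in `):` followed by whitespace-separated record numbers.

```shell
# postlist_ids: read `xapian-delve -t TERM DB` output on stdin and print
# one record number per line. Hypothetical helper; it assumes the header
# ends in "):" and is followed only by whitespace-separated numbers.
postlist_ids() {
    tr '\n' ' ' |             # undo any line wrapping
        sed 's/^.*): *//' |   # drop the "Posting List ... ):" header
        tr -s ' ' '\n' |      # one token per line
        grep -E '^[0-9]+$'    # keep only record numbers
}

# Intended use, with $db adjusted as needed:
#   for doc in $(xapian-delve -t G00021229 "$db" | postlist_ids); do
#       xapian-delve -1 -r "$doc" "$db" | grep '^G'
#   done | sort | uniq -c
```

Piping the loop through `sort | uniq -c` should show at a glance whether every record carries the same single thread id.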


Re: Proof of concept for counting messages in thread

2023-02-13 Thread Michael J Gruber
Am Mo., 13. Feb. 2023 um 17:32 Uhr schrieb David Bremner :
>
> Michael J Gruber  writes:
>
> > I am getting a few surprising matches, e.g.
> > ```
> > notmuch search  --query=sexp '(thread (count 115))'
> > thread:00021229   2021-05-17 [5/5] Michael J Gruber ... redacted
> > notmuch count --exclude=false thread:00021229
> > 5
> > ```
> > It could be some database issues, of course. Or me misunderstanding 
> > something :)
>
> Hmm. I don't see any strange matches for that particular query, just a
> thread that actually has 115 messages. But there could also be bugs of
> course.  Does xapian-check complain about your database?

It has 5, as confirmed by the search output and that of `notmuch
count`. But it is matched by `count 115`.
`xapian-check` is happy. (There used to be some issue with additional
thread entries at some point.)

Michael


Re: Proof of concept for counting messages in thread

2023-02-13 Thread David Bremner
Michael J Gruber  writes:

> I am getting a few surprising matches, e.g.
> ```
> notmuch search  --query=sexp '(thread (count 115))'
> thread:00021229   2021-05-17 [5/5] Michael J Gruber ... redacted
> notmuch count --exclude=false thread:00021229
> 5
> ```
> It could be some database issues, of course. Or me misunderstanding something 
> :)

Hmm. I don't see any strange matches for that particular query, just a
thread that actually has 115 messages. But there could also be bugs of
course.  Does xapian-check complain about your database?

>
> Patch 1/2 is crlf garbled, by the way. Applies cleanly after removing
> the extra ^Ms.

Hmm. Probably because of Content-Transfer-Encoding: 8bit

I have a direct mailed copy that didn't go through mailman, and that
looks OK. 

>
> Michael


Re: Proof of concept for counting messages in thread

2023-02-13 Thread Michael J Gruber
Am Mo., 13. Feb. 2023 um 13:26 Uhr schrieb David Bremner :
>
> So far this only supports counting messages in threads, and the sexp
> based query parser. It seems useful to expand it to other fields
> (from, e.g.). I'm not sure how motivated I am to shim this into the
> infix query parser, but we will see how it goes.

This certainly looks interesting, and not easy to get by scripting
around the existing commands. It is kinda special, so having it in
sexp only seems okay.

I am getting a few surprising matches, e.g.
```
notmuch search  --query=sexp '(thread (count 115))'
thread:00021229   2021-05-17 [5/5] Michael J Gruber ... redacted
notmuch count --exclude=false thread:00021229
5
```
It could be some database issues, of course. Or me misunderstanding something :)

Patch 1/2 is crlf garbled, by the way. Applies cleanly after removing
the extra ^Ms.

Michael


[PATCH 1/2] WIP/lib: add count query backend

2023-02-13 Thread David Bremner
---
 lib/Makefile.local |  3 +-
 lib/count-query.cc | 62 ++
 lib/database-private.h |  6 
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 lib/count-query.cc

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 4e766305..cc646946 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -66,7 +66,8 @@ libnotmuch_cxx_srcs = \
$(dir)/init.cc  \
$(dir)/parse-sexp.cc\
$(dir)/sexp-fp.cc   \
-   $(dir)/lastmod-fp.cc
+   $(dir)/lastmod-fp.cc\
+   $(dir)/count-query.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
 
diff --git a/lib/count-query.cc b/lib/count-query.cc
new file mode 100644
index 00000000..5d258880
--- /dev/null
+++ b/lib/count-query.cc
@@ -0,0 +1,62 @@
+/* count-query.cc - generate queries for terms on few / many messages.
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2023 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner 
+ */
+
+#include "database-private.h"
+
+notmuch_status_t
+_notmuch_count_strings_to_query (notmuch_database_t *notmuch, std::string field,
+                                 const std::string &from, const std::string &to,
+                                 Xapian::Query &output, std::string &msg)
+{
+
+long from_idx = 0, to_idx = LONG_MAX;
+std::string term_prefix = _find_prefix (field.c_str ());
+std::vector<std::string> terms;
+
+if (! from.empty ()) {
+   try {
+   from_idx = std::stol(from);
+   } catch (std::logic_error &e) {
+   msg = "bad 'from' count: '" + from + "'";
+   return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+   }
+}
+
+if (! to.empty ()) {
+   try {
+   to_idx = std::stol(to);
+   } catch (std::logic_error &e) {
+   msg = "bad 'to' count: '" + to + "'";
+   return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+   }
+}
+
+for (Xapian::TermIterator it = notmuch->xapian_db->allterms_begin (term_prefix);
+it != notmuch->xapian_db->allterms_end (); ++it) {
+   Xapian::doccount freq = it.get_termfreq();
+   if (from_idx <= freq && freq <= to_idx)
+   terms.push_back (*it);
+}
+
+output = Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
+return NOTMUCH_STATUS_SUCCESS;
+}
diff --git a/lib/database-private.h b/lib/database-private.h
index b9be4e22..ba96a93c 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -387,5 +387,11 @@ notmuch_status_t
 _notmuch_lastmod_strings_to_query (notmuch_database_t *notmuch,
   const std::string &from, const std::string &to,
   Xapian::Query &output, std::string &msg);
+
+/* count-query.cc */
+notmuch_status_t
+_notmuch_count_strings_to_query (notmuch_database_t *notmuch, std::string field,
+                                 const std::string &from, const std::string &to,
+                                 Xapian::Query &output, std::string &msg);
 #endif
 #endif
-- 
2.39.1
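
For readers following the patch, the selection rule in count-query.cc can be mimicked outside the library. This is only an illustrative sketch: the function `select_by_freq` and its "TERM FREQ" input format are assumptions made for the example, not anything notmuch or Xapian provides. It keeps every term whose frequency lies between the two bounds, with an empty lower bound treated as 0 and an empty upper bound as unlimited, as in the patch.

```shell
# select_by_freq FROM TO: read "TERM FREQ" pairs on stdin (an assumed,
# illustrative format) and print the terms whose frequency lies in
# [FROM, TO]. Empty FROM means 0; empty TO means no upper bound.
select_by_freq() {
    awk -v from="${1:-0}" -v to="${2:-}" '
        ($2 + 0 >= from + 0) && (to == "" || $2 + 0 <= to + 0) { print $1 }'
}

# Example with made-up frequencies: only the first term is selected.
#   printf 'G00021229 115\nG00000001 5\n' | select_by_freq 115 115
```

Because the frequency of a G term is the number of records carrying it, any record with that thread id contributes to the count, whether or not a local copy of the message exists.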



Proof of concept for counting messages in thread

2023-02-13 Thread David Bremner
So far this only supports counting messages in threads, and the sexp
based query parser. It seems useful to expand it to other fields
(from, e.g.). I'm not sure how motivated I am to shim this into the
infix query parser, but we will see how it goes.




[PATCH 2/2] WIP: support thread count queries

2023-02-13 Thread David Bremner
---
 lib/parse-sexp.cc | 35 ---
 test/T081-sexpr-search.sh |  6 ++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 9cadbc13..1faa9023 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -34,6 +34,8 @@ typedef enum {
 SEXP_FLAG_ORPHAN   = 1 << 8,
 SEXP_FLAG_RANGE= 1 << 9,
 SEXP_FLAG_PATHNAME = 1 << 10,
+SEXP_FLAG_COUNT= 1 << 11,
+SEXP_FLAG_MODIFIER = 1 << 12,
 } _sexp_flag_t;
 
 /*
@@ -70,6 +72,8 @@ static _sexp_prefix_t prefixes[] =
   SEXP_FLAG_FIELD },
 { "date",   Xapian::Query::OP_INVALID,  Xapian::Query::MatchAll,
   SEXP_FLAG_RANGE },
+{ "count",  Xapian::Query::OP_INVALID,  Xapian::Query::MatchAll,
+  SEXP_FLAG_RANGE | SEXP_FLAG_MODIFIER },
 { "from",   Xapian::Query::OP_AND,  Xapian::Query::MatchAll,
   SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND },
 { "folder", Xapian::Query::OP_OR,   Xapian::Query::MatchNothing,
@@ -113,7 +117,8 @@ static _sexp_prefix_t prefixes[] =
 { "tag",    Xapian::Query::OP_AND,  Xapian::Query::MatchAll,
   SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND },
 { "thread", Xapian::Query::OP_OR,   Xapian::Query::MatchNothing,
-  SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND },
+  SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX |
+  SEXP_FLAG_EXPAND | SEXP_FLAG_COUNT },
 { "to",     Xapian::Query::OP_AND,  Xapian::Query::MatchAll,
   SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_EXPAND },
 { }
@@ -513,6 +518,7 @@ _sexp_expand_param (notmuch_database_t *notmuch, const _sexp_prefix_t *parent,
 
 static notmuch_status_t
 _sexp_parse_range (notmuch_database_t *notmuch,  const _sexp_prefix_t *prefix,
+  const _sexp_prefix_t *parent,
   const sexp_t *sx, Xapian::Query &output)
 {
 const char *from, *to;
@@ -552,6 +558,27 @@ _sexp_parse_range (notmuch_database_t *notmuch,  const _sexp_prefix_t *prefix,
to = "";
 }
 
+if (strcmp (prefix->name, "count") == 0) {
+   notmuch_status_t status;
+   if (! parent) {
+   _notmuch_database_log (notmuch, "illegal '%s' outside field\n",
+  prefix->name);
+   return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+   }
+   if (! (parent->flags & SEXP_FLAG_COUNT)) {
+   _notmuch_database_log (notmuch, "'%s' not supported in field '%s'\n",
+  prefix->name, parent->name);
+   return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+   }
+
+   status = _notmuch_count_strings_to_query (notmuch, parent->name, from, to, output, msg);
+   if (status) {
+   if (! msg.empty ())
+   _notmuch_database_log (notmuch, "%s\n", msg.c_str ());
+   }
+   return status;
+}
+
 if (strcmp (prefix->name, "date") == 0) {
notmuch_status_t status;
status = _notmuch_date_strings_to_query (NOTMUCH_VALUE_TIMESTAMP, from, to, output, msg);
@@ -654,7 +681,9 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent
 
 for (_sexp_prefix_t *prefix = prefixes; prefix && prefix->name; prefix++) {
if (strcmp (prefix->name, sx->list->val) == 0) {
-   if (prefix->flags & (SEXP_FLAG_FIELD | SEXP_FLAG_RANGE)) {
+   if ((prefix->flags & (SEXP_FLAG_FIELD)) ||
+   ((prefix->flags & SEXP_FLAG_RANGE) &&
+! (prefix->flags & SEXP_FLAG_MODIFIER))) {
if (parent) {
_notmuch_database_log (notmuch, "nested field: '%s' inside '%s'\n",
   prefix->name, parent->name);
@@ -677,7 +706,7 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent
}
 
if (prefix->flags & SEXP_FLAG_RANGE)
-   return _sexp_parse_range (notmuch, prefix, sx->list->next, output);
+   return _sexp_parse_range (notmuch, prefix, parent, sx->list->next, output);
 
if (strcmp (prefix->name, "infix") == 0) {
return _sexp_parse_infix (notmuch, sx->list->next, output);
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
index 0c7db9c2..2013fa5c 100755
--- a/test/T081-sexpr-search.sh
+++ b/test/T081-sexpr-search.sh
@@ -1318,5 +1318,11 @@ notmuch search subject:notmuch or List:notmuch | notmuch_search_sanitize > EXPEC
 notmuch search --query=sexp '(About notmuch)' | notmuch_search_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest "threads with one message"
+notmuch search --query=sexp '(and (from gusarov) (thread (count 1)))' | notmuch_search_sanitize > OUTPUT
+cat