Re: [PATCH 1/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative

2023-03-31 Thread David Bremner
Kevin Boulain  writes:

> This replaces two instances of Xapian::Query::MatchAll with the
> equivalent but thread-safe alternative Xapian::Query(std::string()).
> Xapian::Query::MatchAll maintains an internal pointer to a refcounted
> Xapian::Internal::QueryTerm.
>

Applied to master, thanks, and sorry for the delay.

By the way I see sfsexp has merged your related PR last week.

d
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


[PATCH 1/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative

2023-03-02 Thread Kevin Boulain
This replaces two instances of Xapian::Query::MatchAll with the
equivalent but thread-safe alternative Xapian::Query(std::string()).
Xapian::Query::MatchAll maintains an internal pointer to a refcounted
Xapian::Internal::QueryTerm.

None of this is thread-safe but that wouldn't be an issue if
Xapian::Query::MatchAll wasn't static. Because it's static, the
refcounting goes awry when Notmuch is called from multiple threads.
This is actually documented by Xapian:
https://github.com/xapian/xapian/blob/4715de3a9fcee741587439dc3cc1d2ff01ffeaf2/xapian-core/include/xapian/query.h#L65

While static, Xapian::Query::MatchNothing is safe because it doesn't
maintain an internal object and as such, doesn't use references.

Two best-effort tests making use of TSan were added to showcase the
issue (I couldn't figure out a way to deterministically reproduce it
without making an unmaintainable mess).

First, when two databases are created in parallel, a query that uses
Xapian::Query::MatchAll is made (lib/query.cc), resulting in the
following backtrace on a segfault:
  #0  0x776822af in Xapian::Query::get_terms_begin 
(this=0x7fffe80137f0) at api/query.cc:141
  #1  0x77f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) 
at lib/query.cc:176
  #2  0x77f93784 in _notmuch_query_ensure_parsed_xapian 
(query=0x7fffe80137c0) at lib/query.cc:225
  #3  0x77f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) 
at lib/query.cc:260
  #4  0x77f93bfe in _notmuch_query_search_documents 
(query=0x7fffe80137c0, type=0x77fa9b1e "mail", out=0x7666da18) at 
lib/query.cc:361
  #5  0x77f93ba4 in notmuch_query_search_messages 
(query=0x7fffe80137c0, out=0x7666da18) at lib/query.cc:349
  #6  0x77f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, 
progress_notify=0x0, closure=0x0) at lib/database.cc:934
  #7  0x77fa110f in notmuch_database_create_with_config 
(database_path=0x7666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x77faab3c 
"", profile=0x0, database=0x0, status_string=0x7666dc90) at lib/open.cc:754
  #8  0x77fa0d6f in notmuch_database_create_verbose 
(path=0x7666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, 
status_string=0x7666dc90) at lib/open.cc:653
  #9  0x77fa0ceb in notmuch_database_create (path=0x7666dcb0 
"/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637
  ...

Second, some queries would make use of Xapian::Query::MatchAll
(lib/regexp-fields.cc), resulting in the following backtrace on a
segfault:
  #0  0x7f629828b690 in Xapian::Internal::QueryBranch::gather_terms 
(this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #1  0x7f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms 
(this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434
  #2  0x7f629828b69f in Xapian::Internal::QueryBranch::gather_terms 
(this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #3  0x7f6298282571 in Xapian::Query::get_unique_terms_begin 
(this=0x7f628800dcd8) at api/query.cc:166
  #4  0x7f629841a59b in Xapian::Weight::Internal::accumulate_stats 
(this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86
  #5  0x7f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, 
nowait=true, total_stats=...) at matcher/localsubmatch.cc:172
  #6  0x7f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 
1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237
  #7  0x7f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, 
db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, 
collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, 
order_=Xapian::Enquire::ASCENDING, sort_key_=0, 
sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, 
time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of 
length 0, capacity 0, have_sorter=false, have_mdecider=false) at 
matcher/multimatch.cc:353
  #8  0x7f629826fcba in Xapian::Enquire::Internal::get_mset 
(this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, 
mdecider=0x0) at api/omenquire.cc:569
  #9  0x7f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, 
first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at 
api/omenquire.cc:937
  #10 0x7f6298be529a in _notmuch_query_search_documents 
(query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at 
lib/query.cc:447
  #11 0x7f6298be4ae8 in notmuch_query_search_messages 
(query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349
  ...

Printing Xapian::Query::MatchAll->internal.px->_refs in these
circumstances can help quickly identifying this scenario.

This is motivated by some test frameworks (like Rust's Cargo) that
runs unit tests in parallel and would easily encounter this issue,
unless client code gates every call to Notmuch behind a lock.

This is what can be