Re: [PATCH 1/2] cli: looking for config file in $XDG_CONFIG_HOME
Daniel Kahn Gillmor writes:

> * when we observe a config file, we could walk each option present in
>   it. For each option:
>
>   a) if that option is not present in the database, copy it into the
>      database.
>
>   b) if that option is present in the database, and it matches the
>      option in the config file, ignore
>
>   c) if that option is present in the database but does not match the
>      config file, use the version in the config file but warn the user
>      that they have a conflict they should probably resolve soon.

I guess this whole discussion is about the CLI, so in principle that's OK, but for existing library code there's no real way to use the config file value for config values that are stored in the database: we just don't pass that information in to the API.

What's (relatively) easy is to have the config file reader used by the CLI check for conflicts with the database config and report those. We could also give people a flag (or something) so that the database version of the config is overwritten.

Really, writing the database config from the text config file seems relatively sane to me. It's editing that file via computer that gives me the heebie-jeebies.

> So i think it would be a shame to have an additional layer of confusion
> added by having two different deprecated configuration files. So i lean
> against adopting this change. I'd much rather see work on phasing out
> the configfile.

OK, noted. I guess I would imagine any conflict resolution to be against the in-memory struct read from disk, so somewhat orthogonal to where it is read from.

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
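For readers following the thread, the three-way rule (a)/(b)/(c) quoted in the message above can be sketched roughly as follows. This is only an illustration of the proposed logic operating on two plain dictionaries; the function name `reconcile` is hypothetical and nothing here calls the notmuch library or reflects how the CLI actually reads its config.

```python
def reconcile(file_opts, db_opts):
    """Apply the proposed (a)/(b)/(c) rule to two option dictionaries.

    Returns a tuple (effective, to_copy, conflicts):
      effective -- the values the CLI would end up using
      to_copy   -- options missing from the database (case a)
      conflicts -- option names whose values differ (case c)
    """
    effective = dict(db_opts)
    to_copy = {}
    conflicts = []
    for key, value in file_opts.items():
        if key not in db_opts:
            # (a) not in the database: schedule a copy into it
            to_copy[key] = value
            effective[key] = value
        elif db_opts[key] == value:
            # (b) present and identical: nothing to do
            continue
        else:
            # (c) present but different: prefer the file, remember to warn
            conflicts.append(key)
            effective[key] = value
    return effective, to_copy, conflicts


if __name__ == "__main__":
    # Illustrative option names and values only.
    file_opts = {"user.name": "A", "new.tags": "inbox", "search.exclude_tags": "spam"}
    db_opts = {"new.tags": "inbox", "search.exclude_tags": "deleted"}
    effective, to_copy, conflicts = reconcile(file_opts, db_opts)
    for key in conflicts:
        print(f"warning: {key} differs between config file and database")
```

Keeping the comparison separate from any writing matches the point made in the reply: detecting and reporting conflicts is the easy part, and whether to overwrite the database can then be left to an explicit flag.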
Re: multilingual notmuch (and Content-Language)
On Sun, 18 Mar 2018, Daniel Kahn Gillmor wrote:

> * if we know our index expects english, and we have a message part that
>   *is not* english (e.g. Content-Language: es), we could avoid indexing
>   that part.

Why would we do that? Search mostly works just fine for non-English languages; it's just that the *stemming* is not right.

> what do you think? what ideas are missing from the brainstorm above? I'd
> love to hear from people with multilingual mailboxes about how we might
> be able to make notmuch work better for them.

With my limited understanding of this, stemming happens both at indexing and at searching. Basically, at indexing, the term generator indexes both the full and the stemmed version of each word.

I'm wondering if we could look at Content-Language (and, failing that, heuristics), and (if the user so desires) use multiple term generators with different stemmers on a per-document basis. Or use non-stemming indexing for unidentified or unsupported languages. How far would that take us?

Then, perhaps, we could also perform language-specific queries? I don't know how feasible that is, or if it would require Xapian changes.

BR,
Jani.
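As a rough illustration of the per-document idea above, the selection step could look something like this: take the Content-Language value, pick the first language for which a stemmer is known, and otherwise fall back to no stemming. The function name `stemmer_for` and the small `SUPPORTED` table are assumptions made for the sketch; the returned strings follow the language names Xapian's stemmers are usually referred to by, but nothing here touches Xapian or notmuch.

```python
# Hypothetical mapping from primary language subtag to a stemmer name.
SUPPORTED = {
    "en": "english",
    "es": "spanish",
    "de": "german",
    "fr": "french",
}


def stemmer_for(content_language):
    """Pick a stemmer name for one message part.

    content_language is the raw header value (a comma-separated list of
    language tags such as "pt-BR, en-US"), or None if the header is absent.
    Returns "none" for unidentified or unsupported languages, i.e. index
    without stemming.
    """
    if not content_language:
        return "none"
    for tag in content_language.split(","):
        primary = tag.strip().lower().split("-")[0]
        if primary in SUPPORTED:
            return SUPPORTED[primary]
    return "none"


if __name__ == "__main__":
    for header in ("es", "pt-BR, en-US", None, "zz"):
        print(repr(header), "->", stemmer_for(header))
```

The sketch only covers choosing a stemmer at indexing time; the open question in the message, how a query would know which stemmer to apply, is not addressed by it.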
Re: multilingual notmuch (and Content-Language)
Daniel Kahn Gillmor writes:

> AIUI, xapian is pretty much committed to being a single-language
> indexer. But i just wanted to point out that it's possible that we
> could be smarter about this in notmuch, and wanted to make a space for
> possible design discussion.
>

More precisely, it uses a single _stemmer_ when generating terms and when parsing queries. Nothing says that these have to correspond to a single human language. The stemmer is also configured at runtime, so it could in principle be configurable per database.

I mention the possibility of a custom stemmer because that also seems like a natural place to put things like unicode normalization and accent removal.

d
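To make "unicode normalization and accent removal" concrete, here is a small sketch of the kind of folding such a custom stemmer might apply to a word before (or instead of) language-specific stemming. It uses only Python's standard `unicodedata` module; the function name `fold_accents` is made up for the example, and how this would be wired into a Xapian stemmer is left open.

```python
import unicodedata


def fold_accents(text):
    """Return text with combining accents removed and case folded.

    The text is decomposed (NFKD) so that accented letters become a base
    letter plus combining marks, the marks are dropped, and the result is
    recomposed (NFC) and case-folded.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).casefold()


if __name__ == "__main__":
    for word in ("Español", "café", "cafe\u0301"):
        print(word, "->", fold_accents(word))
```

Because the same folding would run at both indexing and query time, a search for "cafe" and one for "café" would end up looking for the same term.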
Re: [PATCH 1/1] Fix typos as found by codespell
On Sun 2018-03-18 03:30:25 +0100, Georg Faerber wrote:

> Signed-off-by: Georg Faerber
> ---
>  NEWS                                       | 4 ++--
>  bindings/python/docs/source/filesystem.rst | 2 +-
>  contrib/go/src/notmuch/notmuch.go          | 2 +-
>  debian/changelog                           | 4 ++--
>  emacs/notmuch-mua.el                       | 2 +-
>  test/T190-multipart.sh                     | 2 +-
>  test/T410-argument-parsing.sh              | 2 +-
>  vim/README                                 | 2 +-
>  8 files changed, 10 insertions(+), 10 deletions(-)

this patch looks good to me.

--dkg
multilingual notmuch (and Content-Language)
https://tools.ietf.org/html/rfc3282 describes a Content-Language: header.

https://tools.ietf.org/html/rfc8255 describes a multipart/multilingual Content-Type.

notmuch currently uses xapian with a hard-coded English stemmer, which works great for me as a monolingual American, but limits the applicability of notmuch to Anglophones (people who speak English). That makes me sad.

AIUI, xapian is pretty much committed to being a single-language indexer. But i just wanted to point out that it's possible that we could be smarter about this in notmuch, and wanted to make a space for possible design discussion.

a few concrete suggestions (intended as brainstorming, feedback welcome):

 * if we know our index expects english, and we have a message part that
   *is not* english (e.g. Content-Language: es), we could avoid indexing
   that part.

 * during indexing, we could add a property to each message when we
   discover a Content-Language header. this would let you do something
   like "notmuch search property:lang=es" to find all messages
   explicitly tagged as spanish.

 * (pretty crazy) If we're willing to search in another language we
   could add an additional xapian database configured for that language,
   and we could index identified parts in that language.

 * for text parts without a Content-Language: header, we could do some
   concrete heuristics to guess the language. For example, choose the
   1000 most popular words for each language we might know about, and
   look for their presence in the text. Choose the language that is
   most heavily represented, and store it in the index as a property.
   this could be combined with the suggestions above.

what do you think? what ideas are missing from the brainstorm above? I'd love to hear from people with multilingual mailboxes about how we might be able to make notmuch work better for them.

Regards,

--dkg
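The word-frequency heuristic in the last suggestion above could be sketched along these lines. The word lists are tiny hand-picked stand-ins for the "1000 most popular words" per language, and the names `COMMON_WORDS` and `guess_language` are invented for the example; a real implementation would need proper word lists and probably a minimum-confidence threshold.

```python
import re

# Tiny illustrative stand-ins for per-language lists of common words.
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es"},
}


def guess_language(text, common_words=COMMON_WORDS):
    """Guess the language of text by counting hits against common words.

    Returns the language code with the most hits, or None when no listed
    word occurs at all. Ties go to whichever language is listed first.
    """
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in common_words.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("el perro de la casa es grande"))
    print(guess_language("the cat is in the house"))
```

The guessed code could then be stored as a message property, as in the "property:lang=es" suggestion, so that the guess and an explicit Content-Language header are searchable the same way.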
Re: Permissions of files created by notmuch
On Sun 2018-03-18 04:30:06 +0100, Georg Faerber wrote:

> I'm using notmuch 0.26-1+b2 out of Debian unstable.
> The files created inside .notmuch/xapian by notmuch are group and world
> readable. Is this on purpose? This seems quite suboptimal, especially
> if one is using the recently introduced cleartext indexing feature.

is your mailbox itself world-readable? What is your umask?

in general, i'd expect notmuch to follow umask like any other unix tool. if we wanted it to be more restrictive, maybe that's a separate use case.

See also discussion at id:20180209041058.4037-1-...@fifthhorseman.net around whether "notmuch insert" and other tools should produce world-readable files by default.

curious to hear what you think is the right choice here.

--dkg
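For anyone unsure what "follow umask" means in practice, the following sketch shows the general POSIX behaviour: a program asks for a permissive mode when creating a file and the process umask removes bits from it. The helper name `mode_created_under` is made up, and the assumption that the files under .notmuch/xapian are requested with mode 0666 is just that, an assumption; the sketch does not inspect notmuch or Xapian.

```python
import os
import stat
import tempfile


def mode_created_under(umask, requested=0o666):
    """Create a file under the given umask and return its permission bits."""
    previous = os.umask(umask)  # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "probe")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # always restore the caller's umask


if __name__ == "__main__":
    for mask in (0o022, 0o077):
        print(f"umask {mask:03o} -> mode {mode_created_under(mask):03o}")
```

With the common default umask of 022 the file comes out group and world readable, which would match the report; with umask 077 the same request yields a file only its owner can read.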
Re: emacs-notmuch: A Xapian exception occurred parsing query
Olly Betts writes:

> On 2018-02-07, David Bremner wrote:
>> The underlying issue is that * is parsed (simplistically) by notmuch
>> before passing to Xapian, so only works if it is the entire query.
>>
>> For cases like you report, where the user has not entered '*', but
>> rather it is contained in some generated query string, we could fix the
>> problem by adding a prefix like "special:*".
>
> If you're generating the query string, you could presumably just
> generate « tag:flagged » for this case.

Yes, that should in principle be a Simple Matter of Programming (TM). But this doesn't affect the underlying infelicity that "*" is a valid notmuch query, while "* and tag:foo" is not.

>
> Though it's generally better not to try to generate a string to parse,
> but instead to parse any part(s) the user actually wrote and combine
> the resulting Xapian::Query objects with directly constructed objects
> for other filters, etc.
>

That's a bit tougher here, since the emacs interface is calling the notmuch CLI with query strings.
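The workaround suggested in the quoted text (emit only the filter when the user's part of the query is "*") amounts to something like the sketch below. It is written in Python for illustration only, although the front end in question is Emacs Lisp, and the helper name `combine_query` is hypothetical.

```python
def combine_query(user_query, extra_filter):
    """Combine a user-supplied query with a generated filter.

    "*" only works as an entire notmuch query, so when the user's query
    is "*" the filter is returned on its own instead of "* and <filter>".
    """
    if user_query.strip() == "*":
        return extra_filter
    return f"({user_query}) and ({extra_filter})"


if __name__ == "__main__":
    print(combine_query("*", "tag:flagged"))
    print(combine_query("from:alice", "tag:flagged"))
```

As noted in the reply, this only sidesteps the problem in generated queries; a user who types "* and tag:foo" by hand would still hit the parse error.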
Re: Crash with Python bindings
Daniel Kahn Gillmor writes:

> On Fri 2018-03-16 19:30:37 +0100, Floris Bruynooghe wrote:
>> If someone can hook pytest runs with various python versions into the
>> notmuch test suite I'd be very much obliged and probably have another go
>> at this as it's still an interesting problem and gives a nice way
>> forward.
>
> I don't really know what this request means -- so maybe that means that
> i'm not the right person for the task, which is fine.
>
> but it's also possible that the right person for the task *also* doesn't
> know what you're asking for, so if you could elaborate a bit further
> i think that would be super helpful :)

Fair enough :)

Here is a somewhat more long-form version of this:

Before even attempting to refactor the existing bindings to use cffi as a backend instead of ctypes, and/or adding the changes needed to track the lifetime of objects correctly, I would like to be able to write full unittest-level tests for the bindings, to be able to guarantee that no user-level APIs are broken.

In my version of the bindings I did this the traditional Python way: using pytest for writing unit tests and using tox to invoke the tests for the various supported versions of Python. One of the feedback items I got on the patch I sent last time was that the project would be reluctant to adopt this and would like to avoid virtualenv and pip, with their behaviour of downloading things over the network. Instead, the wish was for it to use a system Python, which should already have the necessary tools installed (i.e. pytest), and for all this to be integrated into the existing test suite.

So in my last attempt at this I made a test/T391-pytest.sh file with the idea of running a subtest for each Python version, with the subtest being a ``pythonX.Y -m pytest bindings/python/tests/`` so it'd run the entire test suite. To be nice, this also needs to be hooked up so that the subtests get skipped when a Python version is not available, or is missing pytest itself.

Trying to figure this out is where I got distracted last time and started working more on other things.

Kind Regards,
Floris
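The skip logic described above (skip a subtest when the interpreter is absent or cannot import pytest) could be prototyped like this before being translated into the shell test. This is only a sketch: the function name `pytest_status` and the list of interpreter names are assumptions, and it does not use any helpers from the notmuch test suite.

```python
import shutil
import subprocess


def pytest_status(interpreter):
    """Decide whether the pytest subtest for one interpreter can run.

    Returns "run" when the interpreter is on PATH and can import pytest,
    otherwise a string starting with "skip:" that gives the reason.
    """
    path = shutil.which(interpreter)
    if path is None:
        return "skip: interpreter not found"
    probe = subprocess.run(
        [path, "-c", "import pytest"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if probe.returncode != 0:
        return "skip: pytest not importable"
    return "run"


if __name__ == "__main__":
    # Illustrative interpreter names; the real list would come from the
    # versions the bindings are meant to support.
    for name in ("python3", "python2.7"):
        print(name, "->", pytest_status(name))
```

For each interpreter reported as "run", the subtest would then invoke ``pythonX.Y -m pytest bindings/python/tests/`` as described in the message; the others would be reported as skipped rather than failed.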