Re: [PATCH 1/2] cli: looking for config file in $XDG_CONFIG_HOME

2018-03-18 Thread David Bremner
Daniel Kahn Gillmor  writes:

>
>  * when we observe a config file, we could walk each option present in
>    it.  For each option:
>
>     a) if that option is not present in the database, copy it into the
>        database.
>
>     b) if that option is present in the database, and it matches the
>        option in the config file, ignore
>
>     c) if that option is present in the database but does not match the
>        config file, use the version in the config file but warn the user
>        that they have a conflict they should probably resolve soon.

I guess this whole discussion is about the CLI, so in principle that's
OK, but for existing library code there's no real way to use the config
file value for config values that are stored in the database: we just
don't pass that information in to the API. What's (relatively) easy is
to have the config file reader used by the CLI check for conflicts with
the database config and report those. We could also give people a flag
(or something) so that the database version of the config is
overwritten.
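
A minimal sketch of that conflict check, assuming both the config file
and the database config have already been read into plain dictionaries.
The name `reconcile` and the `prefer_file` flag are hypothetical and not
part of notmuch; the cases (a), (b), (c) refer to the quoted list above.

```python
def reconcile(file_cfg, db_cfg, prefer_file=False):
    """Compare config-file options against database options.

    Returns (merged, conflicts), where conflicts is a list of
    (key, file_value, db_value) tuples to report to the user.
    Hypothetical helper for illustration, not a notmuch API.
    """
    merged = dict(db_cfg)
    conflicts = []
    for key, file_value in file_cfg.items():
        if key not in db_cfg:
            # case (a): only in the file, so copy it over
            merged[key] = file_value
        elif db_cfg[key] != file_value:
            # case (c): report the conflict; overwrite only on request
            conflicts.append((key, file_value, db_cfg[key]))
            if prefer_file:
                merged[key] = file_value
        # case (b): values agree, nothing to do
    return merged, conflicts
```

By default the database value wins and the conflict is only reported;
passing `prefer_file=True` stands in for the "flag (or something)" that
overwrites the database version.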

Really, writing the database config from the text config file seems
relatively sane to me. It's editing that file via computer that gives me
the heebie-jeebies.

> So i think it would be a shame to have an additional layer of confusion
> added by having two different deprecated configuration files.  So i lean
> against adopting this change. I'd much rather see work on phasing out
> the configfile.

OK, noted. I guess I would imagine any conflict resolution to be against
the in-memory struct read from disk, so somewhat orthogonal to where it
is read from.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: multilingual notmuch (and Content-Language)

2018-03-18 Thread Jani Nikula
On Sun, 18 Mar 2018, Daniel Kahn Gillmor  wrote:
>  * if we know our index expects english, and we have a message part that
>    *is not* english (e.g. Content-Language: es), we could avoid indexing
>    that part.

Why would we do that? Search mostly works just fine for non-English
languages, it's just that the *stemming* is not right.

> what do you think?  what ideas are missing from the brainstorm above?  I'd
> love to hear from people with multilingual mailboxes about how we might
> be able to make notmuch work better for them.

With my limited understanding of this, stemming happens both at indexing
and searching. Basically at indexing, the term generator indexes both
the full and the stemmed version of words. I'm wondering if we could
look at Content-Language (and missing that, heuristics), and (if the
user so desires) use multiple term generators with different stemmers on
a per document basis. Or, use non-stemming indexing for unidentified or
unsupported languages. How far would that take us? Then, perhaps, we
could also perform language specific queries?

I don't know how feasible that is, or if it would require Xapian
changes.
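
As a rough sketch of the "look at Content-Language" step only, here is
how a per-document stemmer name could be chosen. The `STEMMERS` mapping
and the function name are assumptions made for illustration; the values
are meant to match language names accepted by Xapian's Stem class, with
"none" standing for non-stemming indexing. RFC 3282 comments in the
header value are not handled.

```python
from email import message_from_string

# Assumed mapping from primary language tags to stemmer names.
STEMMERS = {
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "de": "german",
    "fi": "finnish",
}


def stemmer_for(raw_message, default="none"):
    """Pick a stemmer name from the Content-Language header, if any."""
    msg = message_from_string(raw_message)
    header = msg.get("Content-Language", "")
    # The header may list several tags, e.g. "en-US, fr".
    for tag in header.split(","):
        primary = tag.strip().lower().split("-")[0]
        if primary in STEMMERS:
            return STEMMERS[primary]
    # Unidentified or unsupported language: fall back to no stemming.
    return default
```

Whether several term generators can then share one database is exactly
the open question; this only covers picking the name.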

BR,
Jani.


Re: multilingual notmuch (and Content-Language)

2018-03-18 Thread David Bremner
Daniel Kahn Gillmor  writes:

> AIUI, xapian is pretty much committed to being a single-language
> indexer.  But i just wanted to point out that it's possible that we
> could be smarter about this in notmuch, and wanted to make a space for
> possible design discussion.
>

More precisely, it uses a single _stemmer_ when generating terms and
when parsing queries. Nothing says that these have to correspond to a
single human language. The stemmer is also configured at runtime, so it
could in principle be per database configurable. I mention the
possibility of a custom stemmer because that also seems like a natural
place to put things like unicode normalization and accent removal.
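
To make the normalization idea concrete, here is a small sketch of the
kind of folding such a custom stemmer might apply to each term. It uses
only the Python standard library and does not touch Xapian; the function
name `fold` is hypothetical.

```python
import unicodedata


def fold(term):
    """Normalize a term and strip accents (illustrative only)."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", term)
    # Drop the combining marks, keeping the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Recompose what is left and fold case.
    return unicodedata.normalize("NFC", stripped).casefold()
```

With this, "Café" and "cafe" fold to the same term, so the same folding
would have to run both when generating terms and when parsing queries.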

d




Re: [PATCH 1/1] Fix typos as found by codespell

2018-03-18 Thread Daniel Kahn Gillmor
On Sun 2018-03-18 03:30:25 +0100, Georg Faerber wrote:
> Signed-off-by: Georg Faerber 
> ---
>  NEWS                                       | 4 ++--
>  bindings/python/docs/source/filesystem.rst | 2 +-
>  contrib/go/src/notmuch/notmuch.go          | 2 +-
>  debian/changelog                           | 4 ++--
>  emacs/notmuch-mua.el                       | 2 +-
>  test/T190-multipart.sh                     | 2 +-
>  test/T410-argument-parsing.sh              | 2 +-
>  vim/README                                 | 2 +-
>  8 files changed, 10 insertions(+), 10 deletions(-)

this patch looks good to me.

 --dkg


multilingual notmuch (and Content-Language)

2018-03-18 Thread Daniel Kahn Gillmor
https://tools.ietf.org/html/rfc3282 describes a Content-Language:
header.  https://tools.ietf.org/html/rfc8255 describes
a multipart/multilingual Content-Type.

notmuch currently uses xapian with a hard-coded English stemmer which
works great for me as a monolingual American, but limits the
applicability of notmuch to Anglophones (people who speak English).
That makes me sad.

AIUI, xapian is pretty much committed to being a single-language
indexer.  But i just wanted to point out that it's possible that we
could be smarter about this in notmuch, and wanted to make a space for
possible design discussion.

a few concrete suggestions (intended as brainstorming, feedback welcome):

 * if we know our index expects english, and we have a message part that
   *is not* english (e.g. Content-Language: es), we could avoid indexing
   that part.

 * during indexing, we could add a property to each message when we
   discover a Content-Language header.  this would let you do something
   like "notmuch search property:lang=es" to find all messages
   explicitly tagged as spanish.

 * (pretty crazy) If we're willing to search in another language we
   could add an additional xapian database configured for that language,
   and we could index identified parts in that language.

 * for text parts without a Content-Language: header, we could do some
   concrete heuristics to guess the language.  For example, choose the
   1000 most popular words for each language we might know about, and
   look for their presence in the text.  Choose the language that is
   most heavily represented, and store it in the index as a property.
   this could be combined with the suggestions above.
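
A toy sketch of the common-words heuristic from the last suggestion. The
word lists here are tiny stand-ins (a real version would use something
like the 1000 most popular words per language), and the names
`COMMON_WORDS` and `guess_language` are made up for illustration.

```python
import re

# Tiny stand-in word lists; purely illustrative.
COMMON_WORDS = {
    "en": {"the", "and", "is", "of", "to", "that", "it", "with"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es"},
    "fr": {"le", "la", "de", "et", "les", "des", "est", "que"},
}


def guess_language(text, min_hits=3):
    """Return the best-represented language, or None if unsure."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # Refuse to guess when too few known words were seen.
    return best if scores[best] >= min_hits else None
```

The result could then be stored as the same property a Content-Language
header would produce, so the two sources look identical at search time.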

what do you think?  what ideas are missing from the brainstorm above?  I'd
love to hear from people with multilingual mailboxes about how we might
be able to make notmuch work better for them.

Regards,

--dkg




Re: Permissions of files created by notmuch

2018-03-18 Thread Daniel Kahn Gillmor
On Sun 2018-03-18 04:30:06 +0100, Georg Faerber wrote:

> I'm using notmuch 0.26-1+b2 out of Debian unstable.
> The files created inside .notmuch/xapian by notmuch are group and world
> readable.  Is this on purpose? This seems quite suboptimal, especially
> if one is using the recently introduced cleartext indexing feature.

is your mailbox itself world-readable?  What is your umask?

in general, i'd expect notmuch to follow umask like any other unix
tool.  if we wanted it to be more restrictive, maybe that's a separate
use case.
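
For reference, a small demonstration of what "follow umask" means for a
newly created file. This is generic POSIX behaviour shown from Python,
not a statement about how notmuch or Xapian actually create their files;
the helper name is hypothetical.

```python
import os
import stat


def create_with_umask(path, umask):
    """Create path requesting mode 0666 and return the resulting mode."""
    old = os.umask(umask)
    try:
        # The kernel clears the umask bits from the requested mode.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o666)
        os.close(fd)
    finally:
        # Always restore the previous umask for the rest of the process.
        os.umask(old)
    return stat.S_IMODE(os.stat(path).st_mode)
```

With a umask of 022 the file comes out group and world readable (0644);
with 077 it is readable by the owner only (0600).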

See also discussion at id:20180209041058.4037-1-...@fifthhorseman.net
around whether "notmuch insert" and other tools should produce
world-readable files by default.

curious to hear what you think is the right choice here.

   --dkg




Re: emacs-notmuch: A Xapian exception occurred parsing query

2018-03-18 Thread David Bremner
Olly Betts  writes:

> On 2018-02-07, David Bremner wrote:
>> The underlying issue is that * is parsed (simplistically) by notmuch
>> before passing to Xapian, so only works if it is the entire query.
>>
>> For cases like you report, where the user has not entered '*', but
>> rather it is contained in some generated query string, we could fix the
>> problem by adding a prefix like "special:*".
>
> If you're generating the query string, you could presumably just
> generate « tag:flagged » for this case.

Yes, that should in principle be a Simple Matter of Programming
(TM). But this doesn't affect the underlying infelicity that

"*" is a valid notmuch query, while

"* and tag:foo" is not

>
> Though it's generally better not to try to generate a string to parse,
> but instead to parse any part(s) the user actually wrote and combine
> the resulting Xapian::Query objects with directly constructed objects
> for other filters, etc.
>

That's a bit tougher here, since the emacs interface is calling the
notmuch CLI with query strings.  
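
On the generating side, Olly's suggestion amounts to something like the
following sketch (shown in Python rather than elisp; the function name
`add_filter` is hypothetical). It only avoids producing "* and ..."; it
does nothing about the underlying parsing infelicity.

```python
def add_filter(query, filter_term):
    """Combine a base query with one extra filter term."""
    query = query.strip()
    if query == "*":
        # "*" only works as the entire query, so emit the filter alone.
        return filter_term
    return "({}) and {}".format(query, filter_term)
```

For a base query of "*" and the filter "tag:flagged" this yields just
"tag:flagged" rather than "* and tag:flagged".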


Re: Crash with Python bindings

2018-03-18 Thread Floris Bruynooghe
Daniel Kahn Gillmor  writes:

> On Fri 2018-03-16 19:30:37 +0100, Floris Bruynooghe wrote:
>> If someone can hook pytest runs with various python versions into the
>> notmuch test suite I'd be very much obliged and probably have another go
>> at this as it's still an interesting problem and gives a nice way
>> forward.
>
> I don't really know what this request means -- so maybe that means that
> i'm not the right person for the task, which is fine.
>
> but it's also possible that the right person for the task *also* doesn't
> know what you're asking for, so if you could elaborate a bit further
> i think that would be super helpful :)

Fair enough :)
Here is a somewhat longer version of this:

Before even attempting to refactor the existing bindings to use cffi as
a backend instead of ctypes, and/or adding the changes needed to track
the lifetime of objects correctly, I would like to be able to write full
unittest-level tests for the bindings, to be able to guarantee that no
user-level APIs are broken.  In my version of the bindings I did this
the traditional Python way: using pytest for writing unit tests and using
tox to invoke the tests for the various supported versions of python.

One piece of feedback on the patch I sent last time was that the project
would be reluctant to adopt this, and would like to avoid virtualenv and
pip with their behaviour of downloading things over the network.  The
preference was to use a system python that already has the necessary
tools (i.e. pytest) installed, with all of this integrated into the
existing test suite.

My last attempt was a test/T391-pytest.sh file that runs one subtest per
python version, each subtest being a ``pythonX.Y -m pytest
bindings/python/tests/`` so that it runs the entire pytest suite.  To be
nice this also needs to be hooked up so that a subtest is skipped when
its python version is not available, or is missing pytest itself.
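
The skip logic could look roughly like this. It is sketched in Python
only to show the two checks; the real thing would live in the shell test
script, and the function names are hypothetical.

```python
import shutil
import subprocess


def pytest_status(interpreter):
    """Return None if the subtest can run, else a skip reason."""
    path = shutil.which(interpreter)
    if path is None:
        return "{} not found".format(interpreter)
    # Probe whether this interpreter can import pytest at all.
    probe = subprocess.run(
        [path, "-c", "import pytest"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if probe.returncode != 0:
        return "pytest missing for {}".format(interpreter)
    return None


def plan(interpreters):
    """Map each interpreter name to None (run) or a skip reason."""
    return {name: pytest_status(name) for name in interpreters}
```

Nothing is downloaded here: an interpreter is either already usable with
its system pytest, or its subtest is skipped with a reason.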

Trying to figure this out is where I got distracted last time and
started working on other things.


Kind Regards,
Floris