Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
"Austin T. Clements" writes:
> Quoting Moritz Ulrich :
>> Hello,
>>
>> I recently adopted notmuch as my primary way to read mail, so thank you
>> for this great tool!
>>
>> Unfortunately, I ran into a problem on the Emacs side of the project
>> when used in a non-ASCII environment:
>>
>> Having a tag named 'uni-köln', the tag:-completion doesn't work.
>>
>> This is caused by `notmuch-escape-boolean-term' erroneously escaping the
>> above string:
>>
>> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
>>
>> This is caused by `string-match' with the following erroneously matching
>> my tag:
>>
>> (string-match "[^!#-'*-~]" "uni-köln") => 5
>> (string-match "[^!#-'*-~]" "uni-koln") => nil
>>
>> I'm not exactly sure how to tackle this - the Regexp was crafted to match
>> (, ), " if I understand it correctly. A simple way would be just adding
>> more characters as a sort-of whitelist. A nicer solution would be
>> converting it from [^...] to [...] to explicitly mark letters that need
>> to be escaped.
>
> notmuch-escape-boolean-term used to use a blacklist, but we switched
> to a whitelist because Xapian's own parser has changed over the years
> in its handling of non-ASCII characters and invalidated our blacklist.
> Ultimately it seemed much safer to go with a whitelist. Quoting
> "uni-köln" isn't erroneous, it's just conservative.
>
> Could you explain in more detail what's broken? I tried adding the
> tag uni-köln to a message in Emacs, then hitting "s" to start a search
> then "tag:" and that tag (surrounded by quotes) was one of the
> completion options. Upon completing to that tag, the search worked
> fine.
>
> Are you objecting to the unnecessary (but legal) quotes in the
> completion? We might be able to include Unicode word characters in
> the quoting whitelist, though that seems like a spot fix (probably a
> fairly broad one, so maybe that's fine) and might be tricky because of
> Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
> Just Work, but we'd have to be careful of the active syntax table).
> Or tab completion could recognize that, say, tag:uni doesn't require
> quoting, but still expand it to tag:"uni-köln".

Thanks for explaining the reason for the whitelist approach. Knowing
this is quite helpful.

I can't really explain why, but I just didn't notice tag:"uni-köln" in
the tag completion - I think my expectation of finding it as
tag:uni-köln must have blinded me.

While it isn't erroneous, it's highly unintuitive to quote tags like
this. I can understand that a much more permissive whitelist could cause
other problems which are harder to track down, so maybe it's possible to
make the behavior configurable (e.g. by using a `defvar' for the regex).

--
Moritz Ulrich
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch
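The `defvar' suggestion could look roughly like the sketch below. It is not notmuch code: the names `my-boolean-term-quoting-regexp' and `my-configurable-escape-boolean-term' are hypothetical, the default regexp is the conservative one quoted in this thread, and writing an embedded double quote as two double quotes inside a quoted term is an assumption about the query syntax.

```elisp
;; -*- lexical-binding: t; coding: utf-8 -*-
;; Sketch only: hypothetical names, not part of notmuch.
(defvar my-boolean-term-quoting-regexp "[^!#-'*-~]"
  "Regexp matching any character that forces a boolean term to be quoted.
The default is the conservative ASCII-only whitelist.")

(defun my-configurable-escape-boolean-term (term)
  "Return TERM, quoted if it is empty or contains a character matched
by `my-boolean-term-quoting-regexp'."
  (if (or (equal term "")
          (string-match-p my-boolean-term-quoting-regexp term))
      ;; Assumption: an embedded " is escaped by doubling it.
      (concat "\"" (replace-regexp-in-string "\"" "\"\"" term t t) "\"")
    term))

;; Conservative default:
;;   (my-configurable-escape-boolean-term "uni-köln") ; => "\"uni-köln\""
;; A user willing to accept the risk could loosen the whitelist:
;;   (let ((my-boolean-term-quoting-regexp "[^[:alpha:]!#-'*-~]"))
;;     (my-configurable-escape-boolean-term "uni-köln")) ; => "uni-köln"
```

A `defcustom' would be the more conventional way to expose a user option; `defvar' is used here only because that is what the message proposes. Because the variable is declared with `defvar', a `let' binding of it is dynamic, so it affects the function even under lexical binding.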
`notmuch-escape-boolean-term': Broken for non-ascii characters
Quoting Moritz Ulrich :
> Hello,
>
> I recently adopted notmuch as my primary way to read mail, so thank you
> for this great tool!
>
> Unfortunately, I ran into a problem on the Emacs side of the project
> when used in a non-ASCII environment:
>
> Having a tag named 'uni-köln', the tag:-completion doesn't work.
>
> This is caused by `notmuch-escape-boolean-term' erroneously escaping the
> above string:
>
> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
>
> This is caused by `string-match' with the following erroneously matching
> my tag:
>
> (string-match "[^!#-'*-~]" "uni-köln") => 5
> (string-match "[^!#-'*-~]" "uni-koln") => nil
>
> I'm not exactly sure how to tackle this - the Regexp was crafted to match
> (, ), " if I understand it correctly. A simple way would be just adding
> more characters as a sort-of whitelist. A nicer solution would be
> converting it from [^...] to [...] to explicitly mark letters that need
> to be escaped.

notmuch-escape-boolean-term used to use a blacklist, but we switched
to a whitelist because Xapian's own parser has changed over the years
in its handling of non-ASCII characters and invalidated our blacklist.
Ultimately it seemed much safer to go with a whitelist. Quoting
"uni-köln" isn't erroneous, it's just conservative.

Could you explain in more detail what's broken? I tried adding the
tag uni-köln to a message in Emacs, then hitting "s" to start a search
then "tag:" and that tag (surrounded by quotes) was one of the
completion options. Upon completing to that tag, the search worked
fine.

Are you objecting to the unnecessary (but legal) quotes in the
completion? We might be able to include Unicode word characters in
the quoting whitelist, though that seems like a spot fix (probably a
fairly broad one, so maybe that's fine) and might be tricky because of
Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
Just Work, but we'd have to be careful of the active syntax table).
Or tab completion could recognize that, say, tag:uni doesn't require
quoting, but still expand it to tag:"uni-köln".
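The [[:alpha:]] idea can be tried out with a small variant of the whitelist check. In this sketch `my-escape-boolean-term' is a hypothetical name (not the notmuch function), the optional ALLOW-LETTERS argument switches between the conservative class and one that also admits alphabetic characters, and doubling an embedded double quote is an assumption about the query syntax. Since Emacs 25, [:alpha:] classifies non-ASCII characters by their Unicode properties; earlier versions matched any character with word syntax, which is the syntax-table caveat mentioned in the message. Whether Xapian accepts every such term unquoted is not tested here; that risk is the reason for the conservative default.

```elisp
;; -*- lexical-binding: t; coding: utf-8 -*-
;; Sketch of the [[:alpha:]] whitelist extension; hypothetical name.
(defun my-escape-boolean-term (term &optional allow-letters)
  "Return TERM, wrapped in double quotes unless it is whitelisted.
With ALLOW-LETTERS non-nil, alphabetic characters are also whitelisted."
  (let ((needs-quoting (if allow-letters
                           "[^[:alpha:]!#-'*-~]" ; letters + ASCII whitelist
                         "[^!#-'*-~]")))         ; ASCII whitelist only
    (if (or (equal term "")
            (string-match-p needs-quoting term))
        ;; Assumption: an embedded " is escaped by doubling it.
        (concat "\"" (replace-regexp-in-string "\"" "\"\"" term t t) "\"")
      term)))

;; (my-escape-boolean-term "uni-köln")   ; => "\"uni-köln\""
;; (my-escape-boolean-term "uni-köln" t) ; => "uni-köln"
;; (my-escape-boolean-term "a b" t)      ; => "\"a b\""
```

Note that letters only widen the whitelist: space, `"', `(' and `)' still force quoting in both modes.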
`notmuch-escape-boolean-term': Broken for non-ascii characters
Hello,

I recently adopted notmuch as my primary way to read mail, so thank you
for this great tool!

Unfortunately, I ran into a problem on the Emacs side of the project
when used in a non-ASCII environment:

Having a tag named 'uni-köln', the tag:-completion doesn't work.

This is caused by `notmuch-escape-boolean-term' erroneously escaping the
above string:

(notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""

This is caused by `string-match' with the following erroneously matching
my tag:

(string-match "[^!#-'*-~]" "uni-köln") => 5
(string-match "[^!#-'*-~]" "uni-koln") => nil

I'm not exactly sure how to tackle this - the Regexp was crafted to match
(, ), " if I understand it correctly. A simple way would be just adding
more characters as a sort-of whitelist. A nicer solution would be
converting it from [^...] to [...] to explicitly mark letters that need
to be escaped.

Cheers,
Moritz Ulrich
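To see exactly what the class matches: [^!#-'*-~] excludes ! (0x21), the range # through ' (0x23-0x27) and the range * through ~ (0x2A-0x7E). It therefore matches space (0x20), " (0x22), ( (0x28), ) (0x29), control characters, DEL and every non-ASCII character such as ö, which is why position 5 of "uni-köln" matches. The helper below, `my-quoting-triggers', is a hypothetical name (not part of notmuch) that lists the characters of a term that trigger quoting, using only the regexp quoted above.

```elisp
;; -*- lexical-binding: t; coding: utf-8 -*-
;; Sketch: report which characters of TERM fall outside the whitelist
;; [!#-'*-~].  `my-quoting-triggers' is a hypothetical helper.
(defun my-quoting-triggers (term)
  "Return the characters of TERM that are not in the ASCII whitelist."
  (let (triggers)
    (dolist (c (string-to-list term))
      ;; Test each character on its own against the negated class.
      (when (string-match-p "[^!#-'*-~]" (string c))
        (push c triggers)))
    (nreverse triggers)))

;; (my-quoting-triggers "uni-köln")    ; => (246), i.e. (?ö)
;; (my-quoting-triggers "uni-koln")    ; => nil
;; (my-quoting-triggers "a (b) \"c\"") ; => (32 40 41 32 34 34)
```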
Re: Matching on any header line
David Bremner [Mon, Aug 11, 2014 at 01:37:50PM -0300]:
> Nico Schottelius writes:
>
> > I have the problem that often To/Cc do not reveal the real destination,
> > so I would like to match on X-Original-To: or Delivered-To:
> > header lines.
> >
> > So I was wondering if there is generic support to match on something
> > like "header:x-original-to:t...@example.org"?
>
> Such support does not currently exist in notmuch.

Too bad - if you are in general open to it, I will add it to my "to
hack" list.

Cheers,

Nico

--
New PGP key: 659B 0D91 E86E 7E24 FD15 69D0 C729 21A1 293F 2D24