`notmuch-escape-boolean-term': Broken for non-ascii characters

2014-08-13 Thread Moritz Ulrich
"Austin T. Clements"  writes:

> Quoting Moritz Ulrich:
>> Hello,
>>
>> I recently adopted notmuch as my primary way to read mail, so thank you
>> for this great tool!
>>
>> Unfortunately, I ran into a problem on the Emacs side of the project
>> when used in a non-ASCII environment:
>>
>> Having a tag named 'uni-köln', the tag:-completion doesn't work.
>>
>> This is caused by `notmuch-escape-boolean-term' erroneously escaping the
>> above string:
>>
>> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
>>
>> This happens because the following `string-match' call erroneously
>> matches my tag:
>>
>> (string-match "[^!#-'*-~]" "uni-köln") => 5
>> (string-match "[^!#-'*-~]" "uni-koln") => nil
>>
>> I'm not exactly sure how to tackle this - the regexp was crafted to match
>> (, ), " if I understand it correctly. A simple way would be just adding
>> more characters as a sort-of whitelist. A nicer solution would be
>> converting it from [^...] to [...] to explicitly mark characters that need
>> to be escaped.
>
> notmuch-escape-boolean-term used to use a blacklist, but we switched
> to a whitelist because Xapian's own parser has changed over the years
> in its handling of non-ASCII characters and invalidated our blacklist.
> Ultimately it seemed much safer to go with a whitelist.  Quoting
> "uni-köln" isn't erroneous, it's just conservative.
>
> Could you explain in more detail what's broken?  I tried adding the
> tag uni-köln to a message in Emacs, then hitting "s" to start a search
> then "tag:<TAB>" and that tag (surrounded by quotes) was one of the
> completion options.  Upon completing to that tag, the search worked
> fine.
>
> Are you objecting to the unnecessary (but legal) quotes in the
> completion?  We might be able to include Unicode word characters in
> the quoting whitelist, though that seems like a spot fix (probably a
> fairly broad one, so maybe that's fine) and might be tricky because of
> Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
> Just Work, but we'd have to be careful of the active syntax table).
> Or tab completion could recognize that, say, tag:uni doesn't require
> quoting, but still expand it to tag:"uni-köln".

Thanks for explaining the reason for the whitelist-approach. Knowing
this is quite helpful.

I can't really explain why, but I just didn't notice tag:"uni-köln" in
the tag-completion - I think my expectation of finding it as
tag:uni-köln must have blinded me.

While it isn't erroneous, it's highly unintuitive to quote tags like
this. I can understand that a much more permissive whitelist could cause
other problems which are harder to track down, so maybe it's possible to
make the behavior configurable (e.g. by using a `defvar' for the regex).

-- 
Moritz Ulrich
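
As a rough illustration of the two ideas floated above (a `defvar' for
the regexp, and adding [[:alpha:]] to the whitelist), here is a minimal
Emacs Lisp sketch. Every name starting with `my/' is hypothetical and
not part of notmuch. The quoting step assumes that embedded double
quotes are escaped by doubling them, which the thread does not confirm.
How [[:alpha:]] treats non-ASCII letters can depend on the Emacs version
and the active syntax table, as noted in the quoted reply.

```elisp
;; Hypothetical sketch; this is not notmuch's actual implementation.
(defvar my/boolean-term-quote-regexp "[^!#-'*-~]"
  "Regexp matching any character that forces a boolean term to be quoted.
The default is the conservative ASCII whitelist from the thread.")

(defun my/escape-boolean-term (term)
  "Return TERM, wrapped in double quotes if it might need quoting.
TERM is quoted when it is empty or matches
`my/boolean-term-quote-regexp'."
  (save-match-data
    (if (or (string= term "")
            (string-match my/boolean-term-quote-regexp term))
        ;; Assumption: an embedded \" is escaped by doubling it.
        (concat "\"" (replace-regexp-in-string "\"" "\"\"" term t t) "\"")
      term)))

;; With the default whitelist the tag is quoted, as in the report.
(my/escape-boolean-term "uni-köln")     ; => "\"uni-köln\""

;; Dynamically rebinding the variable adds letters to the whitelist.
(let ((my/boolean-term-quote-regexp "[^!#-'*-~[:alpha:]]"))
  (my/escape-boolean-term "uni-köln"))  ; => "uni-köln"
```

Keeping the conservative regexp as the default means nothing changes
unless a user opts in, which limits the risk of the harder-to-track
problems mentioned above.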


`notmuch-escape-boolean-term': Broken for non-ascii characters

2014-08-12 Thread Moritz Ulrich

Hello,

I recently adopted notmuch as my primary way to read mail, so thank you
for this great tool!

Unfortunately, I ran into a problem on the Emacs side of the project
when used in a non-ASCII environment:

Having a tag named 'uni-köln', the tag:-completion doesn't work.

This is caused by `notmuch-escape-boolean-term' erroneously escaping the
above string:

(notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""

This happens because the following `string-match' call erroneously
matches my tag:

(string-match "[^!#-'*-~]" "uni-köln") => 5
(string-match "[^!#-'*-~]" "uni-koln") => nil

I'm not exactly sure how to tackle this - the regexp was crafted to match
(, ), " if I understand it correctly. A simple way would be just adding
more characters as a sort-of whitelist. A nicer solution would be
converting it from [^...] to [...] to explicitly mark characters that need
to be escaped.

Cheers,
Moritz Ulrich
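
For readers following along: the bracket expression in the report is a
negated whitelist. It matches any character other than !, the range #
through ', and the range * through ~. That leaves the space, the double
quote, the parentheses, control characters, and every non-ASCII
character outside the whitelist, which is why the ö triggers quoting.
The following is a minimal Emacs Lisp sketch for finding the character
that trips the match. The helper `my/first-non-whitelisted-char' is a
hypothetical name, not part of notmuch.

```elisp
;; Hypothetical helper, not part of notmuch.
(defun my/first-non-whitelisted-char (term)
  "Return (INDEX . CHAR) for the first char of TERM outside the whitelist.
Return nil if every character of TERM is whitelisted."
  (save-match-data
    (let ((pos (string-match "[^!#-'*-~]" term)))
      (when pos
        (cons pos (aref term pos))))))

(my/first-non-whitelisted-char "uni-köln")  ; => (5 . 246), 246 is ?ö
(my/first-non-whitelisted-char "uni-koln")  ; => nil
(my/first-non-whitelisted-char "a b")       ; => (1 . 32), 32 is a space
```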


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


