[GitHub] [lucene-solr] markharwood commented on pull request #1708: LUCENE-9445 Add support for case insensitive regex searches in QueryParser

2020-08-20 Thread GitBox


markharwood commented on pull request #1708:
URL: https://github.com/apache/lucene-solr/pull/1708#issuecomment-677691691


   @jimczi The TL/DR is I think it's going to be too hard to implement the 
stricter parsing logic.
   
   I spoke with @romseygeek and we couldn't see a neat way that the string 
after the closing `/` could be processed. The parser doesn't allow us to write 
a handler that behaves differently depending on a flag:
    Eager option  - pass everything immediately after closing `/` to the 
regexp constructor logic.
   1) When in strict mode , a simple non-i string is easy to detect and error 
on.
   2) When not in strict mode the non-i string has to then be parsed as a 
regular clause (along with any boosts). It's not possible for the regex 
query-building logic to invoke the parser to do this.
   
    Lazy option - pass only any trailing `i` to the regexp constructor logic
   This simplifies the regex construction logic but pushes the problem 
elsewhere - in strict mode, the next clause the parser builds would have to 
double-check it wasn't immediately preceded by a regex with no whitespace to 
separate them and throw an error. The parsing framework does not provide such a 
context that would allow this logic.
   
   ### Alternative option
   We did consider forgetting about `/Foo/i` syntax and opting for the inline 
syntax commonly supported by regex parsers e.g. `/(?i)Foo/` but this is less 
well known and more cryptic.
   
   ### Summary
   We felt that the existing lax implementation where `/Foo/iphone` is 
interpreted as `/[Ff][Oo][Oo]/ OR phone` is not so awful that it warrants the 
added complexity needed to make it work as we'd ideally want.
   If there are any smart ideas we're open to them but it looks like a very 
messy problem.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] markharwood commented on pull request #1708: LUCENE-9445 Add support for case insensitive regex searches in QueryParser

2020-08-18 Thread GitBox


markharwood commented on pull request #1708:
URL: https://github.com/apache/lucene-solr/pull/1708#issuecomment-675420267


   OK. @romseygeek suggested the BWC flag is called "allow_modifiers" and, if 
false, legacy behaviour is used ie there would be no errors for characters 
after trailing `/` and `/regex/i` is still interpreted as `/regex/` and term 
`i`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] markharwood commented on pull request #1708: LUCENE-9445 Add support for case insensitive regex searches in QueryParser

2020-08-18 Thread GitBox


markharwood commented on pull request #1708:
URL: https://github.com/apache/lucene-solr/pull/1708#issuecomment-675386274


   >I wonder if the parsing should be more strict and only matches if there is 
a separator or ends after the i ?
   
   @jimczi What is the behaviour for a non-match?
   Should `/regex/iterm` throw an error or be interpreted as `/regex/` and 
`iterm`?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] markharwood commented on pull request #1708: LUCENE-9445 Add support for case insensitive regex searches in QueryParser

2020-08-13 Thread GitBox


markharwood commented on pull request #1708:
URL: https://github.com/apache/lucene-solr/pull/1708#issuecomment-673437428


   Progress update  - I'm struggling a bit with how to make the parser stricter 
i.e. ensuring there's a space between `/Foo/i` and the next search term. I'd 
also need to allow for `( A OR /Foo/i)` with the closing bracket instead of a 
space.
   I'm not yet clear in JavaCC how to write that rule without also consuming 
the closing bracket. Some BWC issues to work through here too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] markharwood commented on pull request #1708: LUCENE-9445 Add support for case insensitive regex searches in QueryParser

2020-08-03 Thread GitBox


markharwood commented on pull request #1708:
URL: https://github.com/apache/lucene-solr/pull/1708#issuecomment-668124174


   >I wonder if the parsing should be more strict and only matches if there is 
a separator or ends after the i ?
   
   You're right that this means a breaking change. 
   I just tested range queries ie `num:[1 TO 5]i` and that tolerates no 
whitespace between the closing `]` and a search for the letter `i` so it's 
currently legal syntax (although weird and probably rare).
   
   I hadn't considered adding a flag for 8.x. If we do I'd prefer to see it 
support the new behaviour by default -  the rationale being that it is better 
to give an escape hatch to the few admins concerned about BWC for an edge case  
than continue the legacy of silent failures for potentially many regex 
searchers assuming `/Foo/i` was supported syntax.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org