[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-05-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/406


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-24 Thread afs
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183788841
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
+.replaceAll("\\+", "+")
+.replaceAll("\\-", "-")
+.replaceAll("\\=", "=")
+.replaceAll("\\&", "&")
+.replaceAll("\\|", "|")
+.replaceAll("\\>", ">")
+.replaceAll("\\<", "<")
+.replaceAll("\\!", "!")
+.replaceAll("\\(", "(")
+.replaceAll("\\)", ")")
+.replaceAll("\\{", "{")
+.replaceAll("\\}", "}")
+.replaceAll("\\]", "]")
+.replaceAll("\\[", "[")
+.replaceAll("\\^", "^")
+.replaceAll("\\~", "~")
+.replaceAll("\\?", "?");
+
--- End diff --

The [Lucene escape 
code|https://github.com/apache/lucene-solr/blob/master/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParserBase.java#L971].

It includes `\\` and does a simple pass over the string.

The thing to watch for with `replaceAll` is that it creates a regex every 
call which can mount up even for these fixed string regexps.


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-24 Thread anujgandharv
Github user anujgandharv commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183642709
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
+.replaceAll("\\+", "+")
+.replaceAll("\\-", "-")
+.replaceAll("\\=", "=")
+.replaceAll("\\&", "&")
+.replaceAll("\\|", "|")
+.replaceAll("\\>", ">")
+.replaceAll("\\<", "<")
+.replaceAll("\\!", "!")
+.replaceAll("\\(", "(")
+.replaceAll("\\)", ")")
+.replaceAll("\\{", "{")
+.replaceAll("\\}", "}")
+.replaceAll("\\]", "]")
+.replaceAll("\\[", "[")
+.replaceAll("\\^", "^")
+.replaceAll("\\~", "~")
+.replaceAll("\\?", "?");
+
--- End diff --

So, what i understand is that backslashes need to be escaped even before 
they reach the Jena ES query method, just like double quotes. Therefore i 
haven't added the check for it. I can add a unit test depicting the escaping of 
backslash in the query string itself. Let me know if that would help.


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-24 Thread osma
Github user osma commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183630591
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
+.replaceAll("\\+", "+")
+.replaceAll("\\-", "-")
+.replaceAll("\\=", "=")
+.replaceAll("\\&", "&")
+.replaceAll("\\|", "|")
+.replaceAll("\\>", ">")
+.replaceAll("\\<", "<")
+.replaceAll("\\!", "!")
+.replaceAll("\\(", "(")
+.replaceAll("\\)", ")")
+.replaceAll("\\{", "{")
+.replaceAll("\\}", "}")
+.replaceAll("\\]", "]")
+.replaceAll("\\[", "[")
+.replaceAll("\\^", "^")
+.replaceAll("\\~", "~")
+.replaceAll("\\?", "?");
+
--- End diff --

What about escaping preexisting backslashes in the query string? I suppose 
that would have to be done first.


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-24 Thread osma
Github user osma commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183630332
  
--- Diff: jena-text-es/src/main/java/examples/JenaESTextExample.java ---
@@ -73,7 +73,8 @@ public static void queryDataWithoutProperty(Dataset 
dataset)
 String qs = StrUtils.strjoinNL
 ( "SELECT * "
 //, " { ?s text:query (rdfs:comment 'this' 
'lang:en') ;"
-, " { ?s text:query ('this' 'lang:en*') ;"
+//, " { ?s text:query ('this' 'lang:en*') ;"
--- End diff --

Does the example code have to be changed?


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-23 Thread anujgandharv
Github user anujgandharv commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183512768
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
--- End diff --

Awesome. I will wait it to be merged in master.


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-23 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183508856
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
--- End diff --

No need to change anything-- if you are confident from the impl side, let's 
go forward!


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-23 Thread anujgandharv
Github user anujgandharv commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183507712
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
--- End diff --

So this is indeed a bit of a style question. But keeping that aside, under 
the hood, ```str.replaceAll()``` is equivalent to 
```Pattern.compile(regex).matcher(str).replaceAll(repl)```
I can rewrite the ```str.replaceAll``` like above, but I genuinely do not 
see any major gains. You get to make the final call. :) Changing it is quite 
trivial from my side. Its more the logistics that I would like to avoid, if 
possible.


---


[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...

2018-04-23 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/406#discussion_r183494640
  
--- Diff: 
jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java ---
@@ -422,6 +422,27 @@ public EntityDefinition getDocDef() {
 }
 
 private String parse(String fieldName, String qs, String lang) {
+//Escape special characters if any in the query string
+qs = qs.replaceAll("\\:", ":")
--- End diff --

This may be a question of style, but might this read a bit better as a 
constant `Pattern`? (I don't know enough to think very usefully about the 
performance implications, but I would think a `Pattern` would be at _least_ as 
performant.)


---