[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user asfgit closed the pull request at: https://github.com/apache/jena/pull/406 ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user afs commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183788841 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") +.replaceAll("\\+", "+") +.replaceAll("\\-", "-") +.replaceAll("\\=", "=") +.replaceAll("\\&", "&") +.replaceAll("\\|", "|") +.replaceAll("\\>", ">") +.replaceAll("\\<", "<") +.replaceAll("\\!", "!") +.replaceAll("\\(", "(") +.replaceAll("\\)", ")") +.replaceAll("\\{", "{") +.replaceAll("\\}", "}") +.replaceAll("\\]", "]") +.replaceAll("\\[", "[") +.replaceAll("\\^", "^") +.replaceAll("\\~", "~") +.replaceAll("\\?", "?"); + --- End diff -- The [Lucene escape code|https://github.com/apache/lucene-solr/blob/master/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParserBase.java#L971]. It includes `\\` and does a simple pass over the string. The thing to watch for with `replaceAll` is that it creates a regex every call which can mount up even for these fixed string regexps. ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user anujgandharv commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183642709 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") +.replaceAll("\\+", "+") +.replaceAll("\\-", "-") +.replaceAll("\\=", "=") +.replaceAll("\\&", "&") +.replaceAll("\\|", "|") +.replaceAll("\\>", ">") +.replaceAll("\\<", "<") +.replaceAll("\\!", "!") +.replaceAll("\\(", "(") +.replaceAll("\\)", ")") +.replaceAll("\\{", "{") +.replaceAll("\\}", "}") +.replaceAll("\\]", "]") +.replaceAll("\\[", "[") +.replaceAll("\\^", "^") +.replaceAll("\\~", "~") +.replaceAll("\\?", "?"); + --- End diff -- So, what i understand is that backslashes need to be escaped even before they reach the Jena ES query method, just like double quotes. Therefore i haven't added the check for it. I can add a unit test depicting the escaping of backslash in the query string itself. Let me know if that would help. ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user osma commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183630591 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") +.replaceAll("\\+", "+") +.replaceAll("\\-", "-") +.replaceAll("\\=", "=") +.replaceAll("\\&", "&") +.replaceAll("\\|", "|") +.replaceAll("\\>", ">") +.replaceAll("\\<", "<") +.replaceAll("\\!", "!") +.replaceAll("\\(", "(") +.replaceAll("\\)", ")") +.replaceAll("\\{", "{") +.replaceAll("\\}", "}") +.replaceAll("\\]", "]") +.replaceAll("\\[", "[") +.replaceAll("\\^", "^") +.replaceAll("\\~", "~") +.replaceAll("\\?", "?"); + --- End diff -- What about escaping preexisting backslashes in the query string? I suppose that would have to be done first. ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user osma commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183630332 --- Diff: jena-text-es/src/main/java/examples/JenaESTextExample.java --- @@ -73,7 +73,8 @@ public static void queryDataWithoutProperty(Dataset dataset) String qs = StrUtils.strjoinNL ( "SELECT * " //, " { ?s text:query (rdfs:comment 'this' 'lang:en') ;" -, " { ?s text:query ('this' 'lang:en*') ;" +//, " { ?s text:query ('this' 'lang:en*') ;" --- End diff -- Does the example code have to be changed? ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user anujgandharv commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183512768 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") --- End diff -- Awesome. I will wait it to be merged in master. ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183508856 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") --- End diff -- No need to change anything-- if you are confident from the impl side, let's go forward! ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user anujgandharv commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183507712 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") --- End diff -- So this is indeed a bit of a style question. But keeping that aside, under the hood, ```str.replaceAll()``` is equivalent to ```Pattern.compile(regex).matcher(str).replaceAll(repl)``` I can rewrite the ```str.replaceAll``` like above, but I genuinely do not see any major gains. You get to make the final call. :) Changing it is quite trivial from my side. Its more the logistics that I would like to avoid, if possible. ---
[GitHub] jena pull request #406: JENA-1532 | Added support for escaping special chara...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/406#discussion_r183494640 --- Diff: jena-text-es/src/main/java/org/apache/jena/query/text/es/TextIndexES.java --- @@ -422,6 +422,27 @@ public EntityDefinition getDocDef() { } private String parse(String fieldName, String qs, String lang) { +//Escape special characters if any in the query string +qs = qs.replaceAll("\\:", ":") --- End diff -- This may be a question of style, but might this read a bit better as a constant `Pattern`? (I don't know enough to think very usefully about the performance implications, but I would think a `Pattern` would be at _least_ as performant.) ---