Hi Jena Users. We've been experiencing some peculiar behaviour with Jena/Fuseki and Lucene - particularly, but not entirely, around special characters.
We are currently running Fuseki 2.3.0, which seems to include Lucene 4.9.1, as far as we can tell.

Using the query:

    PREFIX text: <http://jena.apache.org/text#>
    SELECT ?ent ?score {
      (?ent ?score) text:query (<TEXT> 'lang:en')
    }

...and different values of <TEXT>, the following happens:

1) <TEXT> = '' - server error: Cannot parse '() AND lang:en'
2) <TEXT> = '*' - 26 results
3) <TEXT> = '\\*' - 26 results
4) <TEXT> = '\\?' - 26 results
5) <TEXT> = 'will' - 26 results ("will" is one of the words which is ignored by Lucene, see e.g. https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.9.1/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/StopAnalyzer.java#L51)
6) <TEXT> = '(?)' - 3 results (labels/comments with single-character words in them?)
7) <TEXT> = '(\\?)' - 26 results
8) <TEXT> = '\\(\\?\\)' - 26 results

It looks to us as if, since Fuseki turns "<TEXT>" into "(<TEXT>) AND lang:en", an empty match for <TEXT> (grouped in parentheses) results in ALL entries being matched.

Problem: unless we know the complete list of ignored words and characters that Lucene then goes on to turn into an empty match, it is impossible to stop Fuseki returning ALL results for certain queries!

Thanks in advance for any thoughts and help

Mark
-- 
Technology Lead, Iotic Labs
+44 7973 674404
[email protected]
https://www.iotic-labs.com
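
P.S. To make the problem concrete, here is a minimal Python sketch of the kind of client-side guard we would otherwise have to write. Everything in it is an assumption for illustration: the function names are hypothetical, the rewriting rule is only inferred from the error message in case 1, and the stop-word set is a small hand-picked subset, not Lucene's actual list.

```python
import re

# ASSUMPTION: illustrative subset of English stop words, NOT Lucene's real list.
ASSUMED_STOP_WORDS = {"a", "an", "and", "the", "of", "to", "will", "with"}


def wrap_like_fuseki(text, lang="en"):
    """Mimic the rewriting inferred from the error message: (<TEXT>) AND lang:<lang>."""
    return "({}) AND lang:{}".format(text, lang)


def has_searchable_term(text):
    """Return True if at least one alphanumeric token is not an assumed stop word."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return any(tok not in ASSUMED_STOP_WORDS for tok in tokens)


if __name__ == "__main__":
    # Same <TEXT> values as in the list above, plus one ordinary word.
    for candidate in ["", "*", "\\*", "\\?", "will", "(?)", "goodwill"]:
        print(repr(candidate), has_searchable_term(candidate), wrap_like_fuseki(candidate))
```

The guard is deliberately conservative: it also rejects wildcard-only input such as '(?)', which did return a genuine (3-result) match for us. It is only as good as the stop-word set it is given, which is exactly the problem described above.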
