Re: JenaText: support for explicit field names in text queries

Chris Tomlinson Sun, 01 Sep 2019 13:13:05 -0700

Hi again Brian,

I looked a bit more and it’s not clear how to “fix” the issue after all. The
change I suggested to TextIndexLucene uncovers a more basic problem.


When using a query such as:

    (?s ?score ?lit) text:query ( "some query string" 3000000 ) .

The code currently inserts the primaryField (e.g., rdfs:label or what have you)
into the query string, and TextQueryPF then binds the hit value from Lucene to
?lit by looking up the matching field value in the result document returned by
Lucene. However, the change I suggested no longer defaults to the primaryField,
so an error occurs during result binding in TextQueryPF.

The basic problem is that there’s an ambiguity with:

    … text:query ( "some query string" 3000000 ) .

The current code cannot tell whether the query string mentions any fields.

If there are fields in the query string then the use of the

    (?s ?score ?lit) text:query …

form must be disallowed, since there is no way to know which field value to
retrieve from the Lucene result documents without further analysis of the
query string. Apparently your application will generally have two or more
matching fields in each result document, which makes it harder still to work
out which matching field value to use, or would require inventing another
syntax for retrieving more than a single ?lit per result doc.

If there are no fields mentioned in the query string, then the primaryField
should be used explicitly, and ?lit can be bound to an appropriate match
value as it is now.
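The rule described in the two paragraphs above could look roughly like the following plain-Java sketch, which runs without Jena or Lucene on the classpath. Everything in it is hypothetical: `QueryFieldRule`, its methods, the example field name `label`, and the regex test for a Lucene-style `name:` prefix are illustrations, not code from TextIndexLucene or TextQueryPF. A real implementation would more reliably inspect the parsed Lucene query than pattern-match the string.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Jena code) of the proposed rule:
 *  - no field names in the query string: make the primary field explicit,
 *    so ?lit can still be bound from that field;
 *  - explicit field names present: pass the string through unchanged and
 *    refuse the (?s ?score ?lit) form, because the matching field is unknown.
 */
class QueryFieldRule {

    // Naive detection of a Lucene-style "name:" prefix at the start of the
    // string or after whitespace, '(', '+', '-' or '!'.
    // Assumption: escaped colons and other syntax corner cases are ignored.
    private static final Pattern FIELD_PREFIX =
            Pattern.compile("(^|[\\s(+\\-!])[A-Za-z_][\\w.]*:");

    static boolean hasExplicitField(String qs) {
        // Drop quoted phrases first so that "a:b" inside a phrase is not a field.
        String unquoted = qs.replaceAll("\"[^\"]*\"", " ");
        return FIELD_PREFIX.matcher(unquoted).find();
    }

    /** Returns the Lucene query string to run, or throws if ?lit cannot be bound. */
    static String prepare(String qs, String primaryField, boolean bindsLiteral) {
        if (!hasExplicitField(qs)) {
            // Use Lucene field grouping to apply the primary field to every term.
            return primaryField + ":(" + qs + ")";
        }
        if (bindsLiteral) {
            throw new IllegalArgumentException(
                    "?lit cannot be bound when the query string names its own fields");
        }
        return qs;
    }

    public static void main(String[] args) {
        System.out.println(prepare("some query string", "label", true));
        System.out.println(prepare("label:foo AND comment:bar", "label", false));
    }
}
```

Running `main` prints `label:(some query string)` followed by `label:foo AND comment:bar`. Throwing for the fielded case with ?lit is one choice; silently leaving ?lit unbound would be another.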

Perhaps you can raise a Jena issue and we can discuss and see what can be done.

Regards,
Chris



> On Sep 1, 2019, at 2:25 PM, Chris Tomlinson <[email protected]> 
> wrote:
> 
> Hi Brian,
> 
>> On Sep 1, 2019, at 7:17 AM, Brian McBride <[email protected]> wrote:
>> 
>> It used to be the case that JenaText supported querying of a Lucene text 
>> index where the index was created independently of Jena and then made 
>> available to JenaText via the dataset configuration.  Is this still the case?
> 
> That should still be the case, with the proviso that currently the field
> names must be handled via RDF properties outside the query string.
> 
> As you noted, it has been documented since 3.6.0 
> <https://jena.apache.org/documentation/query/text-query.html> that:
> 
>> No explicit use of Fields within the query string is supported.
> 
> This is based on the assumption that each indexed document contains only a
> single property field, and hence that a query involves only a single field
> corresponding to an RDF property. Evidently that was a poor assumption, not
> caught until now.
> 
> 
>> Up until Jena 3.9.0 definitely, and I suspect up to 3.12.0 (I have not
>> confirmed this yet), it was possible to express text queries with field
>> names and they worked.
> 
> You’re correct: the change that breaks the previous behavior was introduced
> <https://github.com/apache/jena/blob/519c129ab2dfcb5eb43f1a337c618a8e69f88acd/jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java#L744>
> in the 3.13.0 code. I’m not able to look at fixing this for the next three
> weeks, but may take a look at “fixing” it then. The basic change would be to
> replace the referenced line with:
> 
>     qstring = qs;
> 
> and that should be it. The results handling (in simpleResults
> <https://github.com/apache/jena/blob/519c129ab2dfcb5eb43f1a337c618a8e69f88acd/jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java#L562>
> and highlightResults
> <https://github.com/apache/jena/blob/519c129ab2dfcb5eb43f1a337c618a8e69f88acd/jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java#L668>)
> should need no changes, since in Lucene:
> 
>     doc.get(null) 
> 
> just returns null, which is already handled. Evidently your application
> doesn’t use the
> 
>      (?s ?score ?lit) text:query … 
> 
> form, since without information about which fields were used in the query
> string, no binding for ?lit could be made.
> 
>> We needed an index where multiple properties of the same resource were 
>> indexed as a single document.  I would be happy to discuss this further - 
>> why the solution indicated in the JenaText documentation didn't work for us 
>> and whether there is a way to construct a general-purpose JenaText solution
>> that would.
> 
> 
> More explanation would be interesting.
> 
> Sorry for the inconvenience,
> Chris
>
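The quoted remark that Lucene's `doc.get(null)` just returns null, so the results handling leaves ?lit unbound rather than failing, can be modelled in plain Java as below. This is a hypothetical sketch: the result document is represented as a `Map` from field name to stored value instead of a Lucene `Document`, and `LiteralBinding.literalFor` is an invented name, not part of TextQueryPF or TextIndexLucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical model of the ?lit binding step (not Jena code): a result
 * document is a map from field name to stored value.
 */
class LiteralBinding {

    // Returns the value to bind to ?lit, or empty when no field name is known
    // or the document has no stored value for it, mirroring a null result
    // from the document lookup.
    static Optional<String> literalFor(Map<String, String> doc, String field) {
        if (field == null) {
            // Some Map implementations reject null keys, so guard explicitly.
            return Optional.empty();
        }
        return Optional.ofNullable(doc.get(field));
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("label", "some label");
        doc.put("comment", "a longer comment");

        System.out.println(literalFor(doc, "label")); // Optional[some label]
        System.out.println(literalFor(doc, null));    // Optional.empty
    }
}
```

An empty result means ?lit simply stays unbound for that hit, which is the behaviour the quoted message expects when no field name is available.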
