JenaText: support for explicit field names in text queries

Brian McBride Sun, 01 Sep 2019 05:18:15 -0700

The topic of this email was first mentioned in the recent thread about JENA-1620 and Query Timeouts but is really a separate topic and potentially gets a little complicated.

It used to be the case that JenaText supported querying of a Lucene text index where the index was created independently of Jena and then made available to JenaText via the dataset configuration. Is this still the case?

Definitely up until Jena 3.9.0, and I suspect up to 3.12.0 (I have not confirmed this yet), it was possible to express text queries with field names and they worked.

We have a Fuseki system in production (5+ years) that has "its own mechanism"* for building a multi-field Lucene index that is then queried using JenaText. Those queries specify Lucene field names, as in the example I gave in the earlier thread:


[[

PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX  text: <http://jena.apache.org/text#>
PREFIX  ppd: <http://landregistry.data.gov.uk/def/ppi/>
PREFIX  lrcommon: <http://landregistry.data.gov.uk/def/common/>
SELECT *  {
  ?ppd_propertyAddress
      text:query            ( "street:  the" 3000000 ) .
} LIMIT 1

]]

You can try it on a system running Fuseki 3.9.0 here:

http://landregistry.data.gov.uk/app/qonsole

In a recent test with Jena 3.13.0-SNAPSHOT (from pull request #595) installed in the dev version of that system, the query fails with a query parse error. Do I need to do some extra configuration to get this to work - e.g. specify a specific text query parser?


Rereading the Jena Text documentation I find:

[[

As mentioned earlier, the text index uses the native Lucene query language <http://lucene.apache.org/core/6_4_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description>; however, there are important constraints on how the Lucene query language is used within jena-text. In particular, /explicit/ references to Lucene |Fields| with the |query string| *are not* supported. So how are Lucene queries that would otherwise refer to multiple |Fields| expressed?

]]

The text goes on to explain the issues around the fact that JenaText indexes each triple/quad as a separate document.
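
As a rough illustration of why that matters for multi-field queries, here is a toy sketch. It is not Jena or Lucene code: plain Python dicts stand in for index documents, and the address URI, the "town" field and the values are invented for the example. It only shows that a query naming two fields can match when a resource's properties share one document, and cannot when each triple is its own document.

```python
# Toy model only: dicts stand in for Lucene documents; no Lucene or Jena API is used.

# Hypothetical triples about one address resource (URI and values are invented).
triples = [
    ("http://example.org/address/1", "street", "the avenue"),
    ("http://example.org/address/1", "town", "bristol"),
]

def docs_per_triple(triples):
    """One document per triple: each document carries a single property field."""
    return [{"uri": s, p: o} for s, p, o in triples]

def docs_per_resource(triples):
    """One document per subject: all of its properties become fields of one document."""
    merged = {}
    for s, p, o in triples:
        merged.setdefault(s, {"uri": s})[p] = o
    return list(merged.values())

def search(docs, **terms):
    """Return URIs of documents in which every named field contains its term."""
    return [
        d["uri"]
        for d in docs
        if all(term in d.get(field, "").split() for field, term in terms.items())
    ]

# The same two-field query run against both layouts.
per_triple = search(docs_per_triple(triples), street="avenue", town="bristol")
per_resource = search(docs_per_resource(triples), street="avenue", town="bristol")
```

With the per-triple layout no single document holds both fields, so the two-field query finds nothing; with the per-resource layout the one merged document satisfies both terms.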

I have couched my question so far in terms of querying an externally built text index, because that is the simplest, and possibly most compelling, way to ask the question and to suggest not disallowing the use of Lucene fields in text queries. Not supported for creating indexes is not the same as not supported for querying indexes.

I am (naively?) hoping that restoring the functionality to allow specifying Lucene field names in a text query is a quick fix for someone familiar with the code. I am not familiar with the code, but am willing to help where I can.

In the interests of full disclosure, however, I should say that the reason we have our own mechanism for building the text index is exactly the one given in the JenaText documentation. We needed an index where multiple properties of the same resource were indexed as a single document. I would be happy to discuss this further - why the solution indicated in the JenaText documentation didn't work for us and whether there is a way to construct a general purpose JenaText solution that would. But there is a lot of potential for complexity there - and the gears for a new Jena release are beginning to turn and I have been hoping to deploy this new release when it becomes available.


Brian

* in fact we use JenaText with a custom TextDocProducer implementation.

--
------------------------------------------------------------------------

Brian McBride
[email protected]

Epimorphics Ltd www.epimorphics.com
Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Tel: 01275 399069

Epimorphics Ltd. is a limited company registered in England (number 7016688)

Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT, UK
