Hi Chris,

On 13/01/2020 20.48, Chris Tomlinson wrote:
Hi Mikael,

On Jan 13, 2020, at 3:30 AM, Mikael Pesonen <[email protected]> wrote:

So, you're wanting objects of type xsd:string and rdf:langString to be indexed 
with the property/predicate appearing in the triple. This in turn would mean 
that a field name would need to be created based on the resource localName of 
the property and for rdf:langString a default lang field name would need to be 
defined in the assembler file along with whatever multi-language analyzer 
structure is needed. This is tantamount to creating the entmap for the Lucene 
index configuration on-the-fly.
I'm not quite sure what resource localName and entmap mean, but this would be 
ideal, yes.

The reason for this is that we provide our customers with a file/metadata service, 
so we have no information on what metadata is entered. For that reason we are 
using an external Lucene index now, which is a bit of a hassle.
The localName of a resource URI, e.g., skos:prefLabel, is “prefLabel”. The entmap is 
discussed in the entity map definition section 
<https://jena.apache.org/documentation/query/text-query.html#entity-map-definition> 
of the Jena Full Text Search documentation 
<https://jena.apache.org/documentation/query/text-query.html>. The entmap associates 
an RDF property localName with a field in a Lucene document. This is what would be 
needed to use text:search to find triples, i.e., Lucene needs to know what field to 
search over for a given property.
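
As a rough illustration of the mapping described above (this is not Jena code, just 
a sketch of the idea), the following derives a field name from each property's 
localName. The helper names local_name and build_field_map are hypothetical, and the 
split on the last '#' or '/' is a simplification of how a localName is actually 
determined.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/' (simplified localName)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def build_field_map(properties):
    """Map each property URI to an index field name derived from its localName.

    Hypothetical stand-in for an entity map built on the fly; a real
    configuration would also need per-language analyzer settings.
    """
    field_map = {}
    for prop in properties:
        field_map[prop] = local_name(prop)
    return field_map


SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

fields = build_field_map([SKOS + "prefLabel", RDFS + "label"])
print(fields)
```

Note that two properties from different namespaces with the same localName would end 
up sharing a field name in this sketch, so an on-the-fly scheme would have to decide 
how to handle such collisions.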

I’m still not seeing an answer regarding what constitutes “similar values” so I 
can’t respond to that.
About "similar": it would be great if it were possible to find similar triple values. We store each document as plain text in a single value and would like to find other values that are similar to it.

Please open an issue for the feature you’re proposing in the Jena issue tracker 
<https://issues.apache.org/jira/browse/JENA>, and refer to the Jena Full Text Search 
documentation <https://jena.apache.org/documentation/query/text-query.html> for 
information about what is currently supported and what configuration capabilities 
are provided.
Okay, I'll open issues for both. Thanks!

Thank you,
Chris

--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: [email protected]
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
