Hi Chris,
On 13/01/2020 20.48, Chris Tomlinson wrote:
Hi Mikael,
On Jan 13, 2020, at 3:30 AM, Mikael Pesonen wrote:
So, you're wanting objects of type xsd:string and rdf:langString to be indexed
with the property/predicate appearing in the triple. This in turn would mean
that a field name would need to be created based on the resource localName of
the property and for rdf:langString a default lang field name would need to be
defined in the assembler file along with whatever multi-language analyzer
structure is needed. This is tantamount to creating the entmap for the Lucene
index configuration on-the-fly.
I'm not quite sure what resource localName and entmap mean, but yes, this would
be ideal.
The reason is that we provide our customers with a file/metadata service, so we
don't know in advance what metadata will be entered. Because of that we are
currently using an external Lucene index, which is a bit of a hassle.
The localName of a resource URI, e.g., skos:prefLabel, is “prefLabel”. The entmap is
discussed in the entity map definition section
<https://jena.apache.org/documentation/query/text-query.html#entity-map-definition>
of the Jena Full Text Search documentation
<https://jena.apache.org/documentation/query/text-query.html>. The entmap associates an
RDF property localName with a field in a Lucene document. This is what would be needed
to use text:query to find triples, i.e., Lucene needs to know which field to search
over for a given property.
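
To make the localName-to-field idea concrete, here is a minimal Python sketch. It is
only an illustration, not how Jena implements it: local_name and build_field_map are
hypothetical helpers, the split on the last '#' or '/' is a simplification (Jena's own
local-name handling follows XML name rules and can differ for some URIs), and the
resulting dictionary merely stands in for an entmap that is built on the fly.

```python
import re


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'.

    Simplified assumption: this is not Jena's exact local-name algorithm.
    """
    return re.split(r"[#/]", uri)[-1]


def build_field_map(predicates):
    """Hypothetical on-the-fly entity map: predicate URI -> Lucene field name."""
    field_map = {}
    for predicate in predicates:
        # The first occurrence of a predicate decides its field name;
        # repeated predicates leave the existing entry untouched.
        field_map.setdefault(predicate, local_name(predicate))
    return field_map


SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates as they might be encountered while indexing incoming triples.
field_map = build_field_map(
    [SKOS + "prefLabel", RDFS + "label", SKOS + "prefLabel"]
)
print(field_map[SKOS + "prefLabel"])  # prefLabel
```

Note that in this sketch two properties from different namespaces that share a
localName would end up with the same field name, so a real on-the-fly scheme would
have to decide how to handle such collisions.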
I’m still not seeing an answer regarding what constitutes “similar values”, so I
can’t respond to that.
About similar: it would be good to be able to find similar triple values. We
store each document as plain text in a single value and would like to find the
values that are similar to it.
Please use the Jena issue tracker <https://issues.apache.org/jira/browse/JENA> to
open an issue for the feature you’re proposing, and refer to the Jena Full Text Search
documentation <https://jena.apache.org/documentation/query/text-query.html> for
information about what is currently supported and what configuration capabilities are
provided.
Okay, I'll open issues for both. Thanks!
Thank you,
Chris
--
Mikael Pesonen
System Engineer, Lingsoft