On 12/01/2020 21.50, Chris Tomlinson wrote:
Hi Mikael,
On Jan 10, 2020, at 4:26 AM, Mikael Pesonen <[email protected]> wrote:
Hi Chris,
On 09/01/2020 17.50, Chris Tomlinson wrote:
Hello Br,
On Jan 9, 2020, at 3:34 AM, Mikael Pesonen <[email protected]> wrote:
Hi,
I asked about these a few years ago, so maybe there are some new ideas.
1) Is it possible to configure the text index so that it would automatically
add, for example, all textual values (xsd:string etc.) to the index? Now every
property has to be configured manually.
No, it is not currently possible. Perhaps you could give more detail on how you
would see using such a feature, how you would handle various literal datatypes
(convert all to xsd:string?), and how you would then search. Currently searches
are focussed on one or more properties; a recent update allows providing a list
of properties that can be searched in a single Lucene search. More detail is
available at https://jena.apache.org/documentation/query/text-query.html.
In the ideal case all values that are of type string literal would be indexed.
Queries would work as now: you would define the properties you are querying, for
example
(?concept ?score ?prefLabel) text:query (skos:prefLabel "tech*" "lang:en")
Of course I don't know how hard this would be to implement.
So, you're wanting objects of type xsd:string and rdf:langString to be indexed
with the property/predicate appearing in the triple. This in turn would mean
that a field name would need to be created based on the resource localName of
the property and for rdf:langString a default lang field name would need to be
defined in the assembler file along with whatever multi-language analyzer
structure is needed. This is tantamount to creating the entmap for the Lucene
index configuration on-the-fly.
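To make the idea of building the entity map on the fly concrete, here is a minimal Python sketch. It is not Jena code and does not reflect how Jena would implement this. The triple representation, the `build_entity_map` and `local_name` helpers, and the shape of the returned dictionary are all hypothetical, chosen only to show field names being derived from property localNames for xsd:string and rdf:langString objects.

```python
import re

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def local_name(iri):
    """Approximate the localName as the text after the last '#' or '/'."""
    return re.split(r"[#/]", iri)[-1]


def build_entity_map(triples, lang_field="lang"):
    """Derive a field-name -> predicate mapping from the data itself.

    Hypothetical input format: each triple is (subject, predicate, object),
    where a literal object is a (lexical, datatype, lang) tuple and a
    resource object is a plain IRI string.
    """
    fields = {}
    for _subject, predicate, obj in triples:
        if not isinstance(obj, tuple):
            continue  # object is a resource, not a literal
        _lexical, datatype, _lang = obj
        if datatype not in (XSD_STRING, RDF_LANGSTRING):
            continue  # only string-like literals are indexed
        name = local_name(predicate)
        if not name:
            continue  # IRI ends in '#' or '/', no usable localName
        # Two predicates in different namespaces may share a localName,
        # so add a numeric suffix until the field name is unambiguous.
        candidate, suffix = name, 2
        while candidate in fields and fields[candidate] != predicate:
            candidate = f"{name}{suffix}"
            suffix += 1
        fields[candidate] = predicate
    # lang_field stands in for the default language field name that would
    # otherwise be declared in the assembler file.
    return {"langField": lang_field, "map": fields}
```

The sketch sidesteps the harder parts mentioned above, in particular choosing analyzers for multi-language content, and it shows why localName collisions between namespaces would need some naming policy.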
I'm not quite sure what resource localName and entmap mean but this
would be ideal yes.
The reason for this is that we are providing our customers a file/metadata
service, so we don't have info on what metadata is input. For that
reason we are using an external Lucene index now, and that is a bit of a hassle.
2) Is there planned support for searching similar resources, based on the
Lucene index?
I’m not aware of any such plans. More detail would be needed to evaluate
feasibility, in particular how to recognize resources as similar.
Please note that the Jena+Lucene model is to index individual triples as Lucene
documents not entire graphs or models which in turn leads to indexing and
searching focussed on properties.
This would be fine. At least for our needs it would be enough to find similar
values only, not entire resources.
I’m sorry I still don’t know what constitutes "similar values". I’m guessing you’re
referring to using Lucene fuzzy matches, proximity matches and the like. These are already
supported to an extent (see Jena Full Text Search
<https://jena.apache.org/documentation/query/text-query.html>).
This sort of thing would not be released until Jena 3.15 at the earliest. I
haven’t given any implementation thought to this other than what’s written here.
Regards,
Chris
Br
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's
Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: [email protected]
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND