Re: Using Jena text search with all predicates

Mikael Pesonen Fri, 25 Jan 2019 03:13:41 -0800


Hi Osma,

I didn't think it through. I guess we can do it already by manually running the query and then copying the result list to Fuseki config.


On 25/01/2019 09:42, Osma Suominen wrote:

Hi Mikael!
Thanks for the idea. However, I don't quite see the picture. Where would you put that user defined SPARQL query? In the Jena assembler configuration file? How is that better than just defining the properties in that same file?
Also, jena-text needs to know a bit more than just the list of properties to index. Each property has to be configured with field name (in the index) and possibly an analyzer too if you want to override the analyzer setting for the index.
But since the Jena assembler configuration is just RDF triples, you could actually generate it (or perhaps just the entity map part) using a SPARQL CONSTRUCT query. Store the result in a file and include it as part of your configuration.
Also see the last part of my message that you quoted: if you think something in Jena (or jena-text) needs to be changed, please open an issue on JIRA and submit a pull request.
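To illustrate generating just the entity map part, here is a minimal sketch. It uses plain Python string templating rather than the SPARQL CONSTRUCT query suggested above, and it rests on assumptions: the helper names `field_name` and `entity_map_turtle`, the `:entMap` resource name, and the convention of using each predicate's local name as the index field name are all hypothetical. Only the `text:` vocabulary terms come from jena-text itself. No per-field analyzer is emitted.

```python
# Sketch only: render a jena-text entity map as Turtle from a list of
# predicate IRIs. Helper names and the field-naming rule are assumptions.

TEXT_NS = "http://jena.apache.org/text#"


def field_name(predicate_iri):
    """Use the IRI's local part (after the last '#' or '/') as the field name."""
    trimmed = predicate_iri.rstrip("/#")
    return trimmed.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def entity_map_turtle(predicates, default_field):
    """Return a Turtle fragment declaring one text:map entry per predicate."""
    entries = "\n".join(
        f'        [ text:field "{field_name(p)}" ; text:predicate <{p}> ]'
        for p in predicates
    )
    return (
        f"@prefix text: <{TEXT_NS}> .\n"
        "@prefix : <#> .\n\n"
        ":entMap a text:EntityMap ;\n"
        '    text:entityField "uri" ;\n'
        f'    text:defaultField "{default_field}" ;\n'
        "    text:map (\n"
        f"{entries}\n"
        "    ) .\n"
    )
```

The result could be written to a file and merged into the rest of the assembler configuration. Two predicates with the same local name (for example dc:title and dcterms:title) would end up sharing a field name under this naming rule, so a real setup would probably need an explicit mapping.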
-Osma

Mikael Pesonen wrote on 23.1.2019 at 14.09:
Hi, sorry to bring up old discussion.
One solution, maybe ideal for us, would be to index all properties (in addition to the explicitly configured ones) that are returned by a user-defined SPARQL query. We would run a query on the RDF schemas to get all string properties (properties whose range is xsd:string), etc.
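A rough sketch of that schema query, under stated assumptions: the query text below is one possible way to select properties whose range is xsd:string, and `string_properties` is a hypothetical stand-in that applies the same selection to schema triples held as plain Python tuples, so the idea can be tried without a triple store. Neither is part of jena-text.

```python
# Sketch only: find properties whose rdfs:range is xsd:string.

RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# One possible SPARQL formulation of the same selection (an assumption about
# what the user-defined query might look like).
STRING_PROPERTIES_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?p WHERE {
  ?p rdfs:range xsd:string .
}
"""


def string_properties(schema_triples):
    """Return the sorted, de-duplicated subjects of (s, rdfs:range, xsd:string)."""
    return sorted(
        {s for s, p, o in schema_triples if p == RDFS_RANGE and o == XSD_STRING}
    )
```

The resulting list is what would then be copied (or generated) into the Fuseki configuration.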
On 28/02/2018 20:16, Osma Suominen wrote:
Hi Jim!
Your observation is correct. jena-text only indexes the RDF properties you have explicitly configured. The configuration for each property may be different. There is no wildcard setting that would cover all possible properties.
The thinking behind this is that for typical use cases of a text index, there is a fairly limited set of properties that may be relevant (e.g. rdfs:label, rdfs:comment, dc:title, dc:description, skos:prefLabel, skos:altLabel, schema:name) and indexing every possible property would just bloat the index. Other literal values are still in the triple store and can be searched (possibly inefficiently) using SPARQL features such as FILTER with e.g. a REGEX or CONTAINS function.
If you think that e.g. a wildcard property setting would be useful, please open an issue in the Apache Jena JIRA (https://issues.apache.org/jira/projects/JENA/issues). Also, patches and pull requests welcome!
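A minimal sketch of the FILTER fallback mentioned above, assuming a case-insensitive substring match is wanted. The helper names `escape_sparql_string` and `contains_query` are hypothetical; CONTAINS, LCASE, STR and isLiteral are standard SPARQL 1.1 functions. The query scans every triple, so it may be slow on large datasets.

```python
# Sketch only: build a SPARQL query that searches all literal values with
# FILTER + CONTAINS instead of the text index.


def escape_sparql_string(text):
    """Escape backslashes, double quotes and line breaks for a "..." literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def contains_query(term, limit=100):
    """Return a SELECT query matching literals that contain `term`, ignoring case."""
    needle = escape_sparql_string(term.lower())
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  ?s ?p ?o .\n"
        '  FILTER(isLiteral(?o) && CONTAINS(LCASE(STR(?o)), "' + needle + '"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

The returned string can be sent to any SPARQL endpoint; the predicate of each match comes back in ?p, so results can still be narrowed by property afterwards.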
-Osma


McCusker, James Patrick wrote on 28.02.2018 at 19:23:
From what I can tell in the documentation, we have to configure Jena text to index a fixed set of predicates. The examples give rdfs:label, and from what I see I can add more, but there are a lot of potential properties in the world. Is there a way to simply index all predicates into a field? It seems strange that I would have to enumerate over the tons of text predicates that are used in the world in order to do a proper *full* text search of my graph.
This is a capability that is covered by other SPARQL implementations (Blazegraph, Virtuoso).
Theoretically, the predicate should just be another field in the lucene document that can be filtered on, like with graph.
Thanks,
Jim

Jim McCusker, Ph.D.
[email protected]
http://tw.rpi.edu/web/person/JamesMcCusker
Director, Data Operations
Tetherless World Constellation
Department of Computer Science
Rensselaer Polytechnic Institute


--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: [email protected]
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
