Hi!

Currently jena-text stores only two things about the indexed resources: their URI, and the literal values of the indexed properties that it has been configured to look for.


This means that text:query results cannot be limited by language at query time. For example, when searching a multilingual dataset with { ?s text:query "gift" }, you get hits for both of these resources:

ex:Gift rdfs:label "gift"@en .
ex:Poison rdfs:label "Gift"@de .

I'd like to have a way of restricting the hits by language tag at text:query time, e.g. using the syntax { ?s text:query "gift"@en }.

But with the current index structure this is impossible. Is there a way to easily implement this? For example, there could be separate fields for each language, so the index could have fields like uri, text_en, text_de. Then you could search either using the above syntax (with language tag in the query literal) or explicitly as { ?s text:query "text_en:gift" }.
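To make the per-language field idea a bit more concrete, here is a minimal sketch in plain Python. It does not use Lucene or jena-text at all: the nested dictionary is just a stand-in for the index, and the names field_for, add_literal and query_lang are made up for illustration. The field names text_en and text_de follow the naming suggested above; the plain "text" field for untagged literals is my assumption.

```python
from collections import defaultdict

# Hypothetical stand-in for the text index: field name -> token -> set of URIs.
index = defaultdict(lambda: defaultdict(set))

def field_for(lang):
    """Per-language field name, e.g. 'text_en'; untagged literals go to 'text' (assumption)."""
    return f"text_{lang.lower()}" if lang else "text"

def add_literal(uri, lexical, lang=None):
    """Index one literal under the field matching its language tag."""
    for token in lexical.lower().split():
        index[field_for(lang)][token].add(uri)

def query_lang(term, lang=None):
    """Single-term lookup; with a language tag only that language's field is searched."""
    term = term.lower()
    if lang:
        return set(index[field_for(lang)].get(term, set()))
    # No tag: search every field, which mirrors today's language-blind behaviour.
    hits = set()
    for tokens in index.values():
        hits |= tokens.get(term, set())
    return hits

# The two example triples from above.
add_literal("ex:Gift", "gift", "en")
add_literal("ex:Poison", "Gift", "de")

print(query_lang("gift"))        # both resources
print(query_lang("gift", "en"))  # only ex:Gift
```

In this sketch a query literal "gift"@en would map to query_lang("gift", "en"), while the explicit form "text_en:gift" would address the same field by name.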


A similar problem is that the jena-text index is shared by all named graphs. If the named graphs contain different resources, you cannot restrict the match to just one graph; you get hits from all of them mixed together, which can be many more than you are interested in.

I'm not entirely sure how to improve the situation, since "being" in a specific named graph is a triple-level property and the same resource could be described in many named graphs. However, I think it would still be possible to add e.g. a "graph" field to the index, listing all the named graphs in which the resource is mentioned (in the triples that affect the index). Then you could query e.g. like this: { ?s text:query "text:gift graph:http://example.com/mygraph" }. Do you think this would be a workable idea?
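Again as a rough sketch only, this is roughly what I have in mind for the "graph" field, in plain Python rather than against the real index. The docs dictionary and the functions add_quad and query_graph are hypothetical; each indexed resource gets one entry that accumulates its text tokens and the graphs it was seen in.

```python
# Hypothetical stand-in for the index: one entry per resource URI.
docs = {}  # uri -> {"text": set of tokens, "graph": set of graph URIs}

def add_quad(graph, subject, lexical):
    """Record an indexed literal and the named graph its triple came from."""
    doc = docs.setdefault(subject, {"text": set(), "graph": set()})
    doc["text"].update(lexical.lower().split())
    doc["graph"].add(graph)

def query_graph(term, graph=None):
    """Match a term, optionally keeping only resources mentioned in the given graph."""
    term = term.lower()
    return {
        uri
        for uri, doc in docs.items()
        if term in doc["text"] and (graph is None or graph in doc["graph"])
    }

add_quad("http://example.com/mygraph", "ex:Gift", "gift")
add_quad("http://example.com/other", "ex:Poison", "Gift")

print(query_graph("gift"))                                # both resources
print(query_graph("gift", "http://example.com/mygraph"))  # only ex:Gift
```

Note that the filter here is per resource, not per triple: if ex:Gift also had some other indexed literal in http://example.com/other, it would match "gift" restricted to that graph as well, even though the matching label lives in mygraph. That is the limitation I was alluding to above.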


If you think either of these ideas is sound, I'm willing to write patches to implement them. I develop an application [1] that makes heavy use of jena-text, named graphs, and multilingual RDF data, and its performance is currently limited by these issues.

-Osma


[1] http://code.google.com/p/onki-light/

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
