Hi!
Currently jena-text stores only two things about the indexed resources:
their URI, and the literal values of the indexed properties that it has
been configured to look for.
This means that later on it is impossible to limit the text:query
results by language. For example, when searching in a multilingual
dataset, you can search for { ?s text:query "gift" }, and then get
results like this:
ex:Gift rdfs:label "gift"@en .
ex:Poison rdfs:label "Gift"@de .
I'd like to have a way of restricting the hits by language tag at
text:query time, e.g. using the syntax { ?s text:query "gift"@en }.
But with the current index structure this is impossible. Is there an easy
way to implement this? For example, there could be separate fields
for each language, so the index could have fields like uri, text_en,
text_de. Then you could search either using the above syntax (with
language tag in the query literal) or explicitly as { ?s text:query
"text_en:gift" }.
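To make the per-language field idea concrete, here is a minimal in-memory
sketch in Java. It does not use Lucene or the jena-text API. The class
LangFieldIndex and its fieldFor, add and query methods are hypothetical,
and splitting on whitespace plus lower-casing is an assumption standing in
for a real analyzer. Each literal goes into a field named text_<lang> (or
plain "text" when it has no language tag), so a query that carries a
language tag only looks at that one field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of per-language index fields; NOT the jena-text or Lucene API. */
public class LangFieldIndex {
    // field name ("text", "text_en", "text_de", ...) -> token -> URIs of matching resources
    private final Map<String, Map<String, Set<String>>> fields = new HashMap<>();

    /** Field for a literal: "text_<lang>" if it has a language tag, else plain "text". */
    static String fieldFor(String lang) {
        return (lang == null || lang.isEmpty())
                ? "text"
                : "text_" + lang.toLowerCase(Locale.ROOT);
    }

    /** Index one literal value of an indexed property of the resource uri. */
    public void add(String uri, String literal, String lang) {
        Map<String, Set<String>> postings =
                fields.computeIfAbsent(fieldFor(lang), k -> new HashMap<>());
        // Assumed "analyzer": lower-case and split on whitespace.
        for (String tok : literal.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            postings.computeIfAbsent(tok, k -> new TreeSet<>()).add(uri);
        }
    }

    /** lang == null searches every field (today's behaviour); otherwise only text_<lang>. */
    public Set<String> query(String term, String lang) {
        String tok = term.toLowerCase(Locale.ROOT);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Map<String, Set<String>>> field : fields.entrySet()) {
            if (lang == null || field.getKey().equals(fieldFor(lang))) {
                hits.addAll(field.getValue().getOrDefault(tok, Collections.emptySet()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LangFieldIndex idx = new LangFieldIndex();
        idx.add("ex:Gift", "gift", "en");
        idx.add("ex:Poison", "Gift", "de");
        System.out.println(idx.query("gift", null)); // [ex:Gift, ex:Poison]
        System.out.println(idx.query("gift", "en")); // [ex:Gift]
    }
}
```

Passing null as the language corresponds to the current situation, where
"gift" matches both ex:Gift and ex:Poison; passing "en" corresponds to
{ ?s text:query "gift"@en } or to searching the text_en field explicitly.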
Another, similar problem is that the jena-text index is shared by all
named graphs. If the named graphs contain different resources, you cannot
restrict the matches to just one graph; instead you get matches from all
of them mixed together, which can be many more than you are interested
in.
I'm not entirely sure how to improve on the situation, as "being" in a
specific named graph is a triple-level property and the same resource
could potentially be described in many named graphs. However, I think it
could still be possible to add e.g. a "graph" field to the index
listing all the named graphs in which the resource has been mentioned
(in the triples that affect the index). Then you could query e.g. like
this: { ?s text:query "text:gift graph:http://example.com/mygraph" }. Do
you think this would be a workable idea?
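The graph field could be sketched in the same way. Again this is a
hypothetical in-memory model, not jena-text code: GraphFieldIndex keeps,
for each resource, the indexed tokens plus the set of named graphs in
which the indexed triples were seen, and a query can optionally require a
given graph. The second graph URI, http://example.com/othergraph, is made
up for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of the proposed "graph" field; NOT the jena-text API. */
public class GraphFieldIndex {
    // One document per resource: its tokens and every graph it was mentioned in.
    private static final class Doc {
        final Set<String> tokens = new HashSet<>();
        final Set<String> graphs = new HashSet<>();
    }

    private final Map<String, Doc> docs = new TreeMap<>(); // sorted by URI

    /** Record that an indexed triple (uri, property, literal) occurred in graph. */
    public void add(String uri, String literal, String graph) {
        Doc d = docs.computeIfAbsent(uri, k -> new Doc());
        for (String tok : literal.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            d.tokens.add(tok);
        }
        d.graphs.add(graph);
    }

    /** graph == null matches across all named graphs, as the shared index does today. */
    public List<String> query(String term, String graph) {
        String tok = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Doc> e : docs.entrySet()) {
            Doc d = e.getValue();
            if (d.tokens.contains(tok) && (graph == null || d.graphs.contains(graph))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        GraphFieldIndex idx = new GraphFieldIndex();
        idx.add("ex:Gift", "gift", "http://example.com/mygraph");
        idx.add("ex:Present", "gift", "http://example.com/othergraph");
        System.out.println(idx.query("gift", null)); // [ex:Gift, ex:Present]
        System.out.println(idx.query("gift", "http://example.com/mygraph")); // [ex:Gift]
    }
}
```

Because the graphs are recorded per resource rather than per triple, this
over-matches in exactly the case mentioned above: if ex:Gift had the label
"gift" only in mygraph but some other label in othergraph, a search for
"gift" restricted to othergraph would still return it. Storing the graph
with each indexed literal instead would avoid that, at the cost of one
entry per literal and graph rather than one per resource.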
If you think either of these ideas is sound, I'm willing to write
patches to implement them. I develop an application [1] that makes
heavy use of jena-text, named graphs, and multilingual RDF data, and
currently its performance is limited by these issues.
-Osma
[1] http://code.google.com/p/onki-light/
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi