On 26/11/13 13:30, Osma Suominen wrote:
Hi Andy!

Thanks for your response. Indeed, I hadn't realized that jena-text
indexes on the triple level - I actually thought it worked at the
entity/resource level (one Lucene/Solr document per RDF entity).
However, looking at the source, there is some code for indexing at the
entity level, but that code is unused. So it would actually be
pretty easy to add lang and/or graph fields into the index, because
those are defined on the triple level.

How about adding optional support for this into jena-text? There could
be new configuration options so you could do something like this:

<#entMap> a text:EntityMap ;
     text:entityField      "uri" ;

     text:languageField    "lang" ;

Should this be per map entry / per predicate? I don't know which is best: an index-wide configuration, or one where some predicates are indexed one way and some another.

(And if there is no lang, presumably the field value is "".)

     text:graphField       "graph" ;
     text:defaultField     "text" ;
     text:map (
          [ text:field "text" ; text:predicate rdfs:label ]
          ) .
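
Read without the inline comments, the proposed configuration would look roughly like the sketch below. The text:languageField and text:graphField properties are only the ones proposed in this message, not existing jena-text options, and the prefix declarations are added here for completeness.

```turtle
@prefix text: <http://jena.apache.org/text#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#entMap> a text:EntityMap ;
     text:entityField      "uri" ;
     # Proposed (hypothetical) option: index the literal's language tag.
     text:languageField    "lang" ;
     # Proposed (hypothetical) option: index the graph the triple is in.
     text:graphField       "graph" ;
     text:defaultField     "text" ;
     text:map (
          [ text:field "text" ; text:predicate rdfs:label ]
          ) .
```

Leaving out the two proposed properties gives the same entity map as today.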

Without the languageField and graphField properties, there would be no
indexing of language/graph information and thus no cost in index size
compared to the current situation.

At query time, graph context information could be used to narrow the
search when it is available and a graphField is defined in the
configuration. Similarly for language, so you could do searches like
{ ?s text:query "gift lang:en" }.

Does this sound like a sane plan? If it does, I can look at trying to
implement it sometime in the next couple of months.

Sounds sane.

What would the query predicate in SPARQL look like?

If it all defaults back to the current mode of operation, we have a non-disruptive upgrade path, which would be better if possible. It's a change of disk format, which is always more of an issue for existing use.

        Andy


-Osma
