Re: Indexing RDF graphs and inspecting in-memory RDF indexing with Jena

A. Soroka Mon, 23 Nov 2015 07:17:31 -0800

Just as a point of detail, the structure is S -> P -> Set[O] (or some other 
ordering of tuple-slots) and there is a quad version of the same scheme. Also, 
there are a couple of abstract types (TripleTable and QuadTable) which should 
make clear the intentions of those designs.
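
To make that shape concrete, here is a minimal sketch of an S -> P -> Set[O] index in plain Java. It is not Jena code: the class name SpoIndex, its methods, and the use of plain Strings for nodes are all assumptions made for illustration, and it uses mutable java.util collections rather than the persistent ones mentioned elsewhere in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not a Jena class. Nodes are plain Strings. */
final class SpoIndex {
    // subject -> predicate -> set of objects
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    /** Adds a triple; returns true if it was not already present. */
    boolean add(String s, String p, String o) {
        return index.computeIfAbsent(s, k -> new HashMap<>())
                    .computeIfAbsent(p, k -> new HashSet<>())
                    .add(o);
    }

    /** Objects recorded for (subject, predicate); an empty set if there are none. */
    Set<String> objects(String s, String p) {
        return index.getOrDefault(s, Collections.emptyMap())
                    .getOrDefault(p, Collections.emptySet());
    }
}
```

A POS or OSP index would have the same shape with the slots permuted, and a quad version would add one more level of nesting for the graph name.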
 
I wonder if the question doesn’t point to the difference in spirit between Java 
(OO) and Haskell (functional). The kinds of details (i.e. particular data 
structures) about which Rob Stewart asks are usually intentionally out-of-focus 
in Java (encapsulation) but are the center of attention for functional work. 
None of the Jena core types offer much in the way of “self-inspecting” methods. 
I suppose there is Graph::getCapabilities and the like, but that’s not really 
the same…


> A better design would be for the user to choose the graph structure in memory 
> that reflects how the triples are indexed, perhaps in line with some 
> application specific needs about how the RDF graph should be searched. For 
> example, indexed on SP keys mapping to O, or SO mapping to P, or OP mapping 
> to S, or S mapping to O, and so on.

It seems to me that the Java-idiomatic way to offer this flexibility is to 
provide abstract types / interfaces with alternative implementations, as Jena 
does, not to provide some way of swapping structures in and out of a given 
type. That’s more of a functional approach. I suppose one could use generics to 
do something like that, but I suspect it wouldn’t be very obvious to a lot of 
Java readers. There are literally dozens of partial or complete implementations 
for Graph, for example.
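
As a rough sketch of what I mean by "interface with alternative implementations", here is a toy version in plain Java. None of these names come from Jena: TinyGraph, ListGraph, SpoGraph and TinyGraphDemo are hypothetical, nodes are plain Strings, and the interface is far smaller than Jena's Graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical interface for illustration; much smaller than Jena's Graph. */
interface TinyGraph {
    void add(String s, String p, String o);
    boolean contains(String s, String p, String o);
}

/** Flat list of triples: nothing to maintain beyond the list, linear scan on lookup. */
final class ListGraph implements TinyGraph {
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void add(String s, String p, String o) {
        if (!contains(s, p, o)) {
            triples.add(new String[] {s, p, o});
        }
    }

    @Override
    public boolean contains(String s, String p, String o) {
        String[] wanted = {s, p, o};
        for (String[] t : triples) {
            if (Arrays.equals(t, wanted)) {
                return true;
            }
        }
        return false;
    }
}

/** Nested maps S -> P -> Set[O]: lookup by subject and predicate is hashed. */
final class SpoGraph implements TinyGraph {
    private final Map<String, Map<String, Set<String>>> spo = new HashMap<>();

    @Override
    public void add(String s, String p, String o) {
        spo.computeIfAbsent(s, k -> new HashMap<>())
           .computeIfAbsent(p, k -> new HashSet<>())
           .add(o);
    }

    @Override
    public boolean contains(String s, String p, String o) {
        Map<String, Set<String>> byPredicate = spo.get(s);
        if (byPredicate == null) {
            return false;
        }
        Set<String> objects = byPredicate.get(p);
        return objects != null && objects.contains(o);
    }
}

/** Client code sees only the interface; the index is chosen at construction. */
final class TinyGraphDemo {
    static boolean knows(TinyGraph g) {
        g.add("alice", "knows", "bob");
        return g.contains("alice", "knows", "bob");
    }
}
```

The caller picks the indexing by picking the class, e.g. TinyGraphDemo.knows(new SpoGraph()) versus TinyGraphDemo.knows(new ListGraph()), and the only way to learn afterwards which one is in use is to look at the implementation class.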

---
A. Soroka
The University of Virginia Library

> On Nov 21, 2015, at 12:26 PM, Rob Stewart <[email protected]> wrote:
> 
> On 21 November 2015 at 17:14, Andy Seaborne <[email protected]> wrote:
> 
> which uses a java port of the Scala immutable collections.
>> 
>> The triple indexing is SPO, POS, OSP
>> 
>> i.e.
>> SPO = Map S to a map of P to O.
> 
> 
> This is exactly what I was looking for. It's this kind of indexing pattern
> that I want to abstract into SPO, POS and OSP instances of the RDF type
> class. Thanks!
> 
> --
> Rob
> 
> 
>> 
>>> Kind regards,
>>> Lorenz
>>> 
>>> [1] https://jena.apache.org/documentation/javadoc/jena/
>>> 
>>> Hi Andy,
>>>> 
>>>> On 21 November 2015 at 16:12, Andy Seaborne <[email protected]> wrote:
>>>> 
>>>> Graph in Jena is an interface (in Haskell, the type class presumably).
>>>> 
>>>> 
>>>> Yes, it sounds like Jena's Graph interface is similar to our RDF type
>>>> class.
>>>> 
>>>> http://hackage.haskell.org/package/rdf4h-1.3.4/docs/Data-RDF.html#t:RDF
>>>> 
>>>> With some simple query functions overloaded on all instances:
>>>> 
>>>> http://hackage.haskell.org/package/rdf4h-1.3.4/docs/Data-RDF-Query.html
>>>> 
>>>> 
>>>> The only way to indirectly inspect the indexing is to find the class of
>>>>> the implementation of the Graph interface.
>>>>> 
>>>> 
>>>> How many implementations of Jena's Graph interface come bundled with the
>>>> Jena code base? Where can I find out about each Graph implementation,
>>>> namely on how they are indexing the graph?
>>>> 
>>>> --
>>>> Rob
>>>> 
>>>> 
>>>> On 20/11/15 14:54, Rob Stewart wrote:
>>>>> 
>>>>> Hi,
>>>>>> 
>>>>>> I maintain an RDF Haskell library, and I would like to look towards
>>>>>> Jena for inspiration on improving the API.
>>>>>> 
>>>>>> Currently, there are two RDF graph implementations in the library:
>>>>>> 1) storing the triples just as a list of (subject,predicate,object)
>>>>>> tuples of node elements, and 2) storing as a map from subject to
>>>>>> predicate lists and then for each predicate a map from predicate to
>>>>>> object list. The instance names in the API for the RDF type class are
>>>>>> not very intuitive to the RDF domain expert. Here are two use case
>>>>>> examples:
>>>>>> 
>>>>>> Right (rdf :: TriplesGraph) <- parseFile NTriplesParser "my_file.nt"
>>>>>> Right (rdf :: MGraph) <- parseFile NTriplesParser "my_file.nt"
>>>>>> 
>>>>>> One might ask: what is the internal structure of `TriplesGraph` and
>>>>>> `MGraph`, it certainly isn't clear from their names. A better design
>>>>>> would be for the user to choose the graph structure in memory that
>>>>>> reflects how the triples are indexed, perhaps in line with some
>>>>>> application specific needs about how the RDF graph should be searched.
>>>>>> For example, indexed on SP keys mapping to O, or SO mapping to P, or OP
>>>>>> mapping to S, or S mapping to O, and so on.
>>>>>> 
>>>>>> Where should I be looking in the Jena API, to find out what the API
>>>>>> design is for providing Java programmers the ability to A) index a
>>>>>> graph whilst it is being populated with triples whilst parsing a
>>>>>> source, and B) how to index an already populated RDF graph? Does the
>>>>>> Jena API allow the programmer to inspect the indexing that has been
>>>>>> applied to an RDF graph in memory? E.g. can I find out whether an RDF
>>>>>> graph in-memory is indexed on SO mapping to P? If so, is this
>>>>>> reflected by the instantiated class holding the data, e.g.
>>>>>> (myGraph instanceof SOtoPGraph), or is it reflected by method calls,
>>>>>> e.g. bool indexedBySO(myGraph), or is it not possible to inspect
>>>>>> previous indexing routines on an in-memory RDF graph with Jena?
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> --
>>>>>> Rob Stewart
>>>>>> 
>>>>>> 
>>>>>> 
>>
