Hi Rob,
In Jena, how triples are stored and indexed is determined by the choice
of storage. This is independent of the parsing process because parsers
output a stream of triples or quads to a StreamRDF that is supplied to
the parser run.
This StreamRDF can be code that sends triples to a graph, a dataset, a
printer, something that just counts them, whatever.
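To make the separation concrete, here is a minimal sketch of the pattern in plain Java. It does not use Jena itself: Triple, TripleSink, ListSink, CountingSink and ToyParser are hypothetical names made up for illustration, with TripleSink standing in for the role StreamRDF plays. The toy parser only splits whitespace-separated terms and is not an N-Triples parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parsed triple (not Jena's own Triple class).
final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
}

// Hypothetical sink interface, playing the role StreamRDF plays in the text.
interface TripleSink {
    void triple(Triple t);
}

// A destination that stores every triple in a plain list.
final class ListSink implements TripleSink {
    final List<Triple> triples = new ArrayList<>();
    @Override public void triple(Triple t) { triples.add(t); }
}

// A destination that stores nothing and only counts.
final class CountingSink implements TripleSink {
    long count = 0;
    @Override public void triple(Triple t) { count++; }
}

// Toy "parser": splits each line into three whitespace-separated terms and
// pushes the result to whatever sink it was given. It knows nothing about
// how, or whether, the sink stores or indexes the triples.
final class ToyParser {
    static void parse(List<String> lines, TripleSink sink) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 3) {
                sink.triple(new Triple(parts[0], parts[1], parts[2]));
            }
        }
    }
}

class SinkDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList(":a :knows :b", ":b :knows :c");

        // Same parser run twice, two different destinations.
        ListSink store = new ListSink();
        CountingSink counter = new CountingSink();
        ToyParser.parse(input, store);
        ToyParser.parse(input, counter);

        System.out.println("stored:  " + store.triples.size());
        System.out.println("counted: " + counter.count);
    }
}
```

The parser code is identical in both runs; only the sink passed in decides what happens to each triple.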
Graph in Jena is an interface (in Haskell, presumably a type class).
The only way to inspect the indexing, and then only indirectly, is to
find the class that implements the Graph interface. But normally the
choice is made by choosing the destination where the triples are stored.
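As a rough illustration of that last point, here is a self-contained sketch in plain Java. It is not Jena code: SimpleGraph, ListGraph and SubjectIndexedGraph are hypothetical names, and Jena's real Graph interface is considerably richer. The sketch only shows that the caller picks the indexing by picking the implementation, and that the implementing class is the one thing left to inspect afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal graph interface, for illustration only.
interface SimpleGraph {
    void add(String s, String p, String o);
    List<String> objects(String s, String p);
}

// No index: a flat list of triples, searched by scanning.
final class ListGraph implements SimpleGraph {
    private final List<String[]> triples = new ArrayList<>();

    @Override public void add(String s, String p, String o) {
        triples.add(new String[] { s, p, o });
    }

    @Override public List<String> objects(String s, String p) {
        List<String> out = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p)) {
                out.add(t[2]);
            }
        }
        return out;
    }
}

// Indexed: subject -> predicate -> list of objects.
final class SubjectIndexedGraph implements SimpleGraph {
    private final Map<String, Map<String, List<String>>> index = new HashMap<>();

    @Override public void add(String s, String p, String o) {
        index.computeIfAbsent(s, k -> new HashMap<>())
             .computeIfAbsent(p, k -> new ArrayList<>())
             .add(o);
    }

    @Override public List<String> objects(String s, String p) {
        Map<String, List<String>> byPredicate =
            index.getOrDefault(s, Collections.<String, List<String>>emptyMap());
        List<String> found = byPredicate.getOrDefault(p, Collections.<String>emptyList());
        return new ArrayList<>(found); // copy, so callers cannot mutate the index
    }
}

class GraphDemo {
    public static void main(String[] args) {
        // Choosing the destination is what chooses the indexing.
        SimpleGraph g = new SubjectIndexedGraph();
        g.add(":a", ":knows", ":b");
        g.add(":a", ":knows", ":c");

        // The interface has no "how am I indexed?" method; the implementing
        // class is the only thing available to look at.
        System.out.println(g.getClass().getSimpleName());
        System.out.println(g instanceof SubjectIndexedGraph);
        System.out.println(g.objects(":a", ":knows"));
    }
}
```

Both implementations answer the same queries with the same results; they differ only in how the lookup is carried out internally.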
Andy
On 20/11/15 14:54, Rob Stewart wrote:
Hi,
I maintain an RDF Haskell library, and I would like to look towards Jena
for inspiration on improving the API.
Currently, there are two RDF graph implementations in the library: 1)
storing the triples just as a list of (subject,predicate,object) tuples of
node elements, and 2) storing as a map from subject to predicate lists and
then for each predicate a map from predicate to object list. The instance
names in the API for the RDF type class are not very intuitive to the RDF
domain expert. Here are two use case examples:
Right (rdf :: TriplesGraph) <- parseFile NTriplesParser "my_file.nt"
Right (rdf :: MGraph) <- parseFile NTriplesParser "my_file.nt"
One might ask: what is the internal structure of `TriplesGraph` and
`MGraph`? It certainly isn't clear from their names. A better design would
be for the user to choose the graph structure in memory that reflects how
the triples are indexed, perhaps in line with some application specific
needs about how the RDF graph should be searched. For example, indexed on
SP keys mapping to O, or SO mapping to P, or OP mapping to S, or S mapping
to O, and so on.
Where should I be looking in the Jena API to find out what the API design
is for giving Java programmers the ability to A) index a graph whilst it
is being populated with triples during parsing of a source, and B) index
an already populated RDF graph? Does the Jena API allow the
programmer to inspect the indexing that has been applied to an RDF graph in
memory? E.g. can I find out whether an RDF graph in-memory is indexed on SO
mapping to P? If so, is this reflected by the instantiated class holding
the data, e.g. (myGraph instanceof SOtoPGraph), or is it reflected by
method calls, e.g. bool indexedBySO(myGraph), or is it not possible to
inspect previous indexing routines on an in-memory RDF graph with Jena?
Thanks!
--
Rob Stewart