Sorry for the delay

On 25/02/13 12:42, Dick Murray wrote:
Questions regarding the memory footprint and
SPARQL_QueryGeneral.MaxTriples (100*1000), which is a static final int.

To be clear - this is only used when loading data as part of FROM/FROM NAMED. In that case, an in-memory graph/dataset is used. And reading in data just to query it and throw it away is quite time-consuming.

The execute method on SPARQL_Query calls decideDataset(action, query,
queryStringLog) to return the Dataset against which to execute the query.

This in turn builds a Dataset from a DatasetDescription, which loops
through graphURLs and namedGraphs. Each iteration creates a default Model
(in memory) and loads in triples using a SinkTriplesToGraph via the
RiotReader. I'm assuming this uses the sink's send method (I got lost in the
interfaces when tracing the hierarchy)?

Yes


I'm assuming that the graphs/triples aren't duplicated? But there is an
overhead as the triples are "sinked"?

No (not duplicated)

It should not be an overhead - it's one extra method call on each triple.

As of Fuseki 0.2.6, this is now a StreamRDF, much the same thing as a sink, but it models the output of parsers better. All parsing now outputs via a StreamRDF, and there are a myriad of implementations, from ones that put the output in a graph to ones that print directly (so you can have streaming parse-to-print).
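The shape of the idea is easy to sketch in plain Java - a push-style destination that the parser sends each triple to, with interchangeable implementations. The interface and class names below are illustrative only, not Jena's real StreamRDF API:

```java
import java.util.ArrayList;
import java.util.List;

public class SinkDemo {
    // Minimal push-style destination, in the spirit of a sink/StreamRDF.
    interface Sink<T> { void send(T item); }

    // Destination that accumulates items, like sending parser output to a graph.
    static class CollectingSink<T> implements Sink<T> {
        final List<T> items = new ArrayList<>();
        public void send(T item) { items.add(item); }
    }

    // Destination that prints each item as it arrives: streaming parse-to-print.
    static class PrintingSink<T> implements Sink<T> {
        public void send(T item) { System.out.println(item); }
    }

    public static void main(String[] args) {
        CollectingSink<String> graph = new CollectingSink<>();
        // The "parser" here is just a loop pushing pseudo-triples.
        for (String triple : new String[] { "<s> <p> <o1> .", "<s> <p> <o2> ." })
            graph.send(triple);   // one extra method call per triple
        System.out.println(graph.items.size());
    }
}
```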

GraphLoadUtils.readUtil comes down to:


  Lang lang = RDFLanguages.filenameToLang(uri, RDFLanguages.RDFXML) ;
  StreamRDF sink = StreamRDFLib.graph(graph) ;
  sink = new SinkRDFLimited(sink, limit) ;

  InputStream input = Fuseki.webStreamManager.open(uri) ;
  RDFDataMgr.parse(sink, input, uri, lang, null) ;

and "RDFDataMgr.parse(StreamRDF, ...)" is the core that drives all parsing operations nowadays (Jena 2.10.0).

The SinkLimited class uses the MaxTriples value and throws a
RiotException("Limit "+limit+" exceeded") from the send(T thing) method.
How do I get around this?

Typically by loading the data via SPARQL update e.g. LOAD or the upload operations, or offline, into a database.
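For example, a SPARQL 1.1 Update LOAD moves the data into the store once, ahead of query time, instead of re-reading it on every FROM. A sketch of composing such a request (the URLs are placeholders):

```java
public class LoadUpdateDemo {
    // Build a SPARQL 1.1 Update LOAD request for the given source and
    // target graph. Source and graph names here are hypothetical.
    static String loadInto(String sourceUrl, String graphName) {
        return "LOAD <" + sourceUrl + "> INTO GRAPH <" + graphName + ">";
    }

    public static void main(String[] args) {
        System.out.println(loadInto("http://example/data.ttl", "http://example/g"));
    }
}
```

Sent to the update endpoint, this loads once into the database, so subsequent queries read the stored graph rather than fetching and parsing the file each time.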

Reading in 100K triples for a one-time-use dataset is not something to be done lightly.

What's the use case here?


TDB, with FROM/FROM NAMED, works differently.

It uses the graphs named to construct an execution that only applies to those graphs. The graphs come from the local database and are not copied.

        Andy


Dick.

