Sorry for the delay
On 25/02/13 12:42, Dick Murray wrote:
Questions regarding the memory footprint and
SPARQL_QueryGeneral.MaxTriples (100*1000), which is a static final int.
To be clear - this is only used when loading data as part of FROM/FROM
NAMED. In that case, an in-memory graph/dataset is used. And reading
in data just to query it and throw it away is quite time-consuming.
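For example, a query like this sent to Fuseki makes the server fetch the
graph into a fresh in-memory model just for that one execution (a sketch;
the service and data URLs are placeholders):

import com.hp.hpl.jena.query.* ;

String service = "http://localhost:3030/ds/query" ;   // hypothetical endpoint
String qs = "SELECT * FROM <http://example/data.ttl> WHERE { ?s ?p ?o }" ;
QueryExecution qExec = QueryExecutionFactory.sparqlService(service, qs) ;
ResultSetFormatter.out(qExec.execSelect()) ;
qExec.close() ;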
The execute method on SPARQL_Query calls decideDataset(action, query,
queryStringLog) to return the Dataset against which to execute the query.
This in turn builds a Dataset from a DatasetDescription, which loops
through graphURLs and namedGraphs. Each iteration creates a default Model
(in memory) and loads in triples using a SinkTriplesToGraph via the
RiotReader. I'm assuming this uses the sink send method (I got lost in the
interface when tracing the hierarchy)?
Yes
I'm assuming that the graphs/triples aren't duplicated? But there is an
overhead as the triples are "sinked"?
No (not duplicated)
It should not be an overhead - it's one extra method call on each triple.
As of Fuseki 0.2.6, this is now a StreamRDF, much the same thing as a
sink, but it models the output of parsers better. All parsing now
outputs via a StreamRDF and there are a myriad of implementations, from
ones that put the output in a graph to ones that print directly (so you
can have streaming parse-to-print).
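For instance, streaming parse-to-print, assuming StreamRDFLib.writer
(which writes whatever it receives as N-triples) and a local file "data.ttl":

import org.apache.jena.riot.RDFDataMgr ;
import org.apache.jena.riot.system.StreamRDF ;
import org.apache.jena.riot.system.StreamRDFLib ;

StreamRDF printer = StreamRDFLib.writer(System.out) ;  // print as parsed
RDFDataMgr.parse(printer, "data.ttl") ;                // nothing accumulated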
GraphLoadUtils.readUtil comes down to:
Lang lang = RDFLanguages.filenameToLang(uri, RDFLanguages.RDFXML) ;
StreamRDF sink = StreamRDFLib.graph(graph) ;
sink = new SinkRDFLimited(sink, limit) ;
InputStream input = Fuseki.webStreamManager.open(uri) ;
RDFDataMgr.parse(sink, input, uri, lang, null) ;
and "RDFDataMgr.parse(StreamRDF" is the core that drives allparsing
operations nowadays (Jena 2.10.0)
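SinkRDFLimited above is just one more wrapper in that stream. A sketch of
the idea, assuming the StreamRDFWrapper helper in org.apache.jena.riot.system
(delegating by hand works the same if your version lacks it); it shows the
one extra method call per triple:

import org.apache.jena.riot.RiotException ;
import org.apache.jena.riot.system.StreamRDF ;
import org.apache.jena.riot.system.StreamRDFWrapper ;
import com.hp.hpl.jena.graph.Triple ;

// Count triples as they pass through; abort once the limit is passed.
class LimitingStreamRDF extends StreamRDFWrapper {
    private final long limit ;
    private long count = 0 ;

    LimitingStreamRDF(StreamRDF dest, long limit) {
        super(dest) ;
        this.limit = limit ;
    }

    @Override public void triple(Triple triple) {
        if ( ++count > limit )
            throw new RiotException("Limit "+limit+" exceeded") ;
        super.triple(triple) ;   // forward to the destination stream
    }
}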
The SinkLimited class uses the MaxTriples value and throws a
RiotException("Limit "+limit+" exceeded") from the send(T thing) method.
How do I get around this?
Typically by loading the data into a database beforehand, via SPARQL
Update (e.g. LOAD), the upload operations, or offline.
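For example, a one-off LOAD through the update endpoint puts the data in
the store once, so later queries don't pay the re-read cost (a sketch; the
endpoint and URLs are placeholders):

import com.hp.hpl.jena.update.* ;

UpdateRequest req = UpdateFactory.create(
    "LOAD <http://example/data.ttl> INTO GRAPH <http://example/g>") ;
UpdateProcessor proc =
    UpdateExecutionFactory.createRemote(req, "http://localhost:3030/ds/update") ;
proc.execute() ;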
Reading in 100K triples for a one-time-use dataset is not something to be
done lightly.
What's the use case here?
TDB, with FROM/FROM NAMED, works differently.
It uses the graphs named to construct an execution that only applies to
those graphs. The graphs come from the local database and are not copied.
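So against TDB something like this reads straight out of the database (a
sketch; the database location and graph name are placeholders):

import com.hp.hpl.jena.query.* ;
import com.hp.hpl.jena.tdb.TDBFactory ;

Dataset ds = TDBFactory.createDataset("/path/to/DB") ;
String qs = "SELECT * FROM NAMED <http://example/g> "
          + "WHERE { GRAPH ?g { ?s ?p ?o } }" ;
QueryExecution qExec = QueryExecutionFactory.create(qs, ds) ;
ResultSetFormatter.out(qExec.execSelect()) ;
qExec.close() ;

The execution is restricted to <http://example/g>; no triples are copied.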
Andy
Dick.