Thanks! Loading a restructured database locally is all I could think
of; nice to know I'm not overlooking an obviously better approach.
On the subject of TDB, the online documentation seems more thorough for
TDB than TDB2. From the user's perspective, are the command line and
API interfaces essentially the same?
Which is preferred from a scalability perspective, TDB or TDB2? I expect
to primarily be doing queries, with few if any updates.
On 12/5/2020 1:06 PM, Andy Seaborne wrote:
Hi Steve,
On 04/12/2020 16:09, Steve Vestal wrote:
I'm wondering how to best issue SPARQL queries when the data is
structured as follows.
I'm seeing some RDF data sets where some of the properties name other
RDF models that contain other data. For example, the OSLC property
oslc_cm:cmServiceProviders has as its object an rdf:resource that is
interpreted as a URL to fetch another model. The PubChemRDF property
http://rdf.ncbi.nlm.nih.gov/pubchem/descriptor/CID2244_Canonical_SMILES
has as its object an rdf:resource that is interpreted as a URL to fetch
another RDF model that has the molecular structure of the substance
(which has a text string formula rather than a true RDF graph, but you
get the idea).
My understanding is that a FROM clause is used to list multiple models
that are collectively subjected to a single SPARQL query -- correct?
Yes - for a general dataset, they are loaded from the web.
TDB will choose graphs from the dataset, not load from a remote site.
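For illustration, here is a minimal sketch of a query that names each model in its own FROM clause. The helper name build_from_query and the example.org URLs are hypothetical; only the SELECT / FROM <iri> / WHERE shape comes from SPARQL itself.

```python
def build_from_query(graph_urls, where_pattern, select_vars="*"):
    """Return a SPARQL SELECT query with one FROM clause per graph URL."""
    # Each FROM clause names one graph to merge into the query's default graph.
    from_lines = "\n".join(f"FROM <{url}>" for url in graph_urls)
    return f"SELECT {select_vars}\n{from_lines}\nWHERE {{ {where_pattern} }}"


# Hypothetical model URLs, used only to show the generated text.
query = build_from_query(
    ["http://example.org/model-a", "http://example.org/model-b"],
    "?s ?p ?o",
)
print(query)
```

Against a general dataset the named URLs would be fetched; against TDB the same clauses would select graphs already stored in the dataset.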
But what if I don't know them all in advance?
ARQ does not have a way to dynamically load graphs once the query has
started.
All I can think of is to do a query to get the list, then have code
generate a new query, but there may be a whole lot of those. As an
example, PubChem has millions of substances, each of which would have
to be fetched in order to get the list of URLs for all the molecule
structure RDF models. I am vaguely concerned that the performance of
that might not be as good as issuing a single query to a single dataset
having all structures to find a handful of molecules with some rare
substructure.
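The "get the list, then generate a new query" idea can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the batch size of 100, and the example.org URLs are all made up, and the list of URLs stands in for whatever the first query would return.

```python
def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def queries_for(graph_urls, where_pattern, batch_size=100):
    """Build one SELECT query per batch, with a FROM clause per graph URL."""
    result = []
    for batch in batched(graph_urls, batch_size):
        from_lines = "\n".join(f"FROM <{url}>" for url in batch)
        result.append(f"SELECT *\n{from_lines}\nWHERE {{ {where_pattern} }}")
    return result


# Hypothetical URLs standing in for the result of the first "get the list" query.
discovered = [f"http://example.org/structure/{i}" for i in range(250)]
generated = queries_for(discovered, "?s ?p ?o", batch_size=100)
print(len(generated))
```

With 100 URLs per generated query, a million discovered URLs would still mean 10,000 separate queries, each fetching its graphs over the web, which is the cost being weighed here.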
Is there a cleaner way to handle this sort of thing? Any thoughts or
suggestions would be welcome.
Maybe better to get all the graphs ahead of time and load one local
database? Especially if you are doing lots of queries over a period
of time. Remote loading is not quick if it has to go out to the web,
and the loaded data isn't persistent.
The TDB bulk loaders load from URLs.
That said, for practical reasons, it is often better to download the
files, check them for syntax, then load them. Nothing worse than a bulk
loader encountering an error partway through.
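A rough sketch of the check-then-load workflow, assuming the Jena command-line scripts riot and tdb2.tdbloader are on the PATH. The TDB2 loader is used only as an example, and the database directory "DB2" and the file names are hypothetical. By default the helper only builds the command lines so they can be inspected first.

```python
import shutil
import subprocess


def validate_cmd(path):
    """Command line asking Jena's riot to parse a file and report syntax errors."""
    return ["riot", "--validate", path]


def load_cmd(db_dir, paths):
    """Command line for the TDB2 bulk loader; db_dir is a hypothetical location."""
    return ["tdb2.tdbloader", "--loc", db_dir, *paths]


def check_then_load(db_dir, paths, dry_run=True):
    """Plan validation of every file, followed by a single bulk load.

    In dry-run mode, or when riot is not found on the PATH, the commands are
    returned without being executed.
    """
    commands = [validate_cmd(p) for p in paths] + [load_cmd(db_dir, paths)]
    if dry_run or shutil.which("riot") is None:
        return commands
    for cmd in commands:
        # check=True stops at the first failing command, so the loader
        # never starts if any file fails validation.
        subprocess.run(cmd, check=True)
    return commands


plan = check_then_load("DB2", ["a.ttl", "b.nt"])
for cmd in plan:
    print(" ".join(cmd))
```

Putting all validation commands ahead of the load means a syntax error surfaces before any data has been written to the database.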
Andy