I'm wondering how to best issue SPARQL queries when the data is structured as follows.
I'm seeing some RDF data sets in which the object of certain properties names another RDF model that holds further data. For example, the OSLC property oslc_cm:cmServiceProviders has as its object an rdf:resource that is interpreted as a URL from which to fetch another model. Similarly, the PubChemRDF property http://rdf.ncbi.nlm.nih.gov/pubchem/descriptor/CID2244_Canonical_SMILES has as its object an rdf:resource that is interpreted as a URL from which to fetch another RDF model describing the molecular structure of the substance (there the structure is a text-string formula rather than a true RDF graph, but you get the idea).
My understanding is that FROM clauses list the models that are collectively subjected to a single SPARQL query. Is that correct? But what if I don't know them all in advance? All I can think of is to run one query to get the list of URLs, then have code generate a second query from that list, but there may be a great many of those. As an example, PubChem has millions of substances, each of which would have to be fetched in order to get the list of URLs for all the molecule structure RDF models. I am concerned that this would perform worse than issuing a single query against a single dataset that holds all the structures, for instance to find a handful of molecules with some rare substructure.
Is there a cleaner way to handle this sort of thing? Any thoughts or suggestions would be welcome.
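To make the two-step idea concrete, here is a rough Python sketch of the "have code generate a new query" part. It assumes the first query has already returned a list of model URLs (not shown). The function names, the example.org URLs, and the batch size are made up for illustration, and whether an endpoint actually dereferences each FROM URL depends on the implementation.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query_with_from(graph_urls: Sequence[str], where_body: str) -> str:
    """Build a SELECT query with one FROM clause per distinct URL."""
    # Drop duplicate URLs while keeping their original order.
    distinct = list(dict.fromkeys(graph_urls))
    for url in distinct:
        # Refuse characters that would break out of the <...> IRI syntax.
        if any(ch in url for ch in '<>" '):
            raise ValueError(f"not a usable IRI: {url!r}")
    from_lines: List[str] = [f"FROM <{url}>" for url in distinct]
    return "\n".join(["SELECT *", *from_lines, "WHERE {", f"  {where_body}", "}"])


if __name__ == "__main__":
    # Hypothetical URLs standing in for the results of the first query.
    urls = [f"http://example.org/model/{n}" for n in range(5)]
    # Hypothetical batch size, to keep each generated query manageable.
    for batch in batches(urls, 2):
        print(build_query_with_from(batch, "?s ?p ?o ."))
        print()
```

Batching keeps each generated query to a manageable size, but it still means one round trip per batch, which is exactly the cost I'm worried about at PubChem scale.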
