Hello, Phil. By "named graphs" I take it you mean SPARQL queries
using the GRAPH keyword.  Correct: an smf:ImportRDF... module is not
required when using the GRAPH keyword to specify where data is being
retrieved from.  Some of the behavior you are seeing is probably due
to the query engine automatically federating the query: given that
data is specified in one or more data back-ends, the engine must
check every back-end for matches to each triple pattern.  You may
know that {?x :myprop ?myvalue} occurs only in source1, but the query
engine can't know that and will therefore ask source2 for those
matches as well.
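
For illustration, a query restricted to one source looks roughly like
the sketch below; the graph URI and property are placeholders, not
anything taken from your setup:

  PREFIX : <http://example.org/ns#>

  SELECT ?x ?myvalue
  WHERE {
    # restrict the triple pattern to a single named graph (source1)
    GRAPH <urn:example:source1> {
      ?x :myprop ?myvalue
    }
  }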

To target a query at a specific datastore, use the SERVICE keyword.
This passes the enclosed pattern to that "endpoint" (for D2RQ it
passes the query directly to the D2RQ engine), bypassing any attempt
to federate the query across the other datastores.  Use the base URI
of the .d2rq file, e.g. "SERVICE <base uri of .d2rq> {}".
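
A minimal sketch, assuming the base URI of your .d2rq file were
<http://example.org/mydb.d2rq> and :hasTerm were one of the mapped
properties (both are placeholders, not your real names):

  PREFIX : <http://example.org/ns#>

  SELECT ?request ?term
  WHERE {
    # send this pattern straight to the D2RQ-backed store
    SERVICE <http://example.org/mydb.d2rq> {
      ?request :hasTerm ?term
    }
  }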

<<Do I need to use the "smf:ImportRDF..." modules at all?>>

Not when using GRAPH and SERVICE in your queries.

<<- What is the difference in behavior of the smf:import modules as
compared to simply referencing a named graph? >>

An smf:import module that specifies a data back-end will open a new
connection to the back-end, but will not request any triples.  In all
cases, you should use ImportRDFFromWorkspace and point it at the
connector file for the back-end in your workspace.  When you then use
SERVICE or GRAPH, that connection will be reused if one was opened by
an import module; otherwise a new one will be created.

<<- Am I adding (or eliminating) any overhead or additional processing
by only using the named graphs? >>

I am not aware of any extra overhead with either approach.  There
could be differences in how the data connectors are maintained. Some
of this depends on the mechanics of the script.

<<Can I write these new
terms into the taxonomy using "INSERT INTO <graph>"?>>

Yes, this is in fact the preferred method.
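
For example, in SPARQL 1.1 Update syntax the equivalent of
"INSERT INTO <graph>" would look like the sketch below; the graph
URI, term URI and label are placeholders, not your actual taxonomy:

  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

  INSERT DATA {
    GRAPH <http://example.org/taxonomy> {
      <http://example.org/term/newTerm>
        a skos:Concept ;
        skos:prefLabel "new term" ;
        skos:broader <http://example.org/concept/Unassigned> .
    }
  }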

<<If so, will any
newly inserted term be available when I sparql back into that data
source later in the sparqlmotion script?>>

Yes.

<<is it okay (or
even better) to write the triples into my desired graphs using the
INSERT statement rather than generating the 2 separate sets of triples
and then having to use some sort of a FILTER BY CONSTRUCT statement
before using an "smf:export..." module to write the result sets to
their separate graphs. >>

It is certainly OK, and may be better in some cases, depending on the
setup of the back end (buffer size for batch writes, etc.).  It's
difficult to say in the abstract, and some empirical work, as you
have been doing, is always necessary.
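
A rough sketch of that pattern in SPARQL 1.1 Update, writing both
result sets in a single pass; every graph URI and property below is a
placeholder for whatever your script actually uses, and it assumes
?term is bound to a concept IRI rather than a literal label:

  PREFIX : <http://example.org/ns#>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

  # write the request-to-term assignments into one graph...
  INSERT {
    GRAPH <http://example.org/assignments> {
      ?request :hasSearchTerm ?term .
    }
  }
  WHERE {
    GRAPH <http://example.org/mydb.d2rq> {
      ?request :mentions ?term .
    }
  } ;

  # ...and any terms not yet in the taxonomy under "Unassigned"
  INSERT {
    GRAPH <http://example.org/taxonomy> {
      ?term a skos:Concept ;
            skos:broader <http://example.org/concept/Unassigned> .
    }
  }
  WHERE {
    GRAPH <http://example.org/assignments> {
      ?request :hasSearchTerm ?term .
    }
    FILTER NOT EXISTS {
      GRAPH <http://example.org/taxonomy> { ?term a skos:Concept }
    }
  }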

-- Scott

On Apr 15, 7:39 am, Phil <[email protected]> wrote:
> I am writing a sparqlmotion script that processes services requests
> stored in a d2rq-exposed relational database.  The script constructs a
> series of triples that assigns search terms to the service request
> based upon data attached to the request within the relational
> database.  The net result of the sparqlmotion script will be to
> generate 2 rdf data sets: (1) an rdf file containing the triples
> associating the identified search terms to the service request and (2)
> an rdf file containing the distinct list of identified search terms so
> that they can be categorized into an existing skos-based taxonomy of
> search terms.  The taxonomy represents search terms that have been
> identified across a number of data sources.
>
> I originally wrote the sparqlmotion script to import both the d2rq
> data source as well as the search term taxonomy and then execute a
> series of sparql construct statements to create the desired triples.
> However, I noticed an interesting behavior during the script execution
> in that the script was executing queries against the d2rq source for
> the taxonomy search terms (which exist in the taxonomy source and NOT
> the d2rq source).  I then modified the sparql construct statements in
> the script to use named graphs so that it would only "sparql" into
> each source based upon the data I know exists in that source (i.e.
> when querying for existing search terms, I specify the taxonomy source
> as the named graph; when querying for terms used on a service request,
> I use the d2rq source as the named graph).  This greatly improved the
> performance of the sparqlmotion script, which begged the following
> questions:
>
> - Do I need to use the "smf:ImportRDF..." modules at all?
> - What is the difference in behavior of the smf:import modules as
> compared to simply referencing a named graph?
> - Am I adding (or eliminating) any overhead or additional processing
> by only using the named graphs?
>
> I actually tested this out by removing all of the smf:Import... modules
> from the script; the script simply starts off by executing the
> ApplyConstruct modules that utilize the named graphs, and it appears
> to have worked just fine and performed better.
>
> Along the same lines ... when I find a new search term that I've not
> seen before, I need to write into the taxonomy under an existing
> concept in the taxonomy called "Unassigned". Can I write these new
> terms into the taxonomy using "INSERT INTO <graph>"?  If so, will any
> newly inserted term be available when I sparql back into that data
> source later in the sparqlmotion script?  My line of thinking for the
> writing of the data is similar to the questions above: is it okay (or
> even better) to write the triples into my desired graphs using the
> INSERT statement rather than generating the 2 separate sets of triples
> and then having to use some sort of a FILTER BY CONSTRUCT statement
> before using an "smf:export..." module to write the result sets to
> their separate graphs.
>
> Thanks for your assistance!
>
> Phil
