Hello Sören,

First of all, embedding an SQL query in SPARQL is needed less
frequently than it seems. Virtuoso's optimizer does not treat a
subquery as an instruction to preserve the order. As a result,

PREFIX : <http://people.example/>
SELECT ?y ?name ?age WHERE {
   :alice :knows ?y .
   {
     SQL SELECT y, name, age FROM people
   }
}

and

select q1."y", q2.name, q2.age
from (sparql
  PREFIX : <http://people.example/>
  SELECT ?y WHERE { :alice :knows ?y } ) as q1,
  people as q2
where q2.y=q1."y"

will probably produce equivalent execution plans.

However, native SQL subqueries can be very useful (and sometimes unavoidable) as nested
scalar subqueries, e.g. in FILTER(bif:exists(...))
or in
SPARQL select
 (select ...) as ?calculated_value_1
 (select ...) as ?calculated_value_2
where ...

The problem is that the SPARQL compiler may copy the text of the SQL
subquery into many places in the resulting big SQL query.
If the subquery uses table aliases, multiple copies of these aliases
may cause weird SQL compilation errors.
Even worse, the subquery may omit aliases, which makes things even more confusing.
Finally, the subquery will probably refer to variables bound in the surrounding
SPARQL, so these variables must be recognized in the text of the query and
the appropriate aliases must be prepended to them.
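
To illustrate the alias problem only: below is a minimal Python sketch of
one conceivable workaround, giving every textual copy of a subquery its
own alias names. The helper rename_aliases, the "_c<N>" suffix scheme and
the sample fragment are assumptions made up for this example; this is not
what the Virtuoso compiler does.

```python
import re

def rename_aliases(sql_fragment, aliases, copy_no):
    """Suffix every listed alias with _c<copy_no> (hypothetical scheme)."""
    # \b keeps the match to whole identifiers, so alias "p" leaves "people" alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, aliases)) + r")\b")
    return pattern.sub(lambda m: f"{m.group(0)}_c{copy_no}", sql_fragment)

# Made-up fragment that declares and uses the alias "p".
fragment = "SELECT p.age FROM people AS p WHERE p.y = 'bob'"

# Two copies pasted into one big query no longer share the alias name.
copy1 = rename_aliases(fragment, ["p"], 1)
copy2 = rename_aliases(fragment, ["p"], 2)
print(copy1)
print(copy2)
```

A plain regular expression like this would also rewrite matching words
inside string literals and cannot help when the subquery omits aliases, so
a real solution needs the SQL parser, not text substitution.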

A dirty hack is to write an RDF view that creates triples using the tables, joins
and filter conditions from the SQL query in question.
That will make the optimizer happy and produce the best possible SQL code, but it's
next to unusable if there are many different subqueries.
A better option is to write an RDF view that creates triples using the tables and maybe some
joins from the SQL query in question, but to place the filters and remaining joins into
a SPARQL query over that view.
That's more flexible, and the quality of the generated code will stay good.

However, an RDF view is a bad choice if an SQL view should be used as a source,
and is not applicable at all if the SQL view is actually a procedure view.
So I need to think about what could be done.
Right now the SPARQL compiler is a preprocessor in front of the SQL compiler.
To handle arbitrary SQL subqueries, the SQL processor would have to be divided into
parts, so that there would be an SQL+SPARQL preprocessor, then the SPARQL processor, then
the core of the SQL compiler. That is not a small change.
Alternatively, the SQL inside SPARQL could contain special, easily recognizable
syntax extensions to refer to variables of the surrounding SPARQL (and variables of
the SQL code around the surrounding SPARQL).
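
As a rough illustration of that last idea, here is a minimal Python sketch
of a preprocessor that replaces an explicit marker with an alias-qualified
column. The marker syntax $sparql{name}, the helper bind_sparql_vars and
the bindings table are purely hypothetical; no such syntax exists in
Virtuoso today.

```python
import re

# Hypothetical marker: $sparql{y} stands for the SPARQL variable ?y.
MARKER = re.compile(r"\$sparql\{([A-Za-z_][A-Za-z0-9_]*)\}")

def bind_sparql_vars(sql_text, bindings):
    """Replace each marker with alias."column" taken from `bindings`.

    `bindings` maps a SPARQL variable name to an (alias, column) pair.
    """
    def repl(match):
        var = match.group(1)
        if var not in bindings:
            raise KeyError(f"?{var} is not bound in the surrounding SPARQL")
        alias, column = bindings[var]
        return f'{alias}."{column}"'
    return MARKER.sub(repl, sql_text)

sql = "SELECT name, age FROM people WHERE people.y = $sparql{y}"
bound = bind_sparql_vars(sql, {"y": ("q1", "y")})
print(bound)
```

Because the marker cannot be confused with ordinary SQL text, the
preprocessor does not have to guess which identifiers are SPARQL
variables; unbound names are reported instead of being passed through.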

I will discuss the issue with others and return to this topic after the release 
that will contain SPARQL 1.1 extensions.

Best Regards,
Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com


On Wed, 2010-07-14 at 13:55 +0200, Sören Auer wrote:
> Hi,
> 
> For some experiments we plan to run it would be very useful to embed an 
> SQL query as a subquery inside a SPARQL query. We want to combine 
> relational with RDF data for example in the following way:
> 
> Let's assume we have foaf profiles in the triple store and a relational 
> table with information about people (e.g. from a CRM system). A query 
> similar to the following would be really useful in that case:
> 
> PREFIX : <http://people.example/>
> SELECT ?y ?name ?age WHERE {
>    :alice :knows ?y .
>    {
>      SQL SELECT y, name, age FROM people
>    }
> }
> 
> Instead of joining with the triple table a join with the SQL subquery 
> would occur and SPARQL variables would be matched against columns of the 
> SQL result set with the same name.
> 
> Does this make sense? Are there already plans to implement something 
> along these lines in Virtuoso? I think this functionality would 
> dramatically simplify a number of data integration tasks and be a 
> wonderful USP for Virtuoso.
> 
> Best,
> 
> Sören


