Future of Jena SDB

2013-06-07 Thread Andy Seaborne
SDB is a Jena storage module that uses SQL databases for RDF storage. 
See [1] for documentation. It uses a custom database schema to store 
RDF; it is not a general SQL-to-RDF mapping layer.


The supported databases are: Oracle, Microsoft SQL Server, DB2, 
PostgreSQL, MySQL, Apache Derby, H2, HSQLDB.  Only Derby and HSQLDB are 
tested in the development build process.


Both Oracle and IBM corporations provide commercial RDF solutions using 
Jena that are completely unrelated to SDB.


TDB is faster, more scalable and better supported than SDB but there can 
be reasons why an SQL-backed solution is appropriate.


There is no active development or maintenance of SDB from within the 
committer team; no committers use SDB and it imposes a cost to the team 
to generate separate releases.  We're not receiving patches contributed 
to JIRA items for bugs.


We are proposing:

1/ moving it into the main build so it will be part of the main 
distribution with limited testing.


2/ marking it as under review / maintenance only.

It will not be treated as something that can block a release, nor will 
it be allowed to stop development builds for any significant length of time.


It may be pulled from the main build, and from a release, at very short 
notice.


If moved out, the source code will still be available but no binaries 
(releases or development builds) will be produced.


What would change SDB's status is care and attention. There are ways to 
enhance it, for example, pushing the work of filters into the SQL 
database, where possible, to improve query performance.


Andy

[1] http://jena.apache.org/documentation/sdb/index.html


Re: Future of Jena SDB

2013-06-07 Thread Olivier Rossel
Could SDB be useful when dealing with GeoSPARQL and your backend is
(something like) Postgresql+Postgis?
(just a question, this is not one of my needs at the moment).





Re: Future of Jena SDB

2013-06-07 Thread Claus Stadler

Hi,

Just a quick note, as I am the developer of Sparqlify [1], which is pretty 
much a general SQL-to-RDF mapping layer (or at least a SPARQL-to-SQL 
rewriter) based on Jena.
In my experience, although I had and still have some dependencies on 
SDB, I had to, for example, duplicate the SqlExpr hierarchy (see [2]) 
because I needed each SqlExpr to provide its datatype.


The old master branch of Sparqlify already had initial support for 
rewriting spatial predicates to SQL (only ST_Intersects and ST_DWithin), 
but we are currently enhancing this system to allow one to expose 
essentially any (or at least most) SQL function as a SPARQL function.


[1] https://github.com/AKSW/Sparqlify
[2] 
https://github.com/AKSW/Sparqlify/tree/master/sparqlify-core/src/main/java/org/aksw/sparqlify/algebra/sql/exprs2


Cheers,
Claus





--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage  WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Request for a QueryExecutionFactory interface

2013-06-07 Thread Claus Stadler

Hi,

Would it be possible to add a QueryExecutionFactory (QEF) *interface* to 
Jena?
The com.hp.hpl.jena.query.QueryExecutionFactory class has lots of static 
factory methods, but I think it would be very useful if Jena itself 
provided such an interface (in a different package, under a different 
name, or both), because implementations based on Jena could then rely on 
it (see below and [1]) in a (quasi-)standard way, and other projects 
could provide fancy implementations.


public interface QueryExecutionFactory
    extends QueryExecutionFactoryString, QueryExecutionFactoryQuery
{
    /**
     * Some Id identifying the SPARQL service, such as a name given to
     * a jena Model or the URL of a remote service
     */
    String getId();

    /**
     * Some string identifying the state of this execution factory,
     * such as the selected graphs, or for query federation the
     * configured endpoints and their respective graphs.
     * Used for caching
     */
    String getState();
}


The reason I ask is that I created [2], which uses this architecture to 
transparently add delay, caching and pagination to a QEF. For example, 
you could just pose a usual SPARQL query to DBpedia, and [2] will take 
care of retrieving the *complete* result, caching each page along the way 
so that one can resume pagination from cache should something go wrong.


But, for example, someone might provide a parallel pagination component, 
or a query federation system such as FedX could be wrapped with such an 
interface as well, and application developers would not have to rely on 
a specific implementation.
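
To make the wrapping idea concrete, here is a minimal plain-Java sketch of a caching decorator around such an interface. It is only an illustration under stated assumptions: SimpleQef, CountingBackend and CachingQef are hypothetical names made up for this example, select(String) stands in for real query execution, and the endpoint URL is a placeholder. None of this is Jena's API or the actual code in [2].

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the proposed interface (not a Jena type).
interface SimpleQef {
    String getId();                           // identifies the SPARQL service
    String getState();                        // identifies the configuration; used for caching
    List<String> select(String queryString);  // placeholder for real query execution
}

// Pretend backend that counts how often it is actually asked.
class CountingBackend implements SimpleQef {
    int calls = 0;
    public String getId() { return "http://example.org/sparql"; }
    public String getState() { return "default-graph"; }
    public List<String> select(String queryString) {
        calls++;
        return Collections.singletonList("row for: " + queryString);
    }
}

// Decorator: same interface, answers repeated queries from an in-memory cache.
class CachingQef implements SimpleQef {
    private final SimpleQef delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingQef(SimpleQef delegate) { this.delegate = delegate; }

    public String getId() { return delegate.getId(); }
    public String getState() { return delegate.getState(); }

    public List<String> select(String queryString) {
        // Cache key combines service id, state and query text.
        String key = getId() + "|" + getState() + "|" + queryString;
        return cache.computeIfAbsent(key, k -> delegate.select(queryString));
    }
}

class CachingQefDemo {
    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend();
        SimpleQef qef = new CachingQef(backend);
        qef.select("SELECT * { ?s ?p ?o }");
        qef.select("SELECT * { ?s ?p ?o }");  // served from the cache
        System.out.println("backend calls: " + backend.calls);  // prints "backend calls: 1"
    }
}
```

Because the wrapper implements the same interface as the thing it wraps, application code only ever sees a SimpleQef, and delay or pagination wrappers could be stacked in the same way.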


Cheers,
Claus

[1] 
https://github.com/AKSW/jena-sparql-api/blob/master/jena-sparql-api-core/src/main/java/org/aksw/jena_sparql_api/core/QueryExecutionFactory.java

[2] https://github.com/AKSW/jena-sparql-api




Re: Request for a QueryExecutionFactory interface

2013-06-07 Thread Andy Seaborne

Hi Claus,

Fancy implementations can be provided - the mechanism is to provide a 
QueryEngineFactory (which is an interface).  The system takes a query 
and a dataset and asks each registered QueryEngineFactory if it will 
handle the query.  The first one to say yes gets to execute the query. 
The general purpose query engine is last on that list.


https://svn.apache.org/repos/asf/jena/trunk/jena-arq/src-examples/arq/examples/engine/MyQueryEngine.java

This is how TDB and SDB extend ARQ.  They look for datasets of a given type.

(They both also implement Graph and are able to execute queries on 
single graphs so you can have a mixture of graph storage types in one 
dataset.  That isn't required.)
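
As a rough picture of the dispatch described above, here is a self-contained plain-Java sketch. It deliberately does not use Jena's classes: EngineFactory, EngineRegistry and SqlBackedDataset are simplified, hypothetical stand-ins, and queries are plain strings. It only shows the "ask each registered factory in turn, general-purpose engine last" idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query engine factory (not Jena's real interface).
interface EngineFactory {
    boolean accept(String query, Object dataset);  // "will you handle this?"
    String execute(String query, Object dataset);  // placeholder for real execution
}

// Marker type standing in for a storage-specific dataset.
class SqlBackedDataset { }

class EngineRegistry {
    private final List<EngineFactory> factories = new ArrayList<>();

    // The general-purpose engine is registered first and always stays last.
    EngineRegistry(EngineFactory generalPurpose) {
        factories.add(generalPurpose);
    }

    // Specialised factories are inserted ahead of the general-purpose one.
    void add(EngineFactory factory) {
        factories.add(factories.size() - 1, factory);
    }

    // The first factory to say yes gets to execute the query.
    String execute(String query, Object dataset) {
        for (EngineFactory f : factories) {
            if (f.accept(query, dataset)) {
                return f.execute(query, dataset);
            }
        }
        throw new IllegalStateException("no engine accepted the query");
    }
}

class EngineDispatchDemo {
    static EngineRegistry newRegistry() {
        EngineRegistry registry = new EngineRegistry(new EngineFactory() {
            public boolean accept(String q, Object d) { return true; }  // handles anything
            public String execute(String q, Object d) { return "general-purpose engine"; }
        });
        registry.add(new EngineFactory() {
            // Only claims datasets of its own storage type.
            public boolean accept(String q, Object d) { return d instanceof SqlBackedDataset; }
            public String execute(String q, Object d) { return "sql-backed engine"; }
        });
        return registry;
    }

    public static void main(String[] args) {
        EngineRegistry registry = newRegistry();
        // prints "sql-backed engine"
        System.out.println(registry.execute("SELECT * { ?s ?p ?o }", new SqlBackedDataset()));
        // prints "general-purpose engine"
        System.out.println(registry.execute("SELECT * { ?s ?p ?o }", new Object()));
    }
}
```

The real example linked above (MyQueryEngine.java) shows how this is done against ARQ itself.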


Providing this through a fixed QueryExecutionFactory means that once init 
code runs, there isn't an issue of finding the QueryExecutionFactory.


But if you want to provide something that replaces even QueryExecution 
then maybe we can find a way within the current style of looking for 
factories.  This would then not disrupt people's code.


QueryExecutionFactory.make(Query, Dataset, Context) would look for 
QueryExecution creators using one that did the current 
QueryEngineFactory+QueryExecutionBase thing.


If you need more than QueryEngineFactory, please raise a JIRA to discuss 
this ... details matter :-) when avoiding disturbing existing code.


Andy

PS You may be interested in Stephen's
https://svn.apache.org/repos/asf/jena/Experimental/jena-client/

Not the same but maybe of interest.
