Future of Jena SDB
SDB is a Jena storage module that uses SQL databases for RDF storage. See [1] for documentation. It uses a custom database schema to store RDF; it is not a general SQL-to-RDF mapping layer.

The supported databases are: Oracle, Microsoft SQL Server, DB2, PostgreSQL, MySQL, Apache Derby, H2, HSQLDB. Only Derby and HSQLDB are tested in the development build process.

Both Oracle and IBM corporations provide commercial RDF solutions using Jena that are completely unrelated to SDB.

TDB is faster, more scalable and better supported than SDB, but there can be reasons why an SQL-backed solution is appropriate.

There is no active development or maintenance of SDB from within the committer team; no committers use SDB, and it imposes a cost on the team to generate separate releases. We're not receiving patches contributed to JIRA items for bugs.

We are proposing:

1/ moving it into the main build so it will be part of the main distribution with limited testing.
2/ marking it as under review / maintenance only.

It will not be treated as something that can block a release or, for any significant length of time, stop development builds. It may be pulled from the main build, and from a release, at very short notice. If moved out, the source code will still be available but no binaries (releases or development builds) will be produced.

What would change SDB's status is care and attention. There are ways to enhance it, for example, pushing the work of filters into the SQL database, where possible, to improve query performance.

Andy

[1] http://jena.apache.org/documentation/sdb/index.html
Re: Future of Jena SDB
Could SDB be useful when dealing with GeoSPARQL and your backend is (something like) Postgresql+Postgis? (just a question, this is not one of my needs at the moment).

On Fri, Jun 7, 2013 at 11:29 AM, Andy Seaborne a...@apache.org wrote: SDB is a Jena storage module that uses SQL databases for RDF storage. [...]
Re: Future of Jena SDB
Hi,

Just a quick note, as I am the developer of Sparqlify [1], which pretty much is a general SQL-to-RDF mapping layer (or at least a SPARQL-to-SQL rewriter) based on Jena; it also has some dependencies on SDB.

From my experience: although I had, and still have, some dependencies on SDB, I had to duplicate parts of it, e.g. the SqlExpr hierarchy (see [2]), because I needed each SqlExpr to provide its datatype.

The old master branch of Sparqlify already had initial support for rewriting spatial predicates to SQL (only ST_Intersects and ST_DWithin), but we are currently enhancing this system to allow one to expose essentially any (or at least most) SQL functions as SPARQL ones.

[1] https://github.com/AKSW/Sparqlify
[2] https://github.com/AKSW/Sparqlify/tree/master/sparqlify-core/src/main/java/org/aksw/sparqlify/algebra/sql/exprs2

Cheers, Claus

On 06/07/2013 11:54 AM, Olivier Rossel wrote: Could SDB be useful when dealing with GeoSPARQL and your backend is (something like) Postgresql+Postgis? (just a question, this is not one of my needs at the moment).

--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260
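To illustrate the point in the message above about every SQL expression carrying its datatype, here is a minimal Java sketch. All class names (TypedSqlExpr, SqlColumn, SqlLiteral, SqlCall) are hypothetical and are neither SDB's nor Sparqlify's actual classes; the return type of each function is assumed to be supplied by whoever registers it. ST_DWithin is the PostGIS function named in the message.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical expression node: every node reports its SQL datatype. */
interface TypedSqlExpr {
    String datatype();
    String toSql();
}

/** A column reference such as "a.geom" with a declared datatype. */
final class SqlColumn implements TypedSqlExpr {
    private final String name;
    private final String datatype;

    SqlColumn(String name, String datatype) {
        this.name = name;
        this.datatype = datatype;
    }

    public String datatype() { return datatype; }
    public String toSql() { return name; }
}

/** A literal that is already rendered as SQL text, e.g. "1000". */
final class SqlLiteral implements TypedSqlExpr {
    private final String text;
    private final String datatype;

    SqlLiteral(String text, String datatype) {
        this.text = text;
        this.datatype = datatype;
    }

    public String datatype() { return datatype; }
    public String toSql() { return text; }
}

/** A function call; its return type is given at construction time (an assumption). */
final class SqlCall implements TypedSqlExpr {
    private final String function;
    private final String returnType;
    private final List<TypedSqlExpr> args;

    SqlCall(String function, String returnType, TypedSqlExpr... args) {
        this.function = function;
        this.returnType = returnType;
        this.args = Arrays.asList(args);
    }

    public String datatype() { return returnType; }

    /** Renders the call as name(arg1, arg2, ...). */
    public String toSql() {
        StringBuilder sb = new StringBuilder(function).append('(');
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(args.get(i).toSql());
        }
        return sb.append(')').toString();
    }
}

class TypedSqlExprDemo {
    public static void main(String[] args) {
        TypedSqlExpr near = new SqlCall("ST_DWithin", "boolean",
            new SqlColumn("a.geom", "geometry"),
            new SqlColumn("b.geom", "geometry"),
            new SqlLiteral("1000", "double precision"));
        System.out.println(near.toSql() + " : " + near.datatype());
    }
}
```

With the datatype available on every node, a rewriter can, for instance, check that an expression used as a filter condition is boolean before emitting it into a WHERE clause.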
Request for a QueryExecutionFactory interface
Hi,

Would it be possible to add a QueryExecutionFactory (QEF) *interface* to Jena? The class com.hp.hpl.jena.query.QueryExecutionFactory has lots of static factory methods, but I think it would be very useful if Jena itself provided such an interface (in a different package, under a different name, or both), because then implementations based on Jena could rely on that interface (see below and [1]) in a (quasi) standard way, and other projects could provide fancy implementations.

    public interface QueryExecutionFactory
        extends QueryExecutionFactoryString, QueryExecutionFactoryQuery
    {
        /**
         * Some Id identifying the SPARQL service, such as a name given to a
         * jena Model or the URL of a remote service
         */
        String getId();

        /**
         * Some string identifying the state of this execution factory, such as
         * the selected graphs, or for query federation the configured endpoints
         * and their respective graphs.
         * Used for caching
         */
        String getState();
    }

The reason I ask is that I created [2], which uses this architecture to transparently add delay, caching and pagination to a QEF. That is, you could just pose a usual SPARQL query to DBpedia, and [2] will take care of retrieving the *complete* result, caching each page so that one can resume pagination from the cache should something go wrong.

But, for example, someone might provide a parallel pagination component, or a query federation system such as FedX could be wrapped with such an interface as well, and application developers would not have to rely on a specific implementation.

Cheers, Claus

[1] https://github.com/AKSW/jena-sparql-api/blob/master/jena-sparql-api-core/src/main/java/org/aksw/jena_sparql_api/core/QueryExecutionFactory.java
[2] https://github.com/AKSW/jena-sparql-api
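The layering described in the request above (pagination and caching added transparently around a factory) can be sketched as plain decorators. This is a simplified illustration, not the jena-sparql-api implementation: SimpleQueryExecutionFactory, FakeEndpoint, CachingFactory and PaginatingFactory are hypothetical names, results are modelled as lists of strings, and the fake endpoint is assumed to cap every response at a fixed number of rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified factory: runs a query string, returns rows as strings. */
interface SimpleQueryExecutionFactory {
    String getId();
    List<String> execute(String query);
}

/** Stand-in for a remote endpoint that returns at most pageSize rows per request. */
final class FakeEndpoint implements SimpleQueryExecutionFactory {
    private final List<String> rows;
    private final int pageSize;
    int requests = 0; // number of requests that actually reached the endpoint

    FakeEndpoint(List<String> rows, int pageSize) {
        this.rows = rows;
        this.pageSize = pageSize;
    }

    public String getId() { return "fake-endpoint"; }

    public List<String> execute(String query) {
        requests++;
        int from = Math.min(parseOffset(query), rows.size());
        int to = Math.min(from + pageSize, rows.size());
        return new ArrayList<String>(rows.subList(from, to));
    }

    /** Reads the number after the last " OFFSET " in the query, or 0 if absent. */
    private static int parseOffset(String query) {
        int i = query.lastIndexOf(" OFFSET ");
        return i < 0 ? 0 : Integer.parseInt(query.substring(i + 8).trim());
    }
}

/** Decorator: remembers results per (service id, query string). */
final class CachingFactory implements SimpleQueryExecutionFactory {
    private final SimpleQueryExecutionFactory delegate;
    private final Map<String, List<String>> cache = new HashMap<String, List<String>>();

    CachingFactory(SimpleQueryExecutionFactory delegate) { this.delegate = delegate; }

    public String getId() { return delegate.getId(); }

    public List<String> execute(String query) {
        String key = delegate.getId() + "|" + query;
        List<String> hit = cache.get(key);
        if (hit == null) {
            hit = delegate.execute(query);
            cache.put(key, hit);
        }
        return hit;
    }
}

/** Decorator: fetches page after page until a short page signals the end. */
final class PaginatingFactory implements SimpleQueryExecutionFactory {
    private final SimpleQueryExecutionFactory delegate;
    private final int pageSize;

    PaginatingFactory(SimpleQueryExecutionFactory delegate, int pageSize) {
        this.delegate = delegate;
        this.pageSize = pageSize;
    }

    public String getId() { return delegate.getId(); }

    public List<String> execute(String query) {
        List<String> all = new ArrayList<String>();
        for (int offset = 0; ; offset += pageSize) {
            List<String> page =
                delegate.execute(query + " LIMIT " + pageSize + " OFFSET " + offset);
            all.addAll(page);
            if (page.size() < pageSize) return all;
        }
    }
}
```

Because the pagination decorator wraps the caching one, for example `new PaginatingFactory(new CachingFactory(endpoint), 2)`, each page is cached individually, so a repeated or resumed run only hits the endpoint for pages it has not seen yet. Application code depends only on the interface, which is the point of the request.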
Re: Request for a QueryExecutionFactory interface
Hi Claus,

Fancy implementations can be provided - the mechanism is to provide a QueryEngineFactory (which is an interface).

The system takes a query and a dataset and asks each registered QueryEngineFactory if it will handle the query. The first one to say yes gets to execute the query. The general-purpose query engine is last on that list.

https://svn.apache.org/repos/asf/jena/trunk/jena-arq/src-examples/arq/examples/engine/MyQueryEngine.java

This is how TDB and SDB extend ARQ. They look for datasets of a given type. (They both also implement Graph and are able to execute queries on single graphs, so you can have a mixture of graph storage types in one dataset. That isn't required.)

Providing this through a fixed QueryExecutionFactory means that once init code runs, there isn't an issue of finding the QueryExecutionFactory.

But if you want to provide something that replaces even QueryExecution, then maybe we can find a way within the current style of looking for factories. This would then not disrupt people's code. QueryExecutionFactory.make(Query, Dataset, Context) would look for QueryExecution creators, using one that did the current QueryEngineFactory+QueryExecutionBase thing.

If you need more than QueryEngineFactory, please raise a JIRA to discuss this ... details matter :-) when avoiding disturbing existing code.

Andy

PS You may be interested in Stephen's https://svn.apache.org/repos/asf/jena/Experimental/jena-client/ Not the same but maybe of interest.

On 07/06/13 12:52, Claus Stadler wrote: Hi, Would it be possible to add a QueryExecutionFactory (QEF) *interface* to Jena? [...]
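The lookup described in the reply above (ask each registered factory in turn, let the first one that accepts run the query, keep the general-purpose engine last) can be sketched in a few lines of Java. The types below (EngineFactory, SqlBackedDataset, EngineRegistry and so on) are hypothetical stand-ins with simplified signatures, not Jena's actual QueryEngineFactory API; queries are plain strings and results are descriptive strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a query engine factory (simplified signatures). */
interface EngineFactory {
    /** Returns true if this factory is willing to handle the query on this dataset. */
    boolean accept(String query, Object dataset);

    /** Executes the query and returns a description of what ran it. */
    String execute(String query, Object dataset);
}

/** Marker type for a dataset backed by one particular store (hypothetical). */
final class SqlBackedDataset {}

/** Accepts only its own dataset type, the way a storage module looks for its datasets. */
final class SqlEngineFactory implements EngineFactory {
    public boolean accept(String query, Object dataset) {
        return dataset instanceof SqlBackedDataset;
    }

    public String execute(String query, Object dataset) {
        return "sql-engine:" + query;
    }
}

/** Accepts everything; plays the role of the general-purpose engine. */
final class GeneralEngineFactory implements EngineFactory {
    public boolean accept(String query, Object dataset) { return true; }

    public String execute(String query, Object dataset) {
        return "general-engine:" + query;
    }
}

final class EngineRegistry {
    private final List<EngineFactory> factories = new ArrayList<EngineFactory>();

    EngineRegistry() {
        factories.add(new GeneralEngineFactory());
    }

    /** New factories go to the front, so the general-purpose engine stays last. */
    void register(EngineFactory factory) {
        factories.add(0, factory);
    }

    /** Asks each factory in order; the first one to accept runs the query. */
    String run(String query, Object dataset) {
        for (EngineFactory factory : factories) {
            if (factory.accept(query, dataset)) {
                return factory.execute(query, dataset);
            }
        }
        throw new IllegalStateException("no engine accepted the query");
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        EngineRegistry registry = new EngineRegistry();
        registry.register(new SqlEngineFactory());
        System.out.println(registry.run("SELECT * { ?s ?p ?o }", new SqlBackedDataset()));
        System.out.println(registry.run("SELECT * { ?s ?p ?o }", new Object()));
    }
}
```

In this sketch the specialised factory only sees queries against its own dataset type, and existing callers are unaffected because they keep going through the same entry point.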