[SMW-devel] New RDF store binding

Markus Krötzsch Fri, 22 Apr 2011 04:37:53 -0700

I am happy to announce that SMW now natively supports data 
synchronization with any SPARQL-capable RDF store. This email explains 
how this works, what the status is, and how interested users can test it 
with their own wikis and triple stores.


It would be helpful if interested users could test the synchronization 
with their favourite stores (Virtuoso anybody?). Details are in the 
testing section below.

Cheers,

Markus



= Detailed Information =


== Status ==

The feature is part of the current SMW SVN trunk. This code is still 
unstable due to major internal changes that the new functions required. 
It is not recommended to use this code on production sites yet. In 
particular, SMW extensions may break and sporadic errors may occur. The 
type Record is currently disabled. I will send another email later for 
extension developers.


== Upgrade and Downgrade ==

You can install the SVN version as usual. Running the setup 
(Special:SMWAdmin or SMW_setup.php script) is recommended. This deletes 
some data. Switching back to SMW 1.5.* would then require running setup 
again (to avoid errors), and doing a full data refresh (to make all 
stored data available again).

You need Curl to be enabled in PHP for the SPARQL connection. This might 
require an additional package for your PHP distribution (on Linux). There 
are no other dependencies (in particular, we do not use ARC or any other 
PHP library for SPARQL support).
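
If you are unsure whether Curl is enabled, you can ask PHP directly. The 
following is a minimal sketch and not part of SMW; the helper name 
sparqlCurlAvailable is made up for this example.

```php
<?php
// Hypothetical helper (not part of SMW): reports whether PHP's Curl
// extension is loaded, which the SPARQL connection needs.
function sparqlCurlAvailable() {
    return extension_loaded( 'curl' ) && function_exists( 'curl_init' );
}

print "Curl available: " . ( sparqlCurlAvailable() ? 'yes' : 'no' ) . "\n";
```

Run it with the same PHP setup that serves your wiki, since the command 
line and the web server may load different PHP configurations.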


== Setting up a SPARQL store ==

SMW can synchronize its data with any SPARQL-capable store. This should 
work reliably for all editing operations, including page deletions and 
updates. To enable this in SMW, you must (obviously) have a running RDF 
store. Then add the following lines to your LocalSettings.php (after 
loading SemanticMediaWiki.php):

$smwgDefaultStore = 'SMWSparqlStore';
$smwgSparqlDatabase = 'SMWSparqlDatabase';
$smwgSparqlQueryEndpoint = 'http://localhost:8080/sparql/';
$smwgSparqlUpdateEndpoint = 'http://localhost:8080/update/';
$smwgSparqlDataEndpoint = 'http://localhost:8080/data/';

The last three lines should be the URLs under which your store provides 
its various services. The query endpoint is for read queries, the update 
endpoint is for SPARQL Update, and the optional data endpoint is for the 
SPARQL HTTP Graph Management Protocol. You can set the data endpoint to 
'' if you do not want to use this service (anything that it does can 
also be done via SPARQL Update, but the simpler HTTP Protocol might be 
faster). Also, the code for the HTTP Protocol might be specific to 
4Store, so it could be necessary to set it to '' if you use another store.
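
For a store other than 4Store, a configuration with the data endpoint 
switched off might look as follows. The URLs are placeholders taken from 
the example above; use whatever your store actually provides.

```php
$smwgDefaultStore = 'SMWSparqlStore';
$smwgSparqlDatabase = 'SMWSparqlDatabase';
$smwgSparqlQueryEndpoint = 'http://localhost:8080/sparql/';
$smwgSparqlUpdateEndpoint = 'http://localhost:8080/update/';
// Empty string: do not use the HTTP graph protocol; all writes then
// go through the SPARQL Update endpoint instead.
$smwgSparqlDataEndpoint = '';
```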

This should suffice to have all your page edits mirrored in the default 
graph of your RDF store. The current implementation does not distinguish 
graphs yet, so you can only add to the default graph. You can try saving 
a page to see if it works. The URIs used in the store are the same as in 
the RDF you get via Special:ExportRDF. If this works, you can start a 
data update (Special:SMWAdmin) or run the update script 
(SMW_refreshData.php) to add all data to the store.
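
To check from outside the wiki whether triples have arrived, you can send 
a query to the query endpoint yourself. The sketch below is independent 
of SMW's own classes; the function names are made up for this example, 
and it assumes that your store accepts queries as a form-encoded POST 
with a "query" parameter, as the SPARQL protocol describes.

```php
<?php
// Sketch only, not SMW code. Function names are hypothetical.

// Encode a SPARQL query as a form body with the parameter name
// "query", as used for POST requests in the SPARQL protocol.
function buildSparqlQueryBody( $sparql ) {
    return http_build_query( array( 'query' => $sparql ) );
}

// Send the query to the given endpoint. Returns the raw response
// body as a string, or false if the request failed.
function runSparqlQuery( $endpoint, $sparql ) {
    $ch = curl_init( $endpoint );
    curl_setopt( $ch, CURLOPT_POST, true );
    curl_setopt( $ch, CURLOPT_POSTFIELDS, buildSparqlQueryBody( $sparql ) );
    curl_setopt( $ch, CURLOPT_HTTPHEADER,
        array( 'Accept: application/sparql-results+xml' ) );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    $response = curl_exec( $ch );
    curl_close( $ch );
    return $response;
}
```

For example, calling runSparqlQuery( 'http://localhost:8080/sparql/', 
'SELECT * WHERE { ?s ?p ?o } LIMIT 5' ) after saving a page should return 
a few result rows if the synchronization works.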


== Testing and Contributing ==

We want to support as many RDF stores as possible, and contributions are 
welcome. At the current stage, errors should be apparent in the above 
refreshing/updating stage (ideally done on the command line to see the 
output for all pages).

Our current error handling is not extensively tested. Some errors are 
ignored silently (especially connection errors that appear to be 
temporary) while others are reported. If you find that the updates do 
not go as expected, then please insert the following code into your 
LocalSettings.php:

$wgExtensionFunctions[] = 'runSparqlTests';
function runSparqlTests() {
        $sdb = smwfGetSparqlDatabase();
        // Ping each configured endpoint and print whether it responded.
        print " Ping (query): " .
                ( $sdb->ping( SMWSparqlDatabase::EP_TYPE_QUERY ) ? 'yes' : 'no' ) . "\n";
        print " Ping (update): " .
                ( $sdb->ping( SMWSparqlDatabase::EP_TYPE_UPDATE ) ? 'yes' : 'no' ) . "\n";
        print " Ping (data): " .
                ( $sdb->ping( SMWSparqlDatabase::EP_TYPE_DATA ) ? 'yes' : 'no' ) . "\n";
}

This tries to ping the services that you specified and dumps the result 
to the output (obviously, this should only be done on test sites).

Developers are welcome to have a look at the code in 
./includes/sparql/SMW_SparqlDatabase.php. This file realizes all basic 
communication via SPARQL, independent from the concrete RDF/SPARQL that 
is used in SMW. It can be used to issue arbitrary SPARQL queries. 
Compare it to MediaWiki's Database class. Special adjustments to 
individual stores (e.g. using some proprietary feature for COUNT 
queries) will be realized by subclassing SMWSparqlDatabase. Such 
subclasses could also use PHP APIs to communicate with a store. So 
developers who wish to contribute to the SMW triple store support would 
mostly work with this code. Likewise, extension developers who want to 
use SPARQL would typically talk to this code.

The "sparql" directory also contains a SPARQL XML result parser and a 
container for representing the results in PHP. Code that uses SPARQL will 
never have to deal with XML or other result formats. The internal 
representation of URIs and literals is done with the objects in 
./export/SMW_ExpElement.php. These objects will soon also provide a 
mapping back to wiki data, to enable further processing at higher levels.
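
To give an idea of what parsing a SPARQL XML result involves, here is a 
small stand-alone sketch. It is not the parser that ships with SMW: it 
uses PHP's DOMDocument, the function name parseSparqlXmlBindings is made 
up, and the returned array layout is just one possible choice.

```php
<?php
// Illustration only; SMW has its own parser and result container.
// Turns a SPARQL XML result document into a list of rows, where each
// row maps a variable name to array( 'type' => ..., 'value' => ... ).
function parseSparqlXmlBindings( $xml ) {
    $ns = 'http://www.w3.org/2005/sparql-results#';
    $doc = new DOMDocument();
    // Return an empty list for empty or malformed input.
    if ( $xml === '' || !@$doc->loadXML( $xml ) ) {
        return array();
    }
    $rows = array();
    foreach ( $doc->getElementsByTagNameNS( $ns, 'result' ) as $result ) {
        $row = array();
        foreach ( $result->getElementsByTagNameNS( $ns, 'binding' ) as $binding ) {
            $name = $binding->getAttribute( 'name' );
            // The first element child is <uri>, <literal> or <bnode>.
            foreach ( $binding->childNodes as $child ) {
                if ( $child->nodeType === XML_ELEMENT_NODE ) {
                    $row[$name] = array(
                        'type' => $child->localName,
                        'value' => $child->textContent
                    );
                    break;
                }
            }
        }
        $rows[] = $row;
    }
    return $rows;
}
```

Each row then lists only the variables that were actually bound in that 
result, so callers should check for missing keys.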


== What is next? ==

I will proceed to fully enable SPARQL-based #ask to replace the current 
SQL implementation. This will happen within the next few days. The RDF 
data stored by SMW has changed slightly and will change further, 
typically by adding more information that is currently missing. For 
example, redirects and sortkeys are now stored in RDF because they are 
needed for #ask.

In the medium term, one could further move most/all storage activities 
from SQL to SPARQL. Currently, the SMWSparqlStore is a subclass of the 
SMWSQLStore2, so it keeps the whole SQL database underneath as before. 
Once SPARQL support becomes stable, this could be changed (or a mixed 
approach could be chosen that combines the strengths of SQL and SPARQL). 
Developers who are interested in this are also welcome to help.


