Re: Proposal for Integration of Linked Media Framework in Apache Stanbol

Sebastian Schaffert Tue, 26 Jul 2011 08:54:46 -0700

Dear Florent,

On 26.07.2011 at 16:46, florent andré wrote:
> 
>> 
>> The dependency to Hibernate is mostly for the triple store, not for CMS 
>> capabilities. And this is something I don't see how to avoid in the near 
>> future because we need to store additional information about triples for 
>> reasoning and versioning.
>> 
>> Versioning is also of triples, not of content. As such it is probably also 
>> interesting to the Stanbol community.
> 
> I'm interested in a short explanation of how you store the version / 
> history of triples.


We actually use a purely relational approach:
- a table "KIWINODE" stores RDF nodes (one unified table for literals, blank 
nodes, and resources)
- a table "TRIPLES" stores triples with id, subject, predicate, object, 
context, deleted marker, inferred marker, timestamp, and creator (subject, 
predicate, object, context, and creator are references to KIWINODE)
- a table "VERSION" stores version ID, timestamp, and creator
- join tables "VERSION_ADDEDNODES", "VERSION_REMOVEDNODES", 
"VERSION_ADDEDTRIPLES", "VERSION_REMOVEDTRIPLES" store references to added and 
removed nodes and to added and removed triples; for deleted triples and nodes 
the boolean deleted marker is set to true, for added ones it is false

Versioning is thus a simple database operation. "Active" (undeleted) triples 
can be easily filtered using the boolean marker. Undoing simply means reversing 
the operations (add and remove) on triples and nodes.
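
To make the above a bit more concrete, here is a minimal sketch of the idea 
using Python and an in-memory SQLite database. It is only an illustration, 
not our actual Hibernate mapping: the table names are the ones listed above, 
but all column names and the helper functions (open_store, node, 
commit_version, undo, active) are assumptions made for this example, and the 
node join tables are left out for brevity.

```python
import sqlite3

# Table names follow the description above; column names are hypothetical.
SCHEMA = """
CREATE TABLE KIWINODE (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL,          -- 'uri', 'bnode' or 'literal'
    content TEXT NOT NULL,
    UNIQUE (kind, content)
);
CREATE TABLE TRIPLES (
    id        INTEGER PRIMARY KEY,
    subject   INTEGER NOT NULL REFERENCES KIWINODE(id),
    predicate INTEGER NOT NULL REFERENCES KIWINODE(id),
    object    INTEGER NOT NULL REFERENCES KIWINODE(id),
    context   INTEGER REFERENCES KIWINODE(id),
    creator   INTEGER REFERENCES KIWINODE(id),
    deleted   INTEGER NOT NULL DEFAULT 0,
    inferred  INTEGER NOT NULL DEFAULT 0,
    created   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE VERSION (
    id      INTEGER PRIMARY KEY,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    creator INTEGER REFERENCES KIWINODE(id)
);
CREATE TABLE VERSION_ADDEDTRIPLES (
    version_id INTEGER NOT NULL REFERENCES VERSION(id),
    triple_id  INTEGER NOT NULL REFERENCES TRIPLES(id)
);
CREATE TABLE VERSION_REMOVEDTRIPLES (
    version_id INTEGER NOT NULL REFERENCES VERSION(id),
    triple_id  INTEGER NOT NULL REFERENCES TRIPLES(id)
);
"""

def open_store():
    """Create an empty in-memory store with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def node(conn, kind, content):
    """Return the id of the node, creating it if it does not exist yet."""
    row = conn.execute(
        "SELECT id FROM KIWINODE WHERE kind = ? AND content = ?",
        (kind, content)).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO KIWINODE (kind, content) VALUES (?, ?)",
        (kind, content)).lastrowid

def commit_version(conn, added=(), removed=()):
    """Record one version: add (s, p, o) triples, mark triple ids as deleted.

    Simplification: subjects and predicates are stored as URIs, objects as
    literals.
    """
    vid = conn.execute("INSERT INTO VERSION DEFAULT VALUES").lastrowid
    for s, p, o in added:
        tid = conn.execute(
            "INSERT INTO TRIPLES (subject, predicate, object) VALUES (?, ?, ?)",
            (node(conn, "uri", s), node(conn, "uri", p),
             node(conn, "literal", o))).lastrowid
        conn.execute("INSERT INTO VERSION_ADDEDTRIPLES VALUES (?, ?)", (vid, tid))
    for tid in removed:
        conn.execute("UPDATE TRIPLES SET deleted = 1 WHERE id = ?", (tid,))
        conn.execute("INSERT INTO VERSION_REMOVEDTRIPLES VALUES (?, ?)", (vid, tid))
    return vid

def undo(conn, vid):
    """Reverse a version: hide what it added, restore what it removed."""
    conn.execute(
        "UPDATE TRIPLES SET deleted = 1 WHERE id IN "
        "(SELECT triple_id FROM VERSION_ADDEDTRIPLES WHERE version_id = ?)",
        (vid,))
    conn.execute(
        "UPDATE TRIPLES SET deleted = 0 WHERE id IN "
        "(SELECT triple_id FROM VERSION_REMOVEDTRIPLES WHERE version_id = ?)",
        (vid,))

def active(conn):
    """Return all undeleted triples as (subject, predicate, object) strings."""
    return conn.execute(
        "SELECT s.content, p.content, o.content FROM TRIPLES t "
        "JOIN KIWINODE s ON s.id = t.subject "
        "JOIN KIWINODE p ON p.id = t.predicate "
        "JOIN KIWINODE o ON o.id = t.object "
        "WHERE t.deleted = 0 ORDER BY t.id").fetchall()
```

For example, committing one version that adds a triple, then a second version 
that removes that triple and adds a replacement, and finally calling 
undo(conn, second_version) leaves only the original triple active again. 
Nothing is ever physically deleted, which is what keeps undo this simple.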


> 
> I have begun thinking about that (but only thinking for now :) ), and about 
> how big tables (e.g. HBase) might help here...
> 
> HBase is a (kind of) 3-dimensional database:
> - 1 is column
> - 1 is row
> - 1 is timestamp

I really don't see the point. A relational database is already n-dimensional ;-)


> 
> So, my 100-foot view of the idea:
> - each triple is a row
> - ?s, ?p, ?o are each a column (or a column family)
> 
> And so, the history of each triple is stored in the 3rd dimension: timestamps.
> 
> This could lead to a really clean and easy design... if no strong 
> technical/integration restrictions come up...

I am not really convinced, but maybe you can offer some more details and 
convince me. ;-) I am not familiar with these kinds of databases.

My view is that relational databases are well suited to the task because this 
is what they were designed for (triples are purely relational data), with one 
(minor) exception: expensive join operations happen frequently when querying 
RDF, and there is almost no chance to materialize them in advance. This can be 
compensated for to some extent by proper indexing and configuration of the 
database, however.
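
To illustrate the join issue, here is a small sketch (again Python with 
SQLite, not our actual setup). A graph pattern with two triple patterns, such 
as "?x rdf:type ex:Person . ?x ex:name ?n", becomes a self-join on the TRIPLES 
table, and every further pattern adds another join. As a simplification, the 
node values are stored inline as text here instead of as references to 
KIWINODE; the index names and the helper functions are assumptions made for 
this example.

```python
import sqlite3

def build(conn):
    """Create a simplified TRIPLES table plus two composite indexes."""
    conn.executescript("""
    CREATE TABLE TRIPLES (
        id        INTEGER PRIMARY KEY,
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        deleted   INTEGER NOT NULL DEFAULT 0
    );
    -- Supports lookups by predicate and object (e.g. rdf:type ex:Person).
    CREATE INDEX idx_triples_pos ON TRIPLES (predicate, object, subject);
    -- Supports the join step: all triples of a subject with a given predicate.
    CREATE INDEX idx_triples_sp ON TRIPLES (subject, predicate);
    """)

# One self-join per additional triple pattern in the query.
NAMES_OF_TYPE = """
SELECT t2.object
FROM TRIPLES t1
JOIN TRIPLES t2 ON t2.subject = t1.subject
WHERE t1.predicate = 'rdf:type' AND t1.object = ?
  AND t2.predicate = 'ex:name'
  AND t1.deleted = 0 AND t2.deleted = 0
ORDER BY t2.object
"""

def names_of_type(conn, rdf_type):
    """Return the ex:name values of all active resources of the given type."""
    return [row[0] for row in conn.execute(NAMES_OF_TYPE, (rdf_type,))]

def query_plan(conn, rdf_type):
    """Return SQLite's plan for the query; the last column is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + NAMES_OF_TYPE, (rdf_type,))
    return [row[-1] for row in rows]
```

Inspecting the output of query_plan shows whether the database actually picks 
the indexes for both sides of the join; which composite indexes pay off 
depends on the query mix and on the database in use.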

Greetings,

Sebastian
-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
