Re: Proposal for Integration of Linked Media Framework in Apache Stanbol

Sebastian Schaffert Tue, 26 Jul 2011 12:21:24 -0700

Sorry, this was meant for Rupert so I wrote in German ... but nothing secret ;-)


It says that I would also like to achieve this and that I would like to get rid 
of Hibernate if there is a clean way ...


Btw, I am on vacation starting this evening... in case of questions my colleague 
Thomas can surely answer competently in my place. I will check mail from time 
to time though ;-)


Greetings,

Sebastian


On 26.07.2011 at 21:18, Sebastian Schaffert wrote:

> Thanks for the support, that is where I would like to get to as well. ;-)
> 
> But many of the suggestions are already very good; I would really like to 
> move away from Hibernate if there is a clean way to do it ...
> 
> Best regards,
> Sebastian
> 
> On 26.07.2011 at 18:59, Rupert Westenthaler wrote:
> 
>> Hi
>> 
>> I think we should investigate if it would make sense to implement the
>> Clerezza APIs on top of the "Kiwi" Triple store. This would allow any
>> Clerezza based Application - including stanbol - to use this Triple
>> store implementation.
>> 
>> WDYT
>> Rupert
>> 
>> On Tue, Jul 26, 2011 at 5:54 PM, Sebastian Schaffert
>> <[email protected]> wrote:
>>> Dear Florent,
>>> 
>>> On 26.07.2011 at 16:46, florent andré wrote:
>>>> 
>>>>> 
>>>>> The dependency to Hibernate is mostly for the triple store, not for CMS 
>>>>> capabilities. And this is something I don't see how to avoid in the near 
>>>>> future because we need to store additional information about triples for 
>>>>> reasoning and versioning.
>>>>> 
>>>>> Versioning is also of triples, not of content. As such it is probably 
>>>>> also interesting to the Stanbol community.
>>>> 
>>>> I'm interested in a little explanation of the way you store version / 
>>>> history of triples.
>>> 
>>> We use a purely relational approach actually:
>>> - a table "KIWINODE" stores RDF nodes (unified table for literals, blank 
>>> nodes and resources)
>>> - a table "TRIPLES" stores triples with id, subject, predicate, object, 
>>> context, marker for deleted, marker for inferred, timestamp, creator 
>>> (subject, predicate, object, context, creator are references to KIWINODE)
>>> - a table "VERSION" stores version ID, timestamp, creator
>>> - join tables "VERSION_ADDEDNODES", "VERSION_REMOVEDNODES", 
>>> "VERSION_ADDEDTRIPLES", "VERSION_REMOVEDTRIPLES" store references to added 
>>> and removed nodes and to added and removed triples; for deleted triples and 
>>> nodes, the boolean marker will be set to true, for added triples and nodes 
>>> it will be false
>>> 
>>> Versioning is thus a simple database operation. "Active" (undeleted) 
>>> triples can be easily filtered using the boolean marker. Undoing simply 
>>> means reversing the operations (add and remove) on triples and nodes.
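
A minimal sketch of the relational layout described above, assuming SQLite via Python's standard sqlite3 module rather than the Hibernate mapping actually used. Column names that are not listed in the message (kind, content, deleted, inferred, created) and all function names are hypothetical, and only the triple join tables are modelled; context, creator and the node join tables are left out for brevity.

```python
import sqlite3

# Hypothetical, simplified schema. Table names follow the message; column
# names such as "kind", "content" and "deleted" are assumptions.
SCHEMA = """
CREATE TABLE KIWINODE (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL,            -- 'uri', 'bnode' or 'literal'
    content TEXT NOT NULL,
    UNIQUE (kind, content)
);
CREATE TABLE TRIPLES (
    id        INTEGER PRIMARY KEY,
    subject   INTEGER NOT NULL REFERENCES KIWINODE(id),
    predicate INTEGER NOT NULL REFERENCES KIWINODE(id),
    object    INTEGER NOT NULL REFERENCES KIWINODE(id),
    deleted   INTEGER NOT NULL DEFAULT 0,
    inferred  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE VERSION (
    id      INTEGER PRIMARY KEY,
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE VERSION_ADDEDTRIPLES (
    version_id INTEGER REFERENCES VERSION(id),
    triple_id  INTEGER REFERENCES TRIPLES(id)
);
CREATE TABLE VERSION_REMOVEDTRIPLES (
    version_id INTEGER REFERENCES VERSION(id),
    triple_id  INTEGER REFERENCES TRIPLES(id)
);
"""


def node(db, kind, content):
    """Return the id of a node, creating the row if it does not exist yet."""
    db.execute("INSERT OR IGNORE INTO KIWINODE (kind, content) VALUES (?, ?)",
               (kind, content))
    return db.execute("SELECT id FROM KIWINODE WHERE kind = ? AND content = ?",
                      (kind, content)).fetchone()[0]


def commit_version(db, added=(), removed=()):
    """Record one version: insert added triples, flag removed ones as deleted."""
    version = db.execute("INSERT INTO VERSION DEFAULT VALUES").lastrowid
    for s, p, o in added:
        triple = db.execute(
            "INSERT INTO TRIPLES (subject, predicate, object) VALUES (?, ?, ?)",
            (s, p, o)).lastrowid
        db.execute("INSERT INTO VERSION_ADDEDTRIPLES VALUES (?, ?)",
                   (version, triple))
    for triple in removed:
        db.execute("UPDATE TRIPLES SET deleted = 1 WHERE id = ?", (triple,))
        db.execute("INSERT INTO VERSION_REMOVEDTRIPLES VALUES (?, ?)",
                   (version, triple))
    return version


def undo_version(db, version):
    """Reverse a version: its added triples become deleted, its removed ones active."""
    db.execute("UPDATE TRIPLES SET deleted = 1 WHERE id IN "
               "(SELECT triple_id FROM VERSION_ADDEDTRIPLES WHERE version_id = ?)",
               (version,))
    db.execute("UPDATE TRIPLES SET deleted = 0 WHERE id IN "
               "(SELECT triple_id FROM VERSION_REMOVEDTRIPLES WHERE version_id = ?)",
               (version,))


def active_triples(db):
    """Filter "active" triples using the boolean marker."""
    return db.execute("SELECT id, subject, predicate, object FROM TRIPLES "
                      "WHERE deleted = 0 ORDER BY id").fetchall()
```

In this sketch nothing is ever physically deleted, so undoing a version only flips the marker on the rows referenced by the two join tables.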
>>> 
>>> 
>>>> 
>>>> I have begun to think about that (just thinking for now :) ), and about 
>>>> the possible help of big tables (e.g. HBase) for this...
>>>> 
>>>> HBase is a (kind of) 3-dimensional database:
>>>> - 1 is column
>>>> - 1 is row
>>>> - 1 is timestamp
>>> 
>> I think there is currently a lot of work on how to handle Graph
>> Structures in this kind of data stores. I am definitely interested in
>> this topic but currently I do not have the time to investigate it in
>> more detail.
>> 
>>> I really don't see the point. A relational database is already 
>>> n-dimensional ;-)
>>> 
>> 
>> As long as you can handle the amount of triples on a single machine it
>> is for sure more efficient and easier to implement to handle it with
>> a relational database.
>> I think there is also a new TripleStore implementation around that
>> uses Solr/Lucene to store Triples. Someone has mentioned it in Paris,
>> but I have forgotten the name of the project.
>> 
>>> 
>>>> 
>>>> So, for my 100 feet idea :
>>>> - each triple is a row
>>>> - ?s, ?p, ?o each a column (or a column family)
>>>> 
>>>> And so, the history of each triple is stored in the 3rd dimension: timestamps.
>>>> 
>>>> This could lead to a really clean and easy design... unless strong 
>>>> technical/integration restrictions come up...
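
To make the row / column / timestamp idea above concrete, here is a toy in-memory sketch. It does not use the HBase API; the class VersionedTable, the helper names, the row key "t1" and the convention of writing None as a deletion marker are all assumptions made for illustration.

```python
import bisect
from collections import defaultdict


class VersionedTable:
    """Toy stand-in for a (row, column, timestamp) store; not the HBase API."""

    def __init__(self):
        # (row, column) -> (sorted timestamps, values in the same order)
        self._cells = defaultdict(lambda: ([], []))

    def put(self, row, column, value, timestamp):
        stamps, values = self._cells[(row, column)]
        i = bisect.bisect_right(stamps, timestamp)
        stamps.insert(i, timestamp)
        values.insert(i, value)

    def get(self, row, column, timestamp=None):
        """Return the newest value written at or before the given timestamp."""
        stamps, values = self._cells.get((row, column), ([], []))
        i = len(stamps) if timestamp is None else bisect.bisect_right(stamps, timestamp)
        return values[i - 1] if i else None


COLUMNS = ("s", "p", "o")


def put_triple(table, row, s, p, o, timestamp):
    """Each triple is a row; ?s, ?p and ?o are its columns."""
    for column, value in zip(COLUMNS, (s, p, o)):
        table.put(row, column, value, timestamp)


def delete_triple(table, row, timestamp):
    # Assumption: a deletion is recorded by writing None into every column.
    put_triple(table, row, None, None, None, timestamp)


def triple_at(table, row, timestamp=None):
    """Return the triple as it was at the timestamp, or None if absent or deleted."""
    triple = tuple(table.get(row, column, timestamp) for column in COLUMNS)
    return None if None in triple else triple
```

With this layout, reading an old state is just a lookup with an earlier timestamp, for example triple_at(table, "t1", 15). Whether pattern queries over ?s, ?p and ?o can be answered efficiently is a separate question that the sketch does not address.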
>>> 
>>> I am not really convinced, but maybe you can offer some more details and 
>>> convince me.;-) I am not familiar with these kinds of databases.
>>> 
>>> My thought is that relational databases are really well suited for the task 
>>> because this is what they have been designed for (triples are really purely 
>>> relational data), with one (minor) exception: expensive join operations 
>>> happen frequently when querying RDF, and there is almost no chance to 
>>> materialize them in advance. This can be compensated a bit by proper 
>>> indexing and configuration of the database, however.
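
As an illustration of the join cost mentioned above, the following sketch evaluates a two-pattern query (<person> ex:knows ?f . ?f ex:name ?n) as a self-join on a single triples table. It assumes SQLite via Python's sqlite3 module and a simplified table that stores terms as plain text; the index names, the ex: terms and the function names are hypothetical.

```python
import sqlite3


def build_store():
    """Create a simplified triples table with two covering indexes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE TRIPLES (subject TEXT, predicate TEXT, object TEXT);
        -- Indexes chosen so that both sides of the self-join can be
        -- resolved by index lookups instead of full table scans.
        CREATE INDEX idx_spo ON TRIPLES (subject, predicate, object);
        CREATE INDEX idx_pos ON TRIPLES (predicate, object, subject);
    """)
    return db


def friends_names(db, person):
    """Each additional triple pattern in the query adds one more self-join."""
    rows = db.execute("""
        SELECT t2.object
        FROM TRIPLES AS t1
        JOIN TRIPLES AS t2 ON t2.subject = t1.object
        WHERE t1.subject = ?
          AND t1.predicate = 'ex:knows'
          AND t2.predicate = 'ex:name'
        ORDER BY t2.object
    """, (person,))
    return [row[0] for row in rows]
```

Because the join partners depend on the query, they cannot be precomputed in general, which is why the indexes carry most of the load here.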
>>> 
>> 
>> Yago2 uses a special n-triple model that includes subject, predicate,
>> object, temporal, spatial and full text. For spatial and full text
>> they use the corresponding extensions of the relational databases. By
>> that they can greatly reduce the number of joins for queries on
>> event-like data.
>> 
>> Again this discussion is very related to the work of Fabian on the Factstore!
>> 
>> best
>> Rupert
>> 
>> -- 
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
> 
> Sebastian
> -- 
> | Dr. Sebastian Schaffert          [email protected]
> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
> | Head of Knowledge and Media Technologies Group          +43 662 2288 423
> | Jakob-Haringer Strasse 5/II
> | A-5020 Salzburg
> 

