Re: ApacheCon EU CFP is open

Sebastian Schaffert Fri, 03 Aug 2012 01:09:21 -0700

Hi Tommaso,

I have worked a bit with Clerezza in the context of Stanbol. While I can see 
the advantages of being triple-store agnostic and having a large OSGi 
infrastructure already in place, currently we do not intend to base the 
implementation on Clerezza. There are a number of reasons for this:
- we implement reasoning and versioning functionality that requires 
low-level access to the triple store to be efficient; this means that we 
would rather implement a triple store beneath Sesame or Jena and use 
those APIs than the other way round; since many users consider both 
features "killer features", I would not like to remove them
- the Linked Data Server should be capable of handling hundreds of millions or 
even billions of triples (our largest production scenario currently has around 
140 million triples) and of running (SPARQL) queries efficiently over such a 
dataset; an intermediate layer of the kind Clerezza implements is not suitable 
for this (running SPARQL queries efficiently requires translating them into 
queries on the storage layer, e.g. SQL, which cannot be done on top of the API 
level)
- Clerezza essentially provides YET ANOTHER API for managing triples; as a 
consequence, developers need to learn its specifics, the new API needs to be 
battle-tested again even though the other two (Sesame and Jena) are already 
established, and the existing standards (SPARQL query+update, various RDF 
serialization formats, …) need to be implemented yet again


At the moment, our implementation builds mostly on the Sesame API because it is, 
in our opinion, by far the cleanest existing API for working with RDF (having 
worked with both Sesame and Jena for many years, I consider it very well thought 
out and consistent). This is of course not a hard constraint, but for the time 
being I would keep it this way to avoid unnecessary additional effort. In the 
future, however, it might make sense to use Clerezza as a "middle layer" in 
between our low-level and high-level functionality (depending on how Clerezza 
itself develops).

If you want, we can also discuss this at ApacheCon; perhaps I am missing 
something. ;-)

Greetings,

Sebastian


On 02.08.2012 at 11:11, Tommaso Teofili wrote:

> Hi Sebastian,
> 
> that looks very good. Reading your proposal idea (which I like), I am
> wondering what you think about using/extending Apache Clerezza as the
> basic infrastructure for it.
> Reasons for that would be: it is triple-store agnostic (and extensible), it
> can "offer" REST interfaces for the underlying data, and Stanbol is based on
> it. Admittedly, it would need to be improved for scalability (i.e. by
> building a cluster of Clerezza instances).
> 
> Thanks for sharing your nice proposal.
> Looking forward to meeting you at ApacheCon.
> Cheers,
> Tommaso
> 
> 2012/8/2 Sebastian Schaffert <[email protected]>
> 
>> Dear all,
>> 
>> Rupert and I would also like to propose a presentation about an "Apache
>> Linked Data Server", which could continue the work we have been doing so
>> far on integrating our Linked Media Framework and Apache Stanbol.
>> The idea would be that we submit significant parts of the LMF as an
>> Apache Incubator proposal that would closely integrate with Stanbol, which
>> could eventually result in the mentioned "Apache Linked Data Server".
>> 
>> Why would this be useful? More and more institutions participate in "open
>> data" initiatives, but open data currently mostly means simply publishing a
>> CSV or XLS file on a Web server. Semantic Web technology and Linked Data
>> would offer much better interoperability, but if you actually want to
>> publish data this way, the amount of work required is significant (e.g.
>> setting up a Virtuoso server) and does not integrate well with other
>> technologies already in use at those institutions. An easily deployable
>> Open Source Apache project would make it much easier, especially for public
>> institutions or small enterprises, to offer their data publicly.
>> 
>> I already have some screencasts showing how this works with the LMF:
>> 
>> http://code.google.com/p/lmf/
>> 
>> We would like to discuss in what form it would make sense to turn
>> this into an Apache project. The presentation at ApacheCon could serve as
>> a first presentation of this idea.
>> 
>> Greetings,
>> 
>> Sebastian
>> 
>> On 02.08.2012 at 09:11, Fabian Christ wrote:
>> 
>>> Hi,
>>> 
>>> I do not know how many presentations will be accepted. I
>>> would assume that there is not much room for more than one
>>> presentation per podling, but that is just a guess. We will see.
>>> 
>>> Okay then, I will submit a proposal for an overview talk about Stanbol,
>>> led by use cases and scenarios where Stanbol may be a
>>> helpful framework for people, as Tommaso suggested.
>>> 
>>> Best,
>>> - Fabian
>>> 
>>> 2012/7/31 Suat Gonul <[email protected]>:
>>>> Hi all,
>>>> 
>>>> I can submit a presentation about the Contenthub. It is mainly about
>>>> the Contenthub, but it also covers the Enhancer, Entityhub and CMS
>>>> Adapter components. The use case may be similar to our previous eHealth
>>>> demonstration [1]. In terms of content, it would fit well after
>>>> Rupert's presentation. So, the use case may be as follows:
>>>> 
>>>>   * A content administrator would configure a few healthcare datasets
>>>>     as Referenced/Managed Sites based on the indexing facilities of
>>>>     Entityhub.
>>>>   * He would configure KeywordLinkingEngines associated with those
>>>>     Sites to be able to do health-domain-specific enhancements.
>>>>   * After analyzing the details of the external datasets, he defines
>>>>     an LDPath which is compatible with the external datasets. This
>>>>     LDPath is used as the configuration of a Solr based SemanticIndex.
>>>>   * In the next step, the admin configures a CMS Adapter based Store
>>>>     associated with his workspace in the CMS.
>>>>   * When he creates/updates documents in the CMS, the Store keeps track
>>>>     of the changes to the CMS documents and enhances them automatically.
>>>>   * The LDPath based SemanticIndex becomes aware of the changes in the
>>>>     Store and indexes the documents according to its LDPath
>>>>     configuration. In this process, it also gathers additional
>>>>     information from the external datasets for the named entities
>>>>     recognized in the documents from the ManagedSites associated with
>>>>     the external datasets.
>>>>   * As a result, the admin would have a semantically enhanced Solr
>>>>     index tailored to the health domain, which he can use
>>>>     directly through its RESTful API.
>>>> 
>>>> I hope the use case is clear. What do you think?
>>>> 
>>>> Best,
>>>> Suat
>>>> 
>>>> [1] http://www.youtube.com/watch?v=l7n6aRFcn1U
>>>> 
>>>> 
>>>> On 07/31/2012 11:40 AM, Rupert Westenthaler wrote:
>>>>> Hi,
>>>>> 
>>>>> I think I will then go for a presentation about how to use the Stanbol
>>>>> Enhancer to link Linked Data Entities.
>>>>> 
>>>>> * Intro
>>>>> * NER vs. KeywordExtraction (typical Chain configurations, Enhancement
>>>>> Engines used)
>>>>> * Indexing Datasets for the Entityhub
>>>>> * Managing Entities via the RESTful Interface (Entityhub Managed Sites)
>>>>> 
>>>>> This assumes that there is also a more general overview of Stanbol
>>>>> and the Stanbol Enhancer; hoping @Fabian will cover that.
>>>>> 
>>>>> WDYT
>>>>> best
>>>>> Rupert Westenthaler
>>>>> 
>>>>> On Thu, Jul 26, 2012 at 3:47 PM, Tommaso Teofili
>>>>> <[email protected]> wrote:
>>>>>> In my opinion, starting from a real use case that demonstrates a
>>>>>> subset of the whole feature set is a good way of catching the
>>>>>> audience's attention.
>>>>>> My 2 cents,
>>>>>> Tommaso
>>>>>> 
>>>>>> 2012/7/26 Rupert Westenthaler <[email protected]>
>>>>>> 
>>>>>>> On Thu, Jul 26, 2012 at 11:04 AM, Fabian Christ
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I just want to point out that the CFP deadline is 3rd August, 2012.
>>>>>>>> 
>>>>>>> Yes, I am interested...
>>>>>>> 
>>>>>>> Should we aim for a Stanbol overview talk, or rather concentrate on
>>>>>>> (maybe several) specific features/components?
>>>>>>> 
>>>>>>> best
>>>>>>> Rupert
>>>>>>> 
>>>>>>>> Is there any interest from other committers in submitting a talk?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> - Fabian
>>>>>>>> 
>>>>>>>> 2012/7/17 Fabian Christ <[email protected]>:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> the call for papers and talks for the upcoming ApacheCon EU is open
>>>>>>>>> 
>>>>>>>>> http://www.apachecon.eu/cfp/
>>>>>>>>> 
>>>>>>>>> I would be interested in submitting a paper/talk about Stanbol.
>>>>>>>>> Perhaps we should start to collect ideas here. Anyone else here
>>>>>>>>> interested in writing something?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> - Fabian
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Fabian
>>>>>>>>> http://twitter.com/fctwitt
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> | Rupert Westenthaler             [email protected]
>>>>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>>>>> | A-5500 Bischofshofen
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> Sebastian
>> --
>> | Dr. Sebastian Schaffert          [email protected]
>> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
>> | Head of Knowledge and Media Technologies Group          +43 662 2288 423
>> | Jakob-Haringer Strasse 5/II
>> | A-5020 Salzburg
>> 
>> 

