I agree that Clerezza should finally have a fast lane for SPARQL queries;
the current approach only makes sense when a query runs against graphs from
multiple backends. This is definitely a bottleneck now.
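
To make the idea concrete, here is a minimal sketch (all names are hypothetical, not actual Clerezza classes) of such a fast lane: if every graph a query touches lives in the same backend, the whole SPARQL string is handed to that backend for native evaluation, and only the multi-backend case falls back to the generic pattern-by-pattern path.

```python
# Hypothetical sketch of the "fast lane": delegate a SPARQL query to a
# single backend when that backend holds all the graphs the query needs,
# instead of evaluating it triple-pattern by triple-pattern over the
# generic graph API.

class Backend:
    """A storage backend that can evaluate SPARQL natively."""
    def __init__(self, name, graphs):
        self.name = name
        self.graphs = set(graphs)

    def evaluate_sparql(self, query):
        # A real backend would push this down to the store (e.g. translate
        # it to SQL); here we only record that delegation happened.
        return f"{self.name} evaluated natively: {query}"

def execute(query, graph_names, backends, generic_eval):
    """Use the fast lane when one backend covers all graphs, else fall back."""
    for b in backends:
        if set(graph_names) <= b.graphs:
            return b.evaluate_sparql(query)    # fast lane
    return generic_eval(query, graph_names)    # slow multi-backend path

backend = Backend("store-a", {"g1", "g2"})
result = execute("SELECT * WHERE { ?s ?p ?o }", ["g1"], [backend],
                 lambda q, g: "evaluated pattern-by-pattern over the API")
print(result)
```

The dispatch decision is the whole point: the generic path stays available for cross-backend queries, but the common single-backend case no longer pays for it.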

What puzzles me is that you seem to think the Sesame API is cleaner
than the Clerezza one. The Clerezza API was introduced because none of the
available APIs modeled the RDF abstract syntax without tying additional
concepts and utility classes into the core. If you find anything that's not
clean, I'd like to address it.
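
By "modeling the RDF abstract syntax" I mean a core of just the term kinds and triples from the RDF specification, nothing else. An illustration (illustrative names, not Clerezza's actual classes):

```python
# Illustrative model of the RDF abstract syntax: only the three node kinds
# (IRI, blank node, literal) and triples, with a graph as a set of triples.
# No utility classes or extra concepts in the core.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: IRI = IRI("http://www.w3.org/2001/XMLSchema#string")

Term = Union[IRI, BNode, Literal]

@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BNode]   # literals cannot be subjects in RDF
    predicate: IRI
    object: Term

# A graph is a set of triples, so duplicates collapse automatically.
graph = set()
graph.add(Triple(IRI("http://example.org/alice"),
                 IRI("http://xmlns.com/foaf/0.1/name"),
                 Literal("Alice")))
print(len(graph))  # 1
```

The frozen (value-equal, hashable) terms are what make set semantics work; everything beyond this belongs in utility layers, not the core.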

It seems that what you are using of Sesame is mainly the SPI/API and not
the actual triple store. This is definitely something Clerezza should
have a good offering for.
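
What such an offering amounts to is a clean storage-provider contract that the framework programs against, so any triple store can be plugged in underneath. A hedged sketch (hypothetical interface, not Clerezza's real SPI):

```python
# Hypothetical storage SPI: the framework depends only on the provider
# contract; concrete triple stores implement it.
from abc import ABC, abstractmethod

class GraphProvider(ABC):
    """Contract a pluggable triple-store backend would implement."""

    @abstractmethod
    def get_graph(self, name):
        """Return the set of triples stored under the given graph name."""

    @abstractmethod
    def add_triple(self, name, triple):
        """Add one triple to the named graph."""

class InMemoryProvider(GraphProvider):
    """Trivial reference implementation backed by Python sets."""
    def __init__(self):
        self._graphs = {}

    def get_graph(self, name):
        return self._graphs.setdefault(name, set())

    def add_triple(self, name, triple):
        self.get_graph(name).add(triple)

provider = InMemoryProvider()
provider.add_triple("g", ("ex:alice", "foaf:knows", "ex:bob"))
print(len(provider.get_graph("g")))
```

With the contract this small, swapping the in-memory implementation for a persistent store is invisible to everything above it.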

When I last looked at it, Sesame was not ready to be used in Apache
projects; I am not sure whether license issues are the reason it is not
available in Maven Central.

Cheers,
Reto
On Aug 3, 2012 10:09 AM, "Sebastian Schaffert" <
[email protected]> wrote:

> Hi Tommaso,
>
> I have worked a bit with Clerezza in the context of Stanbol. While I can
> see the advantages of being triple-store agnostic and having a large OSGi
> infrastructure already in place, currently we do not intend to base the
> implementation on Clerezza. There are a number of reasons for this:
> - we implement reasoning and versioning functionality that requires
> low-level access to the triple store to be efficient; this would mean that
> we would rather implement a triple store beneath Sesame or Jena and use
> their APIs than the other way round; since both are considered
> "killer features" by many users, I would not like to remove this
> functionality
> - the Linked Data Server should be capable of handling hundreds of
> millions or even billions of triples (our current biggest productive
> scenario has around 140 million triples) and efficiently run queries
> (SPARQL) over this dataset; an intermediate layer in the way implemented by
> Clerezza is not suitable for this (running efficient SPARQL queries
> requires translating them into queries on the storage layer - e.g. SQL -
> and cannot be done on top of the API level)
> - Clerezza essentially provides YET ANOTHER API for managing triples;
> consequence: developers need to learn its specifics, the new API needs to
> be fire-proven again even though the other two (Sesame and Jena) are
> already established, the existing standards (SPARQL query+update, various
> RDF serialization formats, …) need to be implemented yet-again
>
> At the moment, our implementation builds mostly on the Sesame API because
> it is in our opinion by far the cleanest existing API for working with RDF
> (in fact, I consider it very well thought out and consistent, I have worked
> with both Sesame and Jena for many years). This is of course not a hard
> constraint, but for the time being I would keep it like this to avoid
> unnecessary additional effort. In the future, however, it might make sense
> to use Clerezza as a "middle layer" in between our low-level and our
> high-level functionalities (depending on how Clerezza itself develops).
>
> If you want to, we can discuss this also at ApacheCon, perhaps I am
> missing something. ;-)
>
> Greetings,
>
> Sebastian
>
>
> Am 02.08.2012 um 11:11 schrieb Tommaso Teofili:
>
> > Hi Sebastian,
> >
> > that looks very good. Reading your idea of a proposal (which I like), I am
> > wondering what you think about using / extending Apache Clerezza as the
> > basic infrastructure for that.
> > Reasons for that would be: it's triple-store agnostic (and extensible), it
> > can "offer" REST interfaces for the underlying data, and Stanbol is based
> > on it. Surely it should be improved for scalability purposes (i.e. by
> > building a cluster of Clerezza instances).
> >
> > Thanks for sharing your nice proposal.
> > Looking forward to meeting you at ApacheCon.
> > Cheers,
> > Tommaso
> >
> > 2012/8/2 Sebastian Schaffert <[email protected]>
> >
> >> Dear all,
> >>
> >> Rupert and I would also like to propose a presentation about an "Apache
> >> Linked Data Server", which could continue the work we have so far been
> >> doing on the integration of our Linked Media Framework and Apache Stanbol.
> >> The idea here would be that we submit significant parts of the LMF as an
> >> Apache Incubator proposal that would closely integrate with Stanbol, which
> >> could eventually result in the mentioned "Apache Linked Data Server".
> >>
> >> Why would this be useful? More and more institutions participate in "open
> >> data" initiatives, but open data currently mostly means simply publishing
> >> a CSV or XLS file on a Web server. Semantic Web technology and Linked Data
> >> would offer much better interoperability, but if you really plan to
> >> publish data, the amount of work required is significant (e.g. setting up
> >> a Virtuoso server), and it does not integrate so well with other
> >> technologies that are in use in institutions. An open-source Apache
> >> project that could be deployed easily would make it much easier,
> >> especially for public institutions or small enterprises, to offer their
> >> data publicly.
> >>
> >> I have some screencasts already on how this works with the LMF:
> >>
> >> http://code.google.com/p/lmf/
> >>
> >> And we would like to discuss in what form it would make sense to
> >> transform this into an Apache project. The presentation at ApacheCon
> >> could be a first presentation of this idea.
> >>
> >> Greetings,
> >>
> >> Sebastian
> >>
> >> Am 02.08.2012 um 09:11 schrieb Fabian Christ:
> >>
> >>> Hi,
> >>>
> >>> I do not know how many presentations will potentially be accepted. I
> >>> would assume that there is not that much room for more than one
> >>> presentation from each podling, but that is just a guess. We will see.
> >>>
> >>> Okay then, I will submit a proposal for an overview talk about Stanbol
> >>> which is led by use cases and scenarios where Stanbol may be a
> >>> helpful framework for people, as Tommaso suggested.
> >>>
> >>> Best,
> >>> - Fabian
> >>>
> >>> 2012/7/31 Suat Gonul <[email protected]>:
> >>>> Hi all,
> >>>>
> >>>> I can submit a presentation about the Contenthub. It's mainly about
> >>>> the Contenthub, but it also covers the Enhancer, Entityhub and CMS
> >>>> Adapter components. The use case may be similar to our previous
> >>>> eHealth demonstration [1]. In terms of its content, it may fit well
> >>>> after Rupert's presentation. So, the use case may be as follows:
> >>>>
> >>>>   * A content administrator would configure a few healthcare datasets
> >>>>     as Referenced/Managed Sites based on the indexing facilities of
> >>>>     Entityhub.
> >>>>   * He would configure the KeywordLinkingEngines associated with those
> >>>>     Sites to be able to produce health-domain-specific enhancements.
> >>>>   * After analyzing the details of the external datasets, he defines
> >>>>     an LDPath which is compatible with the external datasets. This
> >>>>     LDPath is used as the configuration of a Solr based SemanticIndex.
> >>>>   * In the next step, the admin configures a CMS Adapter based Store
> >>>>     associated with his workspace in the CMS.
> >>>>   * When he creates/updates documents on the CMS, the Store keeps
> >>>>     track of the changes to the CMS documents and enhances them
> >>>>     automatically.
> >>>>   * The LDPath-based SemanticIndex becomes aware of the changes in the
> >>>>     Store and indexes the documents according to its LDPath
> >>>>     configuration. In this process, it also gathers additional
> >>>>     information about the named entities recognized in the documents
> >>>>     from the ManagedSites associated with the external datasets.
> >>>>   * As a result, the admin would have a semantically enhanced Solr
> >>>>     index tailored to the health domain, and he can use the index
> >>>>     directly through its RESTful API.
> >>>>
> >>>> I hope the use case is clear. What do you think?
> >>>>
> >>>> Best,
> >>>> Suat
> >>>>
> >>>> [1] http://www.youtube.com/watch?v=l7n6aRFcn1U
> >>>>
> >>>>
> >>>> On 07/31/2012 11:40 AM, Rupert Westenthaler wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I think I will then go for a presentation about how to use the Stanbol
> >>>>> Enhancer to link Linked Data entities.
> >>>>>
> >>>>> * Intro
> >>>>> * NER vs. KeywordExtraction (typical Chain configurations, used
> >>>>> Enhancement Engines)
> >>>>> * Indexing Datasets for the Entityhub
> >>>>> * Managing Entities via the RESTful Interface (Entityhub Managed Sites)
> >>>>>
> >>>>> This assumes that there is also a more general overview of Stanbol
> >>>>> and the Stanbol Enhancer - hoping @Fabian will cover that.
> >>>>>
> >>>>> WDYT
> >>>>> best
> >>>>> Rupert Westenthaler
> >>>>>
> >>>>> On Thu, Jul 26, 2012 at 3:47 PM, Tommaso Teofili
> >>>>> <[email protected]> wrote:
> >>>>>> In my opinion, starting from a real use case which demonstrates a
> >>>>>> subset of the whole feature set is a good way of catching the
> >>>>>> audience's attention.
> >>>>>> My 2 cents,
> >>>>>> Tommaso
> >>>>>>
> >>>>>> 2012/7/26 Rupert Westenthaler <[email protected]>
> >>>>>>
> >>>>>>> On Thu, Jul 26, 2012 at 11:04 AM, Fabian Christ
> >>>>>>> <[email protected]> wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I just want to point out that the CFP deadline is 3rd August, 2012.
> >>>>>>>>
> >>>>>>> Yes I am interested ...
> >>>>>>>
> >>>>>>> Should we aim for a Stanbol overview talk, or rather concentrate on -
> >>>>>>> maybe several - specific features/components?
> >>>>>>>
> >>>>>>> best
> >>>>>>> Rupert
> >>>>>>>
> >>>>>>>> Is there any interest from other committers in submitting a talk?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> - Fabian
> >>>>>>>>
> >>>>>>>> 2012/7/17 Fabian Christ <[email protected]>:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> the call for papers and talks for the upcoming ApacheCon EU is open:
> >>>>>>>>>
> >>>>>>>>> http://www.apachecon.eu/cfp/
> >>>>>>>>>
> >>>>>>>>> I would be interested in submitting a paper/talk about Stanbol.
> >>>>>>>>> Perhaps we should start to collect ideas here. Anyone else here
> >>>>>>>>> interested in writing something?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> - Fabian
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Fabian
> >>>>>>>>> http://twitter.com/fctwitt
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Fabian
> >>>>>>>> http://twitter.com/fctwitt
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> | Rupert Westenthaler             [email protected]
> >>>>>>> | Bodenlehenstraße 11                        ++43-699-11108907
> >>>>>>> | A-5500 Bischofshofen
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Fabian
> >>> http://twitter.com/fctwitt
> >>
> >> Sebastian
> >> --
> >> | Dr. Sebastian Schaffert          [email protected]
> >> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
> >> | Head of Knowledge and Media Technologies Group          +43 662 2288 423
> >> | Jakob-Haringer Strasse 5/II
> >> | A-5020 Salzburg
> >>
> >>
>
> Sebastian
> --
> | Dr. Sebastian Schaffert          [email protected]
> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
> | Head of Knowledge and Media Technologies Group          +43 662 2288 423
> | Jakob-Haringer Strasse 5/II
> | A-5020 Salzburg
>
>
