Sorry, ignore the prior email; it was sent prematurely.
We needed to explore a triple store (TS) in an application we were
building for a client. Triples were being created by scripts and
loaded into the TS; we also had an application that let users enter
information, which added more triples. All of this was backed by an
ontology that was evolving. It was pretty tricky knowing which parts of
the ontology were being exercised and which were not. So we wrote some
SPARQL queries that produced a table where each row said something like
this:
There are 543 triples where the subject is of type Person and the
predicate is employedBy and the object is of type Organization.
The table looked a bit like this:

Subject        Predicate     Object         Count
Person         hasEmployer   Organization   2344
Organization   locatedIn     GeoRegion       432
We found this extremely useful, not only to see exactly what was being
used and how much, but also to see what was NOT being used; those terms
were candidates for removal from the ontology. The SPARQL queries are
not simple to write, but they are not too bad either. Some of the other
responses spoke of similar things.
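To make the idea concrete, here is a rough sketch of what such a usage
query could look like. This is not the set of queries we actually ran; the
query text, the function names (usage_table, unused_predicates) and the
sample data are all made up for illustration. The SPARQL is carried as a
string, and the same counting logic is mirrored in plain Python over an
in-memory list of triples so it can be tried without an endpoint.

```python
from collections import Counter

# Assumed shape of a usage query (illustrative only, not the original).
# It counts triples per (subject type, predicate, object type).
USAGE_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?subjectType ?predicate ?objectType (COUNT(*) AS ?count)
WHERE {
  ?s ?predicate ?o .
  ?s a ?subjectType .
  ?o a ?objectType .
  FILTER (?predicate != rdf:type)
}
GROUP BY ?subjectType ?predicate ?objectType
ORDER BY DESC(?count)
"""

RDF_TYPE = "rdf:type"  # simplified stand-in for the full rdf:type IRI


def usage_table(triples):
    """Return [((subject_type, predicate, object_type), count), ...],
    most frequent first, mirroring the query above."""
    # First pass: collect the declared types of every resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Second pass: count each non-type triple once per type combination.
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, ()):
            for o_type in types.get(o, ()):
                counts[(s_type, p, o_type)] += 1
    return counts.most_common()


def unused_predicates(declared, table):
    """Predicates declared in the ontology that never appear in the table."""
    used = {key[1] for key, _count in table}
    return sorted(set(declared) - used)


# Hypothetical sample data.
SAMPLE = [
    ("alice", RDF_TYPE, "Person"),
    ("bob", RDF_TYPE, "Person"),
    ("acme", RDF_TYPE, "Organization"),
    ("emea", RDF_TYPE, "GeoRegion"),
    ("alice", "hasEmployer", "acme"),
    ("bob", "hasEmployer", "acme"),
    ("acme", "locatedIn", "emea"),
]

if __name__ == "__main__":
    table = usage_table(SAMPLE)
    for (s_type, pred, o_type), n in table:
        print(s_type, pred, o_type, n)
    print(unused_predicates(["hasEmployer", "locatedIn", "memberOf"], table))
```

Note that in this form, triples whose object is a literal (or any untyped
resource) drop out of the count, because the object has no type to group
by. A real version would probably need a second query for those.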
This is more specialized than the original question, which was how to
find out what the ontology is. Here we were more concerned with which
parts of the ontology were being used.
Michael
On Wed, Feb 4, 2015 at 11:35 AM, Lushan Han <[email protected]
<mailto:[email protected]>> wrote:
This work [1] might be helpful to some people. It automatically
learns a "schema" from a given RDF dataset, including most
probable classes and properties and most probable
relations/paths between given classes and etc. Next, it can
automatically translate a casual user's intuitive graph query or
schema-free query to a formal SPARQL query using the learned
schema and statistical NLP techniques, like textual semantic
similarity.
[1]
http://ebiquity.umbc.edu/paper/html/id/658/Schema-Free-Querying-of-Semantic-Data
Cheers,
Lushan
On Sun, Jan 25, 2015 at 11:32 PM, Pavel Klinov
<[email protected] <mailto:[email protected]>> wrote:
On Sun, Jan 25, 2015 at 11:44 PM, Bernard Vatant
<[email protected]
<mailto:[email protected]>> wrote:
> Hi Pavel
>
> Very interesting discussion, thanks for the follow-up. Some quick answers
> below, but I'm currently writing a blog post which will go in more details
> on the notion of Data Patterns, a term I've been pushing last week on the DC
> Architecture list, where it seems to have gained some traction.
> See https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
> for the discussion.
OK, thanks for the link, will check it out. I agree that "patterns"
is perhaps a better term than "schema" since by the latter people
typically mean an explicit specification. I guess it's my use of the term
"schema" which created some confusion initially.
>> ... which reflects what the
>> data is all about. Knowing such structure is useful (and often
>> necessary) to be able to write meaningful queries and that's, I think,
>> what the initial question was.
>
>
> Certainly, and I would rewrite this question : How do you find out data
> patterns in a dataset?
I think it's a more general and tough question having to do with data
mining. Not sure that anyone would venture into finding out data
patterns against a public endpoint just to be able to write queries
for it.
>
>>
>> When such structure exists, I'd say
>> that the dataset has an *implicit* schema (or a conceptual model, if
>> you will).
>
>
> Well, that's where I don't follow. If data, as it happens more and more, is
> gathered from heterogeneous sources, the very notion of a conceptual model
> is jumping to conclusions.
A merger of structures is still a structure. But anyway, I've already
agreed to say patterns =)
> In natural languages, patterns often precede the
> grammar describing them, even if the patterns described in the grammar at
> some point become prescriptive rules. Data should be looked at the same way.
Not sure. I won't immediately disagree since I don't have statistics
regarding structured/unstructured datasets out there.
>>
>> What is absent is an explicit representation of the schema,
>> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
>
>
> When the dataset gathers various sources and various vocabularies, such a
> schema does not exist, actually.
Not necessarily. Parts of it may exist. Take yago, for example. It's
derived from a bunch of sources including Wikipedia and GeoNames and
yet offers its schema for a separate download.
>> However, when the schema *is* represented explicitly, knowing it is a
>> huge help to users which otherwise know little about the data.
>
>
> OK, but the question is : which is a good format for exposing this
> structure?
> RDFS/OWL ontology/vocabulary, Application Profiles, RDF Shapes / whatever it
> will be named, or ... ?
I think this question is a bit secondary. If the need were recognized,
this could be, at least in theory, agreed on.
>>
>> PPS. It'd also be correct to claim that even when a structure exists,
>> realistic data can be messy and not fit into it entirely. We've seen
>> stuff like literals in the range of object properties, etc. It's a
>> separate issue having to do with validation, for which there's an
>> ongoing effort at W3C. However, it doesn't generally hinder writing
>> queries which is what we're discussing here.
>
>
> Well I don't see it as a separate issue. All the raging debate around RDF
> Shapes is not (yet) about validation, but on the definition of what a
> shape/structure/schema can be.
OK, won't disagree on this.
Thanks,
Pavel
>
>
>>
>> > Since the very notion of schema for RDF data has no meaning at all,
>> > and the absence of schema is a bit frightening, people tend to give it
>> > a lot of possible meanings, depending on your closed world or open
>> > world assumption, otherwise said if the "schema" will be used for some
>> > kind of inference or validation. The use of "Schema" in RDFS has done
>> > nothing to clarify this, and the use of "Ontology" in OWL added a layer
>> > of confusion. I tend to say "vocabulary" to name the set of types and
>> > predicates used by a dataset (like in Linked Open Vocabularies), which
>> > is a minimal commitment to how it is considered by the dataset owner,
>> > bearing in mind that this "vocabulary" is generally a mix of imported
>> > terms from SKOS, FOAF, Dublin Core ... and home-made ones. Which is
>> > completely OK with the spirit of RDF.
>> >
>> > The brand new LDOM [1] or whatever it ends up to be named at the end of
>> > the day might clarify the situation, or muddle those waters a bit more :)
>> >
>> > [1] http://spinrdf.org/ldomprimer.html
>> >
>> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <[email protected]>:
>> >>
>> >> Alright, so this isn't an answer and I might be saying something
>> >> totally silly (since I'm not a Linked Data person, really).
>> >>
>> >> If I re-phrase this question as the following: "how do I extract a
>> >> schema from a SPARQL endpoint?", then it seems to pop up quite often
>> >> (see, e.g., [1]). I understand that the original question is a bit
>> >> more general but it's fair to say that knowing the schema is a huge
>> >> help for writing meaningful queries.
>> >>
>> >> As an outsider, I'm quite surprised that there's still no commonly
>> >> accepted (i'm avoiding "standard" here) way of doing this. People
>> >> either hope that something like VoID or LOV vocabularies are being
>> >> used, or use 3-party tools, or write all sorts of ad hoc SPARQL
>> >> queries themselves, looking for types, object properties,
>> >> domains/ranges etc-etc. There are also papers written on this subject.
>> >>
>> >> At the same time, the database engines which host datasets often (not
>> >> always) manage the schema separately from the data. There're good
>> >> reasons for that. One reason, for example, is to be able to support
>> >> basic reasoning over the data, or integrity validation. Just because
>> >> in RDF the schema language and the data language are the same, so
>> >> schema and data triples can be interleaved, it need not (and often
>> >> not) be managed that way.
>> >>
>> >> Yet, there's no standard way of requesting the schema from the
>> >> endpoint, and I don't quite understand why. There's the SPARQL 1.1
>> >> Service Description, which could, in theory, cover it, but it doesn't.
>> >> Servicing such schema extraction requests doesn't have to be mandatory
>> >> so the endpoints which don't have their schemas right there don't have
>> >> to sift through the data. Also, schemas are typically quite small.
>> >>
>> >> I guess there's some problem with this which I'm missing...
>> >>
>> >> Thanks,
>> >> Pavel
>> >>
>> >> [1]
>> >> http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
>> >>
>> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <[email protected]>
>> >> wrote:
>> >> > Assume you are given a URL for a SPARQL endpoint. You have no idea
>> >> > what data is being exposed.
>> >> >
>> >> > What do you do to explore that endpoint? What queries do you write?
>> >> >
>> >> > Juan Sequeda
>> >> > +1-575-SEQ-UEDA
>> >> > www.juansequeda.com <http://www.juansequeda.com>
>> >>
>> >
>> >
>> >
>> >
>
>
> --
> Bernard Vatant
> Vocabularies & Data Engineering
> Tel : + 33 (0)9 71 48 84 59
> Skype : bernard.vatant
> http://google.com/+BernardVatant
> --------------------------------------------------------
> Mondeca
> 35 boulevard de Strasbourg 75010 Paris
> www.mondeca.com <http://www.mondeca.com>
> Follow us on Twitter : @mondecanews
> ----------------------------------------------------------
--
Michael Uschold
Senior Ontology Consultant, Semantic Arts
http://www.semanticarts.com <http://www.semanticarts.com/>
LinkedIn:http://tr.im/limfu
Skype, Twitter: UscholdM