Re: Newbie LOD Questions :)

Kingsley Idehen Wed, 28 Oct 2009 17:46:08 -0700

Nathan wrote:

Kingsley Idehen wrote:
Nathan wrote:
Hi All,
Apologies if this is the wrong place to ask questions about linked data; however, I'm not sure where else to turn at the minute! Apologies again, as it's quite a long list.
It's worth noting the following link, which is relevant to most of the questions below:
http://sameas.org/text?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLondon
1] Let's say I'm writing an article about London, England; which one of the many URIs do I reference as the thing my data is "about"?
2] Would there be scope for a single globally unique identifier / URI to represent "London, England"? One which, rather than holding information about London (like http://dbpedia.org/resource/London), essentially held a set of sameAs items which everyone could use when publishing data "about" "London, England" (like the data at the sameas.org link above).
3] If sameAs indicates that two URI references contain information about the same thing, how do we assert that two URIs contain the same information about the same thing (i.e. identical data)?
You don't want to assert that they have the same data. You are asserting co-reference, i.e. the URIs are about the same Entity. Thus, you can then perform union-style expansion from the co-reference URIs to get a bigger picture of a given entity, e.g. London, from a variety of data sources.
Examples:

- About Me (compact) [1]
- About Me (expanded via explicit co-reference of the kind delivered by owl:sameAs) [2]
- About Me (expanded via fuzzier co-reference via a rule that asserts, in this context, that foaf:name is an inverse functional property, i.e. its values are indirect identifiers) [3]
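
To make the union-style expansion concrete, here is a minimal Python sketch. It assumes triples are plain (subject, predicate, object) tuples held in memory; the example.org URI, the sample property names and values, and the helper names `coreferent_uris` and `expand` are hypothetical and are not part of any library or of the services linked in this mail.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

def coreferent_uris(triples, start):
    """Collect every URI linked to `start` by owl:sameAs, in either direction."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAMEAS:
                continue
            # owl:sameAs is symmetric, so follow the link both ways.
            for here, there in ((s, o), (o, s)):
                if here == current and there not in seen:
                    seen.add(there)
                    frontier.append(there)
    return seen

def expand(triples, start):
    """Union the (predicate, object) pairs of all co-referent URIs."""
    uris = coreferent_uris(triples, start)
    return {(p, o) for s, p, o in triples if s in uris and p != OWL_SAMEAS}

DBPEDIA_LONDON = "http://dbpedia.org/resource/London"
OTHER_LONDON = "http://example.org/place/london"  # hypothetical second source

triples = [
    (DBPEDIA_LONDON, "rdfs:label", "London"),
    (OTHER_LONDON, OWL_SAMEAS, DBPEDIA_LONDON),
    (OTHER_LONDON, "ex:country", "England"),  # hypothetical property
]

expanded = expand(triples, DBPEDIA_LONDON)
print(sorted(expanded))
```

Nothing here claims the two sources hold the same data; the sketch only merges what each says about the same entity.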
3a] As with [3]: mirrors are common on the net (us1.domain, us2.domain, etc.), each one containing the same information. As above, how would one indicate that the data is the same, considering that ...
- the data is identical, so there is no way to inject a "sameAs" into the RDF
RDFizer middleware can use custom (context-specific) rules to make their own assertions as part of the RDFization processing pipeline. For instance, SPARQL is an effective rules language (head and body just happen to be on a vertical as opposed to horizontal visual plane), so it is possible for said engines to perform constrained forward-chaining (with the generated triples written to a specific graph that is used in a specific context).
- one 3rd party may reference the URI http://us1.domain.com/something.rdf whilst another 3rd party references http://us2.domain.com/something.rdf; both are the same data, but no correlation between the two exists anywhere to say they are the same thing.
See comment above.
- it stands to reason that the ideal is a single endpoint and mirrors behind the scenes without any HTTP 30* redirects ever being returned to the client; however, this won't always be the case, so what syntax can we use in this scenario?
Single cannot exist.

Context is all that exists.
Within a given context (always inherently subjective) certain assertions can be made about co-reference, be it explicit (owl:sameAs) or fuzzy (e.g. IFP-based rules).
[4] Are there any conventions or guidelines for combining data and resolving discrepancies? For instance, to get all data about London one would theoretically have to combine all the data from the URIs referenced at the sameas.org link above, but surely if you combined all that data you'd get both duplicates and differences in the data; which of them is fact, etc.?
[4a] Likewise with people: I have multiple social profiles all about "me", and surely in the near future multiple URIs will each represent #me. I think we can safely say that not all of these will be linked with sameAs; and further still, which one should person X use when referencing information about "me"?
[4b] Is there any method to mark which is the preferred source of information (and verify it)? At the minute it seems like it would be very simple to publish a vast amount of inaccurate data in triples, and it appears the current mentality would be to take it for granted that the information IS fact.
DC vs ctag and FOAF

For RDFa we have ctag and maker; which to me seems very exact:



but in dublin core we have the very loose
Washington
Example
I'm aware one can couple both dc and ctag/foaf in RDFa; but should we be replacing dc values wherever possible with the more precise ctag/foaf? (And indeed in our standard RDF data?)
A quick question about the usage of RDFa: previously I had always envisioned RDFa documents to contain a lot of inline RDF markup. I'm aware of the problems in picking up a term in the middle of a block of text and wrapping it in the appropriate notation; however, my question is, am I wrong in thinking this is the main use/advantage? In most cases where I've seen XHTML+RDFa (like uriburner etc.) it's been more a case of using RDFa to display human-readable RDF, as opposed to a human-targeted article with RDFa embedded in-place / in-line. Does anybody have any examples of a full RDFa demo site; not just with the normal dc/foaf and tags, but fully enriched with detected semantic terms highlighted, linked and wrapped in RDFa, inline..?
And finally, any info on creating a set/document which comprises or includes / references items in other datasets? (I may really show my newbie-ness here.) What I mean is, say I'm making an RDFa page about London, and in that I mention the population; I don't want to have the population in the document or in the RDF, I do however want to link through to the triple which holds the population for London in dbpedia or a geo set and have that in my RDFa. So where I could have:
(s–p–o)
london-population-7556900

I'd rather have:
london-population-{some link to dbpedia-owl:populationTotal value in dbpedia's RDF for London}
Thus I'm saying that London's population is {found here}, and it'd be nice if it could also be pulled in and displayed in an XHTML+RDFa document, possibly by content="URI#dbpedia-owl:populationTotal" or suchlike. Not sure if I explained that properly; perhaps simply, how do I reference a single triple rather than a full RDF set? Or am I way off target?
Many thanks in advance for any answers, comments etc. & apologies again if it's the wrong place to ask!
Nathan
I've used my personal entity URI instead of "London" for maximum effect, i.e. lots of URIs are associated with me.
Links:
1. http://tr.im/DoCA -- compact description (&sas=no implies no "owl:sameAs" context rule)
2. http://tr.im/DoD4 -- owl:sameAs expansion (&sas=yes implies "owl:sameAs" smushing/meshing/expansion/explosion context on)
3. http://tr.im/DoIi -- shows a UI that provides a holistic view of the data space, i.e. you can see via the indirect co-reference the effect of an IFP rule re. foaf:name and foaf:mbox_sha1sum (note: there is a bug I hit while writing this mail and you will most likely hit it if you click on the IFP tab URIs)
4. http://tr.im/DoNv -- the above using London from the larger data corpus (8 Billion) at: http://lod.openlinksw.com (just visit the tabs for the different co-reference URIs).
Thanks Kingsley!
Something just clicked and half of my questions are now irrelevant; to summarise my current understanding..
Let's say I'm doing the simplest report ever, where I want to display the sentence "The population of X is Y" in XHTML+RDFa.
- where X is the current name of "London" and "Y" is the current population, even if the name changes in 50 years to "new london" and the population drops to 54
- and I want the data to be always up to date (or as up to date as sources allow)
then all I need to do is:

1 - find a resource which holds rdf information about London
2 - SPARQL said resource to pick out only the name and population nodes

If using a SPARQL engine with inference rules capability plus the ability to crawl within a SPARQL solution processing pipeline, you can enable "owl:sameAs" inference context, and also ask the engine to follow "owl:sameAs" links; thereby enabling you to get an expanded data set (on the fly) to which the query pattern is applied, en route to the final solution.

3 - (optional) assign a nice endpoint to display the results of the query as RDF

You don't display anything as "RDF" per se. You have data representations based on the RDF model, and from these you can make different presentations, e.g. HTML or HTML+RDF (if you want to make your presentation document a structured data source based on the RDF model).

4 - XSLT transform the results into an XHTML+RDFa document ["about" URI for london], [by FOAF:Person/dc:creator me] where name and population are injected into X & Y respectively.

Sure, you can even use xslt in the SPARQL Protocol URL to achieve this goal.
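
Steps 1 and 2 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the endpoint address, the use of rdfs:label for the name, and the helper name `sparql_url` are choices made for this example, and the code builds the SPARQL Protocol URL without sending any request. The only protocol detail relied upon is that a GET request carries the query text in the `query` parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint for illustration; any SPARQL Protocol endpoint would do.
ENDPOINT = "http://dbpedia.org/sparql"

# Pick out only the name and the population of London.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?name ?population WHERE {
  <http://dbpedia.org/resource/London> rdfs:label ?name ;
      dbpedia-owl:populationTotal ?population .
  FILTER (lang(?name) = "en")
}
"""

def sparql_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying `query` in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be dereferenced by any HTTP client; how the result set is then transformed into XHTML+RDFa (step 4) depends on the engine and is left out here.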

that makes sense, and I'll assume the following:
To ensure info is always available I'd need to query (not sure what or where here) to get all resources which describe the city of London, then use all resources from that query as my source for steps 1-4 above; thus, should dbpedia die, my report will still work!(?)

See comment above, re owl:sameAs and crawling within the SPARQL processing pipeline. In addition to that, you can cache your query results in your own data space, and even apply cache invalidation schemes to that particular space (remember HTTP gives you the mechanics for this gratis).
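
One of the mechanics HTTP provides is validation with ETag and If-None-Match. The Python sketch below shows the idea with a local cache dictionary; it assumes the server supplies an ETag, and it takes the transport as a `fetch` callable so that the logic can be shown without a network. `cached_get`, `fake_fetch`, the URL and the ETag value are all hypothetical names and values for this example.

```python
def cached_get(url, cache, fetch):
    """Return the body for `url`, revalidating any cached copy via If-None-Match.

    `fetch(url, headers)` must return a (status, response_headers, body) tuple.
    """
    entry = cache.get(url)
    request_headers = {}
    if entry is not None:
        request_headers["If-None-Match"] = entry["etag"]
    status, response_headers, body = fetch(url, request_headers)
    if status == 304 and entry is not None:
        # Not modified: the cached copy is still valid.
        return entry["body"]
    etag = response_headers.get("ETag")
    if etag is not None:
        cache[url] = {"etag": etag, "body": body}
    return body

# Stand-in for a real HTTP client, so the example runs offline.
calls = []

def fake_fetch(url, headers):
    calls.append(dict(headers))
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, None
    return 200, {"ETag": '"v1"'}, "population=7556900"

cache = {}
URL = "http://example.org/query-result"  # hypothetical cached query result
first = cached_get(URL, cache, fake_fetch)
second = cached_get(URL, cache, fake_fetch)
print(first, second)
```

The second call sends the stored validator and reuses the cached body when the server answers 304, so the local copy is refreshed only when the source changes.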

I guess that also advocates the decentralized nature of LOD and uses of sameAs, but on the other hand it suggests a need for a single point of entry / initial search into the "cloud"?

No single point of entry per se. I would think more in terms of discovery and the degree of serendipity that the Web offers as its structured linked data aspect gets denser.

and to ensure I've "got it"..
essentially I could write the following sentence in a document: "'Kingsley Idehen' wrote a post entitled 'name of post'"; where the 'name of post' is automatically injected at render time directly from the RDF title of your post; so if you change the title, my sentence remains accurate.

Yes :-)

All that's left is:
- my question regarding "combining data and resolving discrepancies" (unless I find the answer upon closer analysis of the provided links);

Just look at the tabs in the links from my previous posts. The "indirect co-reference", by its very nature, should reveal obvious errors, since foaf:name alone cannot be a serious basis for co-reference (even the fuzzier indirect variety).
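
Surfacing discrepancies after combining descriptions can be sketched in Python as below. It assumes each source's description has already been reduced to a simple property-to-value mapping; the source URIs, the second population figure and the helper name `find_conflicts` are hypothetical. The sketch only reports where sources disagree; deciding which value to trust remains a matter of context.

```python
from collections import defaultdict

def find_conflicts(descriptions):
    """Map each property with more than one distinct value to {value: [sources]}."""
    values = defaultdict(lambda: defaultdict(list))
    for source, properties in descriptions.items():
        for prop, value in properties.items():
            values[prop][value].append(source)
    return {
        prop: dict(by_value)
        for prop, by_value in values.items()
        if len(by_value) > 1
    }

# Hypothetical sources describing the same entity.
SOURCE_A = "http://example.org/a/london"
SOURCE_B = "http://example.org/b/london"

descriptions = {
    SOURCE_A: {"label": "London", "population": 7556900},
    SOURCE_B: {"label": "London", "population": 7500000},  # made-up figure
}

conflicts = find_conflicts(descriptions)
print(conflicts)
```

Properties on which all sources agree, such as the label here, are treated as duplicates and dropped from the report.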

- which are the preferred ontologies to use when trying to be very specific about a subject (rather than dc.subject, dc.creator etc., which are essentially free-text based, not URI identifier based)

Depends on what you're describing. From my vantage point (or world view) the foundation ontologies are:


1. FOAF
2. Dublin Core
3. SIOC
4. GoodRelations
5. Bibliographic Ontology
6. Music Ontology
7. SKOS


Regards & many thanks,

Nathan



--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen

President & CEO
OpenLink Software     Web: http://www.openlinksw.com
