Re: Linked data sets for evaluating interlinking?

Rob Warren Mon, 26 Aug 2013 12:18:28 -0700

I would add that in many cases, unlikeness (owl:differentFrom) is as valuable 
as equivalence.


All my best,
rhw

On 2013-08-26, at 4:08 PM, Hugh Glaser wrote:

> Hi Cristina,
> Some interesting issues you raise.
> One of them is how people publish links (which enables your analysis).
> There are two ways this happens.
> 1) People add triples to their dataset that have an equivalence predicate 
> (owl:sameAs, skos:exactMatch, skos:closeMatch, etc.)
> 2) People use a "foreign" URI (very commonly a dbpedia URI), because when 
> turning their data into RDF they have decided that the entity they are 
> concerned with is the same as the dbpedia one. The second paragraph of Tom's 
> message describes such a linkage, I think.
> I think these distinctions are behind the comments of Milorad, where he is 
> assuming the type (2) way.
> Either of these methods should be fodder for you, and you may well find that 
> the type (2) way is used by a dataset that is useful to you.
> It may be harder for you to process, as the linkage is not so explicit 
> because there is no distinct URI for the resource in the database, different 
> from the "foreign" one. But any "foreign" URI is in fact a link.
> You will find that people have tended towards type (2) linkage because they 
> can shy away from having lots of equivalence predicates in their datasets, 
> not least because there was a time when RDF stores did not comfortably do 
> owl:sameAs inference, and so they do the linking at RDF conversion time, and 
> use "foreign" URIs.
> 
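
The two ways of publishing links described above can be told apart mechanically. The following is only a minimal sketch under stated assumptions: triples are plain (subject, predicate, object) tuples of IRI strings, `find_links` and `local_prefix` are hypothetical names, and the `http://example.org/` namespace in the usage lines is made up. It only looks for "foreign" URIs in subject position, which is the case Tom's message describes; a real dataset may also use them as objects.

```python
# Equivalence predicates named in the thread, written out as full IRIs.
EQUIVALENCE_PREDICATES = {
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://www.w3.org/2004/02/skos/core#closeMatch",
}


def find_links(triples, local_prefix):
    """Split a dataset's links into the two kinds described above.

    triples: iterable of (subject, predicate, object) strings.
    local_prefix: hypothetical namespace in which the dataset mints its own URIs.
    """
    explicit = set()  # type (1): triples that use an equivalence predicate
    foreign = set()   # type (2): foreign URIs used directly as subjects
    for s, p, o in triples:
        if p in EQUIVALENCE_PREDICATES:
            explicit.add((s, p, o))
        elif not s.startswith(local_prefix):
            foreign.add(s)
    return explicit, foreign


# Made-up example data, for illustration only.
triples = [
    ("http://example.org/emperor/augustus",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Augustus"),
    ("http://dbpedia.org/resource/Nero",
     "http://xmlns.com/foaf/0.1/name",
     "Nero"),
]
explicit, foreign = find_links(triples, "http://example.org/")
```

In this sketch the first triple is a type (1) link and the second is a type (2) link: the dbpedia URI is used as the subject, so there is no separate local URI to pair it with.
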
> Another interesting issue is more fundamental to your work.
> You seem to think that there must be a "gold standard or reference 
> interlinking" for equivalence.
> As long-time readers of this list will have seen discussed many times (!), it 
> is not a simple matter.
> It is a complex matter to have such a thing, which is a necessity for you to 
> do your precision/recall statistics.
> At its most basic, for example, am I as a private citizen the same as me as a 
> member of my University or me as a member of my company?
> The answer is, of course, yes and no.
> Another field that has spent a lot of time on this is the FRBR world 
> (http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records).
> If I have a book on the Semantic Web, is it the same as your book of the same 
> name?
> Perhaps. What if it is a different (corrected) edition? An electronic version?
> Certainly a library will usually consider each book a different thing, but if 
> you are asking how many books the author has published, you want to treat all 
> the books as the same resource.
> 
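
To make the dependence on policy concrete, here is a small sketch of the precision/recall calculation against two different reference link sets. It is an illustration under assumptions, not anyone's actual evaluation code: `precision_recall` and `normalise` are hypothetical helpers, links are treated as unordered pairs of URIs, and the `a:`/`b:` identifiers are invented stand-ins for two editions of a book held in two datasets.

```python
def normalise(links):
    """Treat each equivalence link as an unordered pair of URIs."""
    return {frozenset(pair) for pair in links}


def precision_recall(found, reference):
    """Score a set of found links against a reference set of links."""
    found, reference = normalise(found), normalise(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Links produced by some interlinking tool (invented data).
found = [
    ("a:book-ed1", "b:book-ed1"),
    ("a:book-ed1", "b:book-ed2"),
]

# Policy of a library: each edition is a different thing.
reference_library = [("b:book-ed1", "a:book-ed1")]

# Policy of an author bibliography: all editions are the same work.
reference_works = [
    ("a:book-ed1", "b:book-ed1"),
    ("a:book-ed1", "b:book-ed2"),
]

library_scores = precision_recall(found, reference_library)
works_scores = precision_recall(found, reference_works)
```

The same found links score differently against the two references, so the numbers only mean something once the policy behind the reference is stated.
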
> So in asking for a "gold standard or reference interlinking", I think you are 
> chasing a chimera.
> What you can do is choose datasets, and then you will need to find out what 
> the policies of the equivalence creators are; and then you will need to build 
> your system so that it implements the same policies.
> By the way, policies usually relate to the way in which the dataset will be 
> used, rather than the wishes of the publisher of the data - there is no 
> absolute truth in this. Some would argue there is never any equivalence: "One 
> cannot step once (sic) into the same stream" 
> (http://en.wikipedia.org/wiki/Cratylus)
> 
> It's great you have asked the question - convincing research in this field is 
> very challenging!
> 
> Best
> Hugh
> 
> On 26 Aug 2013, at 14:16, Tom Elliott <[email protected]>
> wrote:
> 
>> Hi all:
>> 
>> Two humanities datasets of potential interest in this regard:
>> 
>> A number of datasets (around 20 different ones I think) related to the study 
>> of antiquity have aligned their geographic/toponymic fields with the 
>> Pleiades gazetteer (http://pleiades.stoa.org) and published RDF accordingly. 
>> Most of this work has been done under the auspices of something called the 
>> Pelagios Project, and the alignment processes used by many of the 
>> participants are documented in blog posts at 
>> http://pelagios-project.blogspot.com/ (most of them a combination of 
>> automated and manual). Pleiades itself is also a linked data resource, and 
>> has a growing number (still only a small percentage of its content) of 
>> outbound links to dbpedia, geonames, and OSM. All of those outbound links 
>> are hand-curated. Contributors to Pleiades, where possible, are aligned to 
>> VIAF (manually) and bibliography in Pleiades is also beginning to be aligned 
>> to the Open Library and Worldcat (again, manually).
>> 
>> On a much smaller scale, I offer the "About Roman Emperors" dataset, which 
>> rather than minting its own URIs for the Roman emperors, uses the dbpedia 
>> resource URIs for each: http://www.paregorios.org/resources/roman-emperors/. 
>> The primary purpose of the dataset is to provide a comprehensive list of 
>> these for easy access and reuse by third parties, and to associate the 
>> dbpedia URIs with corresponding Roman imperial mint and minting authority 
>> data in nomisma.org and finds.org.uk, and to a static, late-90s-vintage 
>> scholarly encyclopedia of Roman emperors: http://www.roman-emperors.org/
>> 
>> Tom
>> 
>> 
>> Tom Elliott, Ph.D.
>> Associate Director for Digital Programs and Senior Research Scholar
>> Institute for the Study of the Ancient World (NYU)
>> http://isaw.nyu.edu/people/staff/tom-elliott
>> 
>> 
>> 
>> On Aug 26, 2013, at 6:04 AM, Adrian Stevenson wrote:
>> 
>>> Hi All
>>> 
>>> As part of the LOCAH and Linking Lives projects, the latter in particular, 
>>> we've been doing a lot of this auto and manual linking work, mainly to 
>>> VIAF and DBPedia, with some links to things like LCSH and Geonames. We've 
>>> been doing a lot of work just recently in fact, and we've published a blog 
>>> post that's picked up quite a bit of interest on this - 
>>> http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/. We haven't 
>>> published our latest run of data yet, but we hope to finish this soon. 
>>> It'll probably still be about a month or so as a few of us are on holiday 
>>> soon.
>>> 
>>> We do have quite a few links done semi-automatically in our existing data 
>>> set accessible via http://data.archiveshub.ac.uk but, as I say, we are 
>>> updating this, so I'd suggest not taking the URIs and data available there 
>>> as the final word.
>>> 
>>> A good example is 
>>> http://data.archiveshub.ac.uk/page/person/nra/webbmarthabeatrice1858-1943socialreformer
>>> 
>>> Project URIs:
>>> http://archiveshub.ac.uk/locah/
>>> http://archiveshub.ac.uk/linkinglives/
>>> 
>>> Adrian
>>> _____________________________
>>> Adrian Stevenson
>>> Senior Technical Innovations Coordinator
>>> Mimas, The University of Manchester
>>> Devonshire House, Oxford Road
>>> Manchester M13 9QH
>>> 
>>> Email: [email protected]
>>> Tel: +44 (0) 161 275 6065
>>> http://www.mimas.ac.uk
>>> http://www.twitter.com/adrianstevenson
>>> http://uk.linkedin.com/in/adrianstevenson/
>>> 
>>> On 22 Aug 2013, at 16:06, Cristina Sarasua wrote:
>>> 
>>>> Hi, 
>>>> 
>>>> I am looking for pairs of linked data sets that can be used as a gold 
>>>> standard for evaluations. I would need pairs of data sets which have been 
>>>> manually linked, or data sets which have been (semi-)automatically linked 
>>>> with interlinking tools, and afterwards reviewed (to include the links 
>>>> which are not identified by tools). I have looked into the DataHub 
>>>> catalogue and queried VoID descriptions, but unfortunately the information 
>>>> about how the interlinking process was carried out is often missing.
>>>> 
>>>> Apart from the data sets which have been used in the OAEI-instance 
>>>> matching track, could anyone recommend (based on past experience) good 
>>>> data sets for evaluating data interlinking processes?
>>>> 
>>>> Thanks in advance.
>>>> 
>>>> Kind regards, 
>>>> 
>>>> Cristina
>>>> -- 
>>>> Cristina Sarasua
>>>> 
>>>> Institute for Web Science and Technologies (WeST)
>>>> 
>>>> Universität Koblenz-Landau
>>>> Universitätsstraße 1
>>>> 56070 Koblenz
>>>> Germany
>>>> 
>>>> e: 
>>>> [email protected]
>>>> 
>>>> p: +49 261 287 2772
>>>> f: +49 261 287 100 2772
>>>> w: 
>>>> http://west.uni-koblenz.de 
>>> 
>>> 
>> 
>> 
>> 
> 
>
