I would add that in many cases, unlikeness (owl:differentFrom) is as valuable as equivalence.
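To make that concrete, here is a minimal sketch (my own illustration, not any particular tool). It assumes triples are plain (subject, predicate, object) tuples, treats the predicates Hugh lists below as "equivalence", and uses owl:differentFrom statements as known negatives when judging candidate links. The example.org URIs and the function names split_links and judge are made up for the example.

```python
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumption: these predicates count as "equivalence" for the evaluation.
EQUIVALENCE = {OWL + "sameAs", SKOS + "exactMatch", SKOS + "closeMatch"}
DIFFERENCE = {OWL + "differentFrom"}


def split_links(triples):
    """Return (positives, negatives) as sets of unordered URI pairs."""
    positives, negatives = set(), set()
    for s, p, o in triples:
        pair = frozenset((s, o))  # link direction is ignored
        if p in EQUIVALENCE:
            positives.add(pair)
        elif p in DIFFERENCE:
            negatives.add(pair)
    return positives, negatives


def judge(candidates, positives, negatives):
    """Count candidate links that are confirmed, contradicted, or undecided."""
    result = {"confirmed": 0, "contradicted": 0, "undecided": 0}
    for a, b in candidates:
        pair = frozenset((a, b))
        if pair in positives:
            result["confirmed"] += 1
        elif pair in negatives:
            result["contradicted"] += 1
        else:
            result["undecided"] += 1
    return result


# Hypothetical reference data: one equivalence and one explicit difference.
triples = [
    ("http://example.org/a", OWL + "sameAs", "http://example.org/b"),
    ("http://example.org/a", OWL + "differentFrom", "http://example.org/c"),
]
# Hypothetical output of an interlinking tool.
candidates = [
    ("http://example.org/b", "http://example.org/a"),
    ("http://example.org/c", "http://example.org/a"),
    ("http://example.org/a", "http://example.org/d"),
]

positives, negatives = split_links(triples)
summary = judge(candidates, positives, negatives)
print(summary)
```

Without the differentFrom statement, the second candidate would only be "undecided"; with it, the link can be counted as a definite error.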
All my best,
rhw

On 2013-08-26, at 4:08 PM, Hugh Glaser wrote:

> Hi Cristina,
> Some interesting issues you raise.
> One of them is how people publish links (which enables your analysis).
> There are two ways this happens.
> 1) People add triples to their dataset that have an equivalence predicate
> (owl:sameAs, skos:exactMatch, skos:closeMatch, etc.)
> 2) People use a "foreign" URI (very commonly a dbpedia URI), because when
> turning their data into RDF they have decided that the entity they are
> concerned with is the same as the dbpedia one. The second paragraph of Tom's
> message describes such a linkage, I think.
> I think these distinctions are behind the comments of Milorad, where he is
> assuming the type (2) way.
> Either of these methods should be fodder for you, and you may well find that
> the type (2) way is used by a dataset that is useful to you.
> It may be harder for you to process, as the linkage is not so explicit
> because there is no distinct URI for the resource in the database, different
> from the "foreign" one. But any "foreign" URI is in fact a link.
> You will find that people have tended towards type (2) linkage because they
> can shy away from having lots of equivalence predicates in their datasets,
> not least because there was a time when RDF stores did not comfortably do
> owl:sameAs inference, and so they do the linking at RDF conversion time, and
> use "foreign" URIs.
>
> Another interesting issue is more fundamental to your work.
> You seem to think that there must be a "gold standard or reference
> interlinking" for equivalence.
> As long-time readers of this list will have seen discussed many times (!), it
> is not a simple matter.
> It is a complex matter to have such a thing, which is a necessity for you to
> do your precision/recall statistics.
> At its most basic, for example, am I as a private citizen the same as me as a
> member of my University or me as a member of my company?
> The answer is, of course, yes and no.
> Another field that has spent a lot of time on this is the FRBR world
> (http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records).
> If I have a book of the Semantic Web, is it the same as your book of the same
> name?
> Perhaps. What if it is a different (corrected) edition? An electronic version?
> Certainly a library will usually consider each book a different thing, but if
> you are asking how many books the author has published, you want to treat all
> the books as the same resource.
>
> So in asking for a "gold standard or reference interlinking", I think you are
> chasing a chimera.
> What you can do is choose datasets and then you will need to find out what
> the policies of the equivalence creators are; and then you will need to build
> your system so that it implements the same policies.
> By the way, policies usually relate to the way in which the dataset will be
> used, rather than the wishes of the publisher of the data - there is no
> absolute truth in this. Some would argue there is never any equivalence: "One
> cannot step once (sic) into the same stream"
> (http://en.wikipedia.org/wiki/Cratylus)
>
> It's great you have asked the question - convincing research in this field is
> very challenging!
>
> Best
> Hugh
>
> On 26 Aug 2013, at 14:16, Tom Elliott <[email protected]>
> wrote:
>
>> Hi all:
>>
>> Two humanities datasets of potential interest in this regard:
>>
>> A number of datasets (around 20 different ones I think) related to the study
>> of antiquity have aligned their geographic/toponymic fields with the
>> Pleiades gazetteer (http://pleiades.stoa.org) and published RDF accordingly.
>> Most of this work has been done under the auspices of something called the
>> Pelagios Project, and the alignment processes used by many of the
>> participants are documented in blog posts at
>> http://pelagios-project.blogspot.com/ (most of them a combination of
>> automated and manual).
>> Pleiades itself is also a linked data resource, and
>> has a growing number (still only a small percentage of its content) of
>> outbound links to dbpedia, geonames, and OSM. All of those outbound links
>> are hand-curated. Contributors to Pleiades, where possible, are aligned to
>> VIAF (manually) and bibliography in Pleiades is also beginning to be aligned
>> to the Open Library and Worldcat (again, manually).
>>
>> On a much smaller scale, I offer the "About Roman Emperors" dataset, which
>> rather than minting its own URIs for the Roman emperors, uses the dbpedia
>> resource URIs for each: http://www.paregorios.org/resources/roman-emperors/.
>> The primary purpose of the dataset is to provide a comprehensive list of
>> these for easy access and reuse by third parties, and to associate the
>> dbpedia URIs with corresponding Roman imperial mint and minting authority
>> data in nomisma.org and finds.org.uk, and to a static, late-90s-vintage
>> scholarly encyclopedia of Roman emperors: http://www.roman-emperors.org/
>>
>> Tom
>>
>> Tom Elliott, Ph.D.
>> Associate Director for Digital Programs and Senior Research Scholar
>> Institute for the Study of the Ancient World (NYU)
>> http://isaw.nyu.edu/people/staff/tom-elliott
>>
>> On Aug 26, 2013, at 6:04 AM, Adrian Stevenson wrote:
>>
>>> Hi All
>>>
>>> As part of the LOCAH and Linking Lives projects, the latter in particular,
>>> we've been doing a lot of this auto and manual linking work, mainly to
>>> VIAF and DBPedia, with some links to things like LCSH and Geonames. We've
>>> been doing a lot of work just recently in fact, and we've published a blog
>>> post that's picked up quite a bit of interest on this -
>>> http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/. We haven't
>>> published our latest run of data yet, but we hope to finish this soon.
>>> It'll probably still be about a month or so as a few of us are on holiday
>>> soon.
>>>
>>> We do have quite a few links done semi-automatically in our existing data
>>> set accessible via http://data.archiveshub.ac.uk but, as I say, we are
>>> updating this, so I'd suggest not taking the URIs and data available there as
>>> the final word.
>>>
>>> A good example is
>>> http://data.archiveshub.ac.uk/page/person/nra/webbmarthabeatrice1858-1943socialreformer
>>>
>>> Project URIs:
>>> http://archiveshub.ac.uk/locah/
>>> http://archiveshub.ac.uk/linkinglives/
>>>
>>> Adrian
>>> _____________________________
>>> Adrian Stevenson
>>> Senior Technical Innovations Coordinator
>>> Mimas, The University of Manchester
>>> Devonshire House, Oxford Road
>>> Manchester M13 9QH
>>>
>>> Email: [email protected]
>>> Tel: +44 (0) 161 275 6065
>>> http://www.mimas.ac.uk
>>> http://www.twitter.com/adrianstevenson
>>> http://uk.linkedin.com/in/adrianstevenson/
>>>
>>> On 22 Aug 2013, at 16:06, Cristina Sarasua wrote:
>>>
>>>> Hi,
>>>>
>>>> I am looking for pairs of linked data sets that can be used as gold
>>>> standard for evaluations. I would need pairs of data sets which have been
>>>> manually linked, or data sets which have been (semi-)automatically linked
>>>> with interlinking tools, and afterwards reviewed (to include the links
>>>> which are not identified by tools). I have looked into the DataHub
>>>> catalogue and queried VoID descriptions, but unfortunately the information
>>>> about how the interlinking process was carried out is often missing.
>>>>
>>>> Apart from the data sets which have been used in the OAEI-instance
>>>> matching track, could anyone recommend (based on past experience) good
>>>> data sets for evaluating data interlinking processes?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Kind regards,
>>>>
>>>> Cristina
>>>> --
>>>> Cristina Sarasua
>>>>
>>>> Institute for Web Science and Technologies (WeST)
>>>>
>>>> Universität Koblenz-Landau
>>>> Universitätsstraße 1
>>>> 56070 Koblenz
>>>> Germany
>>>>
>>>> e: [email protected]
>>>>
>>>> p: +49 261 287 2772
>>>> f: +49 261 287 100 2772
>>>> w: http://west.uni-koblenz.de
