Broken Links in LOD Data Sets

Bernhard Haslhofer Thu, 05 Feb 2009 07:37:40 -0800


Hi all,

We are currently working on the question of how to deal with broken links/references between resources in (distinct) LOD data sets and would like to know your opinion on that issue. If there is some work going on in this direction, please let me know.

I think I do not really need to explain the problem. Everybody knows it from the "human" Web when you follow a link and you get an annoying 404 response.

If we assume that the consumers of LOD data are not humans but applications, broken links/references are not only "annoying" but could lead to severe processing errors if an application relies on a kind of "referential integrity".

Assume we have an LOD data source X exposing resources that describe images, and these images are linked with resources in DBpedia (e.g., http://dbpedia.org/resource/Berlin). An application built on top of X follows these links to retrieve the geo-coordinates in order to display the images on a virtual map. If, for some reason, the URL of the linked DBpedia resource changes, either because DBpedia is moved or re-organized, which I guess could happen to any LOD source in a long-term perspective, the application might crash if it doesn't consider that referenced resources might move or become unavailable.
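To make the scenario concrete, here is a minimal sketch of a consumer that collects broken references instead of crashing on them. Everything in it is an assumption for illustration: `BrokenLinkError`, `locate_images`, the `fetch_coordinates` callable, the example.org image URIs and the coordinate values are hypothetical, and a real application would put an HTTP/RDF lookup behind the fetcher.

```python
from typing import Callable, Dict, List, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


class BrokenLinkError(Exception):
    """Hypothetical error: a linked resource could not be dereferenced."""


def locate_images(
    image_links: Dict[str, str],
    fetch_coordinates: Callable[[str], Coordinates],
) -> Tuple[Dict[str, Coordinates], List[str]]:
    """Resolve each image's place link; record broken links instead of failing."""
    located: Dict[str, Coordinates] = {}
    broken: List[str] = []
    for image_uri, place_uri in image_links.items():
        try:
            located[image_uri] = fetch_coordinates(place_uri)
        except BrokenLinkError:
            # The image simply stays off the map; the link is kept for reporting.
            broken.append(place_uri)
    return located, broken


def fake_fetch(place_uri: str) -> Coordinates:
    """Stand-in for a real lookup; the coordinates are illustrative values."""
    known = {"http://dbpedia.org/resource/Berlin": (52.52, 13.41)}
    if place_uri not in known:
        raise BrokenLinkError(place_uri)
    return known[place_uri]


located, broken = locate_images(
    {
        "http://example.org/images/1": "http://dbpedia.org/resource/Berlin",
        "http://example.org/images/2": "http://dbpedia.org/resource/Moved_Away",
    },
    fake_fetch,
)
```

With this stub, the first image is placed on the map and the second link ends up in `broken`, so the application can degrade gracefully and still know which references to repair.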

I know that "cool URIs don't change" but I am not sure if this assumption holds in practice, especially in a long-term perspective.


For the "human" Web several solutions have been proposed, e.g.,
1.) PURL and DOI services for translating URNs into resolvable URLs
2.) forward references
3.) robust link implementations, i.e., with each link you keep a set of related search terms to retrieve moved / changed resources
4.) observer / notification mechanisms
X.) ?

I guess (1) is not really applicable for LOD resources because of scalability and single-point-of-failure issues. (2) would require that LOD providers take care of setting up HTTP redirects for their moved resources - no idea if anybody will do that in reality and how this can scale. (3) could help to re-locate moved resources via search engines like Sindice, but not really fully automatically. (4) could at least inform a data source that certain references are broken so that it could remove them.
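As a rough illustration of option (2), the following sketch follows permanent HTTP redirects (301 and 308) to find where a moved resource now lives. The `resolve` function, the `lookup` callable and the example.org URIs are hypothetical; `lookup` stands in for an HTTP request that does not follow redirects by itself and returns the status code plus the Location header. A 303 is deliberately not treated as "moved" here, since in Linked Data it usually points from a thing to a document describing it.

```python
from typing import Callable, Optional, Tuple

# A lookup returns (HTTP status code, Location header value or None).
Lookup = Callable[[str], Tuple[int, Optional[str]]]

PERMANENT_REDIRECTS = (301, 308)


def resolve(uri: str, lookup: Lookup, max_hops: int = 5) -> Optional[str]:
    """Return the current URI of a resource, or None if the link is broken."""
    seen = set()
    current = uri
    for _ in range(max_hops + 1):
        if current in seen:
            return None  # redirect loop
        seen.add(current)
        status, location = lookup(current)
        if status in PERMANENT_REDIRECTS:
            if location is None:
                return None  # redirect without a target
            current = location  # follow the forward reference
            continue
        if 200 <= status < 400:
            return current  # resource is still reachable at this URI
        return None  # 4xx/5xx: treat as broken
    return None  # too many hops


# Illustrative routing table instead of real network access.
table = {
    "http://old.example.org/Berlin": (301, "http://new.example.org/Berlin"),
    "http://new.example.org/Berlin": (200, None),
    "http://example.org/a": (301, "http://example.org/b"),
    "http://example.org/b": (301, "http://example.org/a"),
}


def fake_lookup(uri: str) -> Tuple[int, Optional[str]]:
    return table.get(uri, (404, None))
```

A `None` result is exactly the kind of event a notification mechanism as in (4) could report back to the linking data source, while a changed URI could be used to rewrite the stored link.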

Another alternative is of course to leave the problem completely to the application developers, which means that they must consider that a referenced resource might or might not exist. I am not sure about the practical consequences of that approach, especially if several data sources are involved, but I have the feeling that it is getting really complicated if one cannot rely on any kind of referential integrity.

Are there any existing mechanisms that can give us at least some basic feedback about the "quality" of an LOD data source? I think the referential integrity could be such a quality property...
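One hypothetical way to turn referential integrity into a number is the share of distinct outgoing links that can still be dereferenced. This is only a sketch of such a measure, not an existing standard: the function name, the `is_resolvable` callable and the choice to rate a source without outgoing links as 1.0 are all assumptions.

```python
from typing import Callable, Iterable


def referential_integrity(
    links: Iterable[str],
    is_resolvable: Callable[[str], bool],
) -> float:
    """Fraction of distinct link targets that can still be dereferenced."""
    targets = set(links)  # count each target URI once
    if not targets:
        return 1.0  # assumption: no outgoing links means nothing is broken
    alive = sum(1 for uri in targets if is_resolvable(uri))
    return alive / len(targets)


# Illustrative data: three of four distinct targets are still reachable.
reachable = {
    "http://example.org/r/1",
    "http://example.org/r/2",
    "http://example.org/r/3",
}
outgoing = [
    "http://example.org/r/1",
    "http://example.org/r/2",
    "http://example.org/r/3",
    "http://example.org/r/4",
    "http://example.org/r/1",  # duplicate link, counted once
]
score = referential_integrity(outgoing, lambda uri: uri in reachable)
```

A data source could publish such a score per linked data set, which would at least tell consumers how much they can rely on its references.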


Thanks for your input on that issue,

Bernhard

______________________________________________________
Research Group Multimedia Information Systems
Department of Distributed and Multimedia Systems
Faculty of Computer Science
University of Vienna

Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
E-Mail: [email protected]
WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
