Hi Pascal,

On 30 May 2012 15:41, Pascal Christoph <[email protected]> wrote:
> Hi Ben,
>
>> That is great! Linking to Open Library makes Open Library more visible
>> in the Linked Data world, I guess.
>>
>> I read your blog post, and would like to raise a couple of questions with
>> you.
>> First of all: where are the links? I see no link to the OL website (or
>> a Work URI) on the page that is said to be an example...
>
> our SPARQL Endpoint has had a corrupted DB-file. Now it's reindexed - have
> another try.

It works! :)

>
> In the dump I found that this edition has no extra entry, so it can only be
> found included in the work-uri in the json dump (as seen above) (sort of
> confusing, since I am used to RDF which uses graphs to store data, and the
> edition DOES resolve ( http://openlibrary.org/books/OL12622527M ), so why not
> just link it in the work-description in the dump? (as more as I am thinking
> about it , I have to admit I only _begin_ to understand the dump ;) )

I think you are using a different type of dump than I am. Reading on in
your email, yours appears to be of type "deworks". The dumps I used for my
statistics are the "normal" ones found at
http://openlibrary.org/developers/dumps, and they have no author names or
complete edition records. Anyway, I knew I had seen something about
"deworks", and this was it:
https://github.com/internetarchive/openlibrary/issues/101 (denormalized works).

>
>> And regarding your remarkable example: that German version of Lord of
>> the Rings should not be linked to a German work, but to The One Work
>> called "The Lord of the Rings" (I consider the separate publications
>> of the three parts one work each). The German work is a duplicate.
>
> yep, guessed it. How to deduplicate? Should not be too hard because you have a
> work-link to library thing:
> $ grep 'librarything": \["1386651"' ol_dump_deworks_2012-03-31.txt | wc -l
> 49

If we trust the ID to be correct, that might work.

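To make the idea a bit more concrete, here is a rough Python sketch of grouping records by LibraryThing ID instead of grepping for one ID at a time. It assumes each line of the deworks dump is a record key followed by whitespace and a single JSON object with an "editions" list, which is only what the one sample record in this thread suggests; the function name group_by_librarything is made up for illustration.

```python
import json
from collections import defaultdict


def group_by_librarything(lines):
    """Group (record key, edition key) pairs by LibraryThing ID.

    Assumption: every dump line looks like '<record key> <JSON object>',
    as in the single sample record quoted in this thread.
    """
    groups = defaultdict(set)
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        record_key = line[:start].strip()
        try:
            record = json.loads(line[start:])
        except ValueError:
            continue  # skip lines that do not parse
        for edition in record.get("editions") or []:
            if not isinstance(edition, dict):
                continue
            identifiers = edition.get("identifiers") or {}
            for lt_id in identifiers.get("librarything") or []:
                groups[lt_id].add((record_key, edition.get("key")))
    return dict(groups)
```

Called as group_by_librarything(open("ol_dump_deworks_2012-03-31.txt")), this would give one set per LibraryThing ID. Pairs whose record key starts with /books/ rather than /works/ would then be candidates for work-less editions, if the sample record is representative.
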
>
> seems to bring up 49 editions for one work level, but just a first control
> sample shows no association with the work level at all:
>
> $ grep OL9177075M ol_dump_deworks_2012-03-31.txt
> /books/OL9177075M {"editions": [{"publishers": ["RUSCONI"],
> "physical_format": "Paperback", "last_modified": {"type": "/type/datetime",
> "value": "2011-04-29T03:29:19.321447"}, "created": {"type":
> "/type/datetime", "value": "2008-04-30T09:38:13.731961"},
> "number_of_pages": 1359, "isbn_13": ["9788818123210"], "languages":
> [{"key": "/languages/ita"}], "isbn_10": ["8818123211"], "publish_date":
> "1985", "key": "/books/OL9177075M", "title": "IL SIGNORE DEGLI ANELLI
> (Titolo originale dell'opera: The Lord of the Rings)", "oclc_numbers":
> ["635814336"], "revision": 4, "type": {"key": "/type/edition"},
> "latest_revision": 4, "identifiers": {"goodreads": ["1110294"],
> "librarything": ["1386651"]}}], "authors": []}
>

This is one of the ~5 million editions without a work. Maybe the
LibraryThing ID can be used to link work-less editions to works, but I guess
most work-less editions don't have it.

>
>> Is the code you used to convert the datadump to RDF available online
>> (and is it Free software)? Since my proposed changes to OL's "native"
>> RDF output [2] haven't been accepted yet, perhaps other approaches can
>> be promoted somehow. Talis's approach works well, but I'm interested
>> to see others too.
>
> since I had not much time and only needed the ISBN in a first approach anyway
> I did some crude regex to make me an ISBN-triple

That is still interesting to hear :)

>
> -o
>
>> [1] http://www.mail-archive.com/[email protected]/msg00613.html
>> [2] https://github.com/internetarchive/openlibrary/pull/136 (comments
>> still welcome, naturally)
