Hi Pascal,

On 30 May 2012 15:41, Pascal Christoph <[email protected]> wrote:
> Hi Ben,
>
>
>> That is great! Linking to Open Library makes Open Library more visible
>> in the Linked Data world, I guess.
>>
>> I read your blog post, and would like to raise a couple of questions
>> with you.
>> First of all: where are the links? I see no link to the OL website (or
>> a Work URI) on the page that is said to be an example...
>
> our SPARQL Endpoint has had a corrupted DB-file. Now it's reindexed - have
> another try.

It works! :)

>
> In the dump I found that this edition has no entry of its own, so it can only
> be found included under the work URI in the JSON dump (as seen above). This is
> sort of confusing, since I am used to RDF, which uses graphs to store data, and
> the edition DOES resolve ( http://openlibrary.org/books/OL12622527M ), so why
> not just link it in the work description in the dump? (The more I think
> about it, I have to admit I am only _beginning_ to understand the dump ;) )

I think you are using a different type of dump than I am. Reading on
in your email, yours appears to be one of type "deworks". The dumps I
have used for my statistics are the "normal" ones found at
http://openlibrary.org/developers/dumps and they have no author names
or complete edition records.
Anyway, I knew I had seen something about "deworks", and this was it:
https://github.com/internetarchive/openlibrary/issues/101
(denormalized works).
>
>> And regarding your remarkable example: that German version of Lord of
>> the Rings should not be linked to a German work, but to The One Work
>> called "The Lord of the Rings" (I consider the separate publications
>> of the three parts one work each). The German work is a duplicate.
>
> yep, guessed it. How to deduplicate? Should not be too hard because you have a
> work-link to library thing:
> $ grep 'librarything": \["1386651"' ol_dump_deworks_2012-03-31.txt | wc -l
> 49

If we trust the ID to be correct, that might work.
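
To make that a bit more concrete, here is a rough sketch of how the grep above could be generalised to every LibraryThing ID at once. It assumes each line of the deworks dump is a record key, then whitespace, then a JSON object with an "editions" list, as in your sample below; the function names are made up for illustration, and I have not run this against the full dump.

```python
import json
from collections import defaultdict


def group_by_librarything(lines):
    """Map each LibraryThing ID to the set of record keys whose editions carry it.

    Assumes every line looks like '<key><whitespace><JSON object>', which is
    what the quoted deworks sample suggests; other lines are skipped.
    """
    groups = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split(None, 1)
        if len(parts) != 2:
            continue
        key, payload = parts
        try:
            record = json.loads(payload)
        except ValueError:
            continue  # not valid JSON, ignore the line
        if not isinstance(record, dict):
            continue
        for edition in record.get("editions", []):
            identifiers = edition.get("identifiers") or {}
            for lt_id in identifiers.get("librarything", []):
                groups[lt_id].add(key)
    return groups


def duplicate_candidates(groups):
    """Keep only the IDs that occur under more than one record key."""
    return {lt_id: sorted(keys) for lt_id, keys in groups.items() if len(keys) > 1}
```

Any ID that shows up under more than one key is only a candidate for merging, of course; whether the IDs can be trusted is still the open question.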
>
> seems to bring up 49 editions for one work level, but just a first control
> sample shows no association with the work level at all:
>
> $ grep OL9177075M ol_dump_deworks_2012-03-31.txt
> /books/OL9177075M       {"editions": [{"publishers": ["RUSCONI"],
> "physical_format": "Paperback", "last_modified": {"type": "/type/datetime",
> "value": "2011-04-29T03:29:19.321447"}, "created": {"type": "/type/datetime",
> "value": "2008-04-30T09:38:13.731961"}, "number_of_pages": 1359,
> "isbn_13": ["9788818123210"], "languages": [{"key": "/languages/ita"}],
> "isbn_10": ["8818123211"], "publish_date": "1985",
> "key": "/books/OL9177075M",
> "title": "IL SIGNORE DEGLI ANELLI (Titolo originale dell'opera: The Lord of the Rings)",
> "oclc_numbers": ["635814336"], "revision": 4, "type": {"key": "/type/edition"},
> "latest_revision": 4,
> "identifiers": {"goodreads": ["1110294"], "librarything": ["1386651"]}}],
> "authors": []}
>
This is one of the ~5 million editions without a work. Maybe the
LibraryThing ID can be used to link work-less editions to works, but I
guess most work-less editions don't have it.
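
My guess about coverage could be checked with something like the sketch below. It assumes that work-less editions are exactly the deworks lines whose key starts with "/books/" (as in your sample), which I have not verified; the function name is hypothetical.

```python
import json


def librarything_coverage(lines):
    """Return (work-less editions seen, how many of them have a LibraryThing ID).

    Assumption: a work-less edition is a line whose key starts with '/books/'
    and whose JSON payload holds the edition in its "editions" list.
    """
    total = 0
    with_id = 0
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("/books/"):
            continue
        try:
            record = json.loads(parts[1])
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        for edition in record.get("editions", []):
            total += 1
            identifiers = edition.get("identifiers") or {}
            if identifiers.get("librarything"):
                with_id += 1
    return total, with_id
```

If the second number turns out to be a small fraction of the first, the LibraryThing route would only help for a minority of the work-less editions.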

>
>> Is the code you used to convert the datadump to RDF available online
>> (and is it Free software)? Since my proposed changes to OL's "native"
>> RDF output [2] haven't been accepted yet, perhaps other approaches can
>> be promoted somehow. Talis's approach works well, but I'm interested
>> to see others too.
>
> since I did not have much time and only needed the ISBN as a first step
> anyway, I used a crude regex to produce an ISBN triple

That is still interesting to hear :)
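
I don't know what your regex looked like, but for comparison, here is a minimal sketch of getting ISBN triples out of a deworks line by parsing the JSON instead. The choice of bibo:isbn13 as the predicate and of the openlibrary.org URL as the subject are my assumptions, not something taken from your post, and the function name is made up.

```python
import json

# Assumed predicate; any ISBN property from another vocabulary would do as well.
BIBO_ISBN13 = "http://purl.org/ontology/bibo/isbn13"


def isbn_triples(line):
    """Yield one N-Triples line per ISBN-13 found in a deworks dump line.

    Assumes the line is '<key><whitespace><JSON object>' with an "editions"
    list whose items have a "key" and optionally an "isbn_13" list.
    """
    parts = line.split(None, 1)
    if len(parts) != 2:
        return
    try:
        record = json.loads(parts[1])
    except ValueError:
        return  # not valid JSON, nothing to emit
    if not isinstance(record, dict):
        return
    for edition in record.get("editions", []):
        if "key" not in edition:
            continue
        subject = "http://openlibrary.org" + edition["key"]
        for isbn in edition.get("isbn_13", []):
            yield '<%s> <%s> "%s" .' % (subject, BIBO_ISBN13, isbn)
```

Parsing the JSON is slower than a regex, but it avoids matching ISBN-like strings in titles or other fields.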
>
> -o
>
>> [1] http://www.mail-archive.com/[email protected]/msg00613.html
>> [2] https://github.com/internetarchive/openlibrary/pull/136 (comments
>> still welcome, naturally)
