Re: [ol-tech] 1.2 M links from lobid-resources to Open Library works

Karen Coyle Wed, 30 May 2012 08:16:20 -0700


On 5/30/12 6:41 AM, Pascal Christoph wrote:


>> Did you only link to Works, or to Editions too? ISBNs are associated
>
> we link only to works since it's not necessarily right to say that "two
> manifestations with the same ISBNs are the same manifestations" (because they
> can have different issue dates etc ...).

While I'm sure that there are cases where this happens, FYI a book with 
a new date *should* get a new ISBN, by the rules of ISBN. When the ISBN 
and the date do not match, 1) the publisher may have re-used the old 
ISBN (you have to pay for them, so some publishers "cheat") or 2) there 
could be differences in the metadata based on errors in cataloging or 
differences in cataloging practice.

In the merge algorithms that I've worked on, we've allowed publication
years within ±2 of each other to count as a match when the ISBN is the
same and a significant part of the title matches.
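
To make that rule concrete, here is a minimal sketch of such a match test. It is not the code from any of those merge projects: the record fields ("isbn", "year", "title"), the function names, and the 50% word-overlap threshold standing in for "a significant part of the title" are all assumptions made for illustration.

```python
import re


def normalize_title(title):
    """Lowercase a title, replace punctuation with spaces, split into words."""
    return re.sub(r"[^\w\s]", " ", title.lower()).split()


def titles_overlap(title_a, title_b, threshold=0.5):
    """True when enough words of the shorter title also occur in the other.

    The threshold is an assumed stand-in for "a significant part".
    """
    words_a = set(normalize_title(title_a))
    words_b = set(normalize_title(title_b))
    if not words_a or not words_b:
        return False
    shorter = min(len(words_a), len(words_b))
    return len(words_a & words_b) / shorter >= threshold


def is_match(rec_a, rec_b, year_tolerance=2):
    """Same ISBN, years within the tolerance, and overlapping titles."""
    if rec_a["isbn"] != rec_b["isbn"]:
        return False
    if abs(rec_a["year"] - rec_b["year"]) > year_tolerance:
        return False
    return titles_overlap(rec_a["title"], rec_b["title"])
```

With this, two records sharing an ISBN and dated 1985 and 1987 would match if their titles share enough words, while the same pair dated 1985 and 1990 would not.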

What data elements are you using to link to Works?



>
> In the dump I found that this edition has no entry of its own, so it can only
> be found embedded in the work URI in the JSON dump (as seen above). That is
> sort of confusing, since I am used to RDF, which uses graphs to store data,
> and the edition DOES resolve ( http://openlibrary.org/books/OL12622527M ), so
> why not just link it in the work description in the dump? (The more I think
> about it, I have to admit I only _begin_ to understand the dump ;) )

It looks to me like it's using the same logic that it uses for the 
display: if there is only one edition, it is included on the Work page. 
That's just a guess, though.

>
>> And regarding your remarkable example: that German version of Lord of
>> the Rings should not be linked to a German work, but to The One Work
>> called "The Lord of the Rings" (I consider the separate publications
>> of the three parts one work each). The German work is a duplicate.
>
> yep, guessed it. How to deduplicate?

It looks like Ben has already done what I was thinking of: there is a 
field for the Work title in the edition record. Ben has added "The Lord 
of the Rings" to that, which *should* bring the edition together with 
other works by that title. However, there are multiple works with that 
title, so I don't know where it will end up. The merging of works seems 
to need more, uh, work. I know that LibraryThing has a way to manually 
merge works, and I assume that's what OL will need. Algorithms only get 
you so far given the many differences in the metadata.
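
As a rough illustration of the kind of pre-grouping that could feed a manual merge, the sketch below collects edition keys by their LibraryThing work identifier. It assumes each dump line is a key, a tab, and a JSON record shaped like the one quoted further down (an "editions" list whose entries carry "identifiers"); the real separator may differ, and the function name is made up for this example.

```python
import json
from collections import defaultdict


def group_by_librarything(lines):
    """Map each LibraryThing work id to the edition keys that carry it.

    Assumes "<key>\\t<json>" lines; lines without a tab are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        _key, _sep, payload = line.partition("\t")
        if not payload:
            continue  # not in the assumed key/tab/JSON shape
        record = json.loads(payload)
        for edition in record.get("editions", []):
            identifiers = edition.get("identifiers", {})
            for lt_id in identifiers.get("librarything", []):
                groups[lt_id].append(edition["key"])
    return dict(groups)
```

Any group with more than one edition key is then a candidate set for a person to check, not an automatic merge.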

kc



> Should not be too hard because you have a
> work-link to library thing:
> $ grep 'librarything": \["1386651"' ol_dump_deworks_2012-03-31.txt | wc -l
> 49
>
> seems to bring up 49 editions for one work level, but just a first control
> sample shows no association with the work level at all:
>
> $ grep OL9177075M ol_dump_deworks_2012-03-31.txt
> /books/OL9177075M     {"editions": [{"publishers": ["RUSCONI"],
> "physical_format": "Paperback", "last_modified": {"type": "/type/datetime",
> "value": "2011-04-29T03:29:19.321447"}, "created": {"type": "/type/datetime",
> "value": "2008-04-30T09:38:13.731961"}, "number_of_pages": 1359,
> "isbn_13": ["9788818123210"], "languages": [{"key": "/languages/ita"}],
> "isbn_10": ["8818123211"], "publish_date": "1985", "key": "/books/OL9177075M",
> "title": "IL SIGNORE DEGLI ANELLI (Titolo originale dell'opera: The Lord of the Rings)",
> "oclc_numbers": ["635814336"], "revision": 4, "type": {"key": "/type/edition"},
> "latest_revision": 4, "identifiers": {"goodreads": ["1110294"],
> "librarything": ["1386651"]}}], "authors": []}
>
>> I recently published [1] a list of works that appear to be duplicates
>> (based on title, subtitle and author) which unfortunately showed that
>> a lot of cleaning up of edition-less works and duplicate works has to
>> be done.
>> That brings up another question: will you do the linking process again
>> in the future?
>
> yes I am willing to do so :)
>
>> I imagine that eventually many works (and authors, and probably
>> editions too) will be merged so that the Work URI you get back (in the
>> Edition data) when you lookup the same ISBN again may change in the
>> future. I don't think it will be a problem to have old URIs in your
>> data, as they will redirect to the new URI(s) when you look them up.
>> However, if you leave the old URIs in your dataset, you don't know for
>> sure how many distinct works are linked. And since Open Library data
>> changes regularly anyway, I don't suppose this was an one-time only
>> experiment?
>
> right
>
>> Is the code you used to convert the datadump to RDF available online
>> (and is it Free software)? Since my proposed changes to OL's "native"
>> RDF output [2] haven't been accepted yet, perhaps other approaches can
>> be promoted somehow. Talis's approach works well, but I'm interested
>> to see others too.
>
> since I had not much time and only needed the ISBN in a first approach anyway,
> I did some crude regex to make me an ISBN-triple
>
> -o
>
>> [1] http://www.mail-archive.com/[email protected]/msg00613.html
>> [2] https://github.com/internetarchive/openlibrary/pull/136 (comments
>> still welcome, naturally)
>>
>> On 23 May 2012 15:51, Pascal Christoph<[email protected]>  wrote:
>>> Hi *,
>>>
>>> today we managed to link 1.2 M lobid.org resources to Open Library work
>>> resources, simply using ISBN-10.
>>> It seems that no commonly used identifier (that would be: VIAF or GND or ...
>>> and not an extra minted openlibrary identifier[1]) for creators in OL is
>>> given.
>>> Identifiers (among other things) help to disambiguate data, so if you want
>>> to, you can enrich your data using our newly generated links. How to do
>>> that, and a little bit more background, is on our blog:
>>>
>>> https://wiki1.hbz-nrw.de/display/SEM/2012/05/23/1.2+M+links+to+Open+Library
>>>
>>> Yes, and let me say "thank you" for your amazing work - this is just one
>>> more fine example of what is achievable with LOD!
>>>
>>> -o
>>>
>>> [1] it may be that there is already a concordance out there between e.g. VIAF
>>> and ol-Person-URIs, I don't know, I just saw what's already there in the RDF
>>>
>>> --
>>> Pascal Christoph
>>> - Linked Open Data: http://lobid.org/ -
>>> hbz - Hochschulbibliothekszentrum NRW
>>> Telefon +49-221-40075-139
>>> http://www.hbz-nrw.de/
>>> _______________________________________________
>>> Ol-tech mailing list
>>> [email protected]
>>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>>> To unsubscribe from this mailing list, send email to [email protected]

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
