On 5/2/12 10:22 AM, Ben Companjen wrote:

> What the best solution is for you depends on your goals with the data, I 
> think.
> You could write software that tries to combine the "(2nd :" and
> "1962-1965)", but I don't know whether anyone would ever use that in a
> subject search.

Remember that OL organizes subjects in subject pages, it doesn't just 
allow search. The date-related subjects will gather together books with 
the same or similar dates. Unfortunately, LCSH (and probably the Amazon 
subjects) is pretty quirky.

I forgot the FAST link on the last message:

http://experimental.worldcat.org/fast/1360394/

That's the one for a similar heading, but not this exact heading. This 
heading has five subfields:

|a Vatican Council
|n (2nd :
|d 1962-1965).
|t Declaratio de libertate religiosa
|x Congresses.

and you'll see that "(2nd :" and "1962-1965)" are in separate subfields. 
There's no way to know that they are supposed to display as the single 
unit "(2nd : 1962-1965)" unless you go to the effort of interpreting 
the punctuation. In LCSH, the whole heading displays as a single string, 
and it's not designed well for being broken up into facets, as you can 
see. The "togetherness" of different elements is left up to the 
intelligence of the reader.
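
To make "interpreting the punctuation" concrete, here is a rough sketch, not anything OL actually runs. It assumes the heading arrives as (subfield code, value) pairs, and the function names and the " -- " separator for subdivisions are made up for illustration. It glues subfields together for as long as a parenthesis is left open, then builds one display string:

```python
# Hypothetical helper names; the (code, value) input shape is an assumption.
SUBDIVISION_CODES = {"v", "x", "y", "z"}  # MARC subject subdivision subfields


def merge_parenthetical(subfields):
    """Merge a subfield into the previous one while a '(' is still open."""
    merged = []
    depth = 0  # number of currently unclosed parentheses
    for code, value in subfields:
        if depth > 0:
            prev_code, prev_value = merged[-1]
            merged[-1] = (prev_code, prev_value + " " + value)
        else:
            merged.append((code, value))
        # Never go below zero on a stray ')'.
        depth = max(depth + value.count("(") - value.count(")"), 0)
    return merged


def display_heading(subfields):
    """Join subfields into one string; subdivisions get a ' -- ' separator."""
    text = ""
    for code, value in subfields:
        if not text:
            text = value
        elif code in SUBDIVISION_CODES:
            text += " -- " + value
        else:
            text += " " + value
    return text


heading = [
    ("a", "Vatican Council"),
    ("n", "(2nd :"),
    ("d", "1962-1965)."),
    ("t", "Declaratio de libertate religiosa"),
    ("x", "Congresses."),
]
units = merge_parenthetical(heading)
print(units[1][1])             # (2nd : 1962-1965).
print(display_heading(units))
```

This only handles the parenthesis case. Every other LCSH punctuation convention would need its own rule, which is where the mess starts.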

Believe me, it's a mess when you try to do something algorithmic and 
rational with it.

That doesn't explain the truncation of "religiosa", so there is still 
that problem.

kc

> Your software could discard subjects that are not so
> useful (to you), like these.
> You could manually fix the subjects on openlibrary.org. In this case,
> clicking Edit on the book page would automatically create a work. You
> can add and edit subjects for works.
> If there is no redistribution of data, you can enhance the subject
> information using any other source you like (e.g. the ISBN DB you
> mention). If you know of a subject database that can be reused
> according to its license, even better.
> If you find certain sets of records contain many similar errors (like
> I found in the physical_description field some time ago), you could
> write a bot that automatically improves the live Open Library data.
>
> Since subjects are only editable from the web when they are in works,
> you should look in the work record that the book belongs to for
> possibly updated subject terms. This book has no Work yet (at the time
> of writing), but you can add one by editing the edition.
>
> Does this answer your question?
>
> Regards,
>
> Ben
>
> On 2 May 2012 15:49, Sujoy Ghosh<[email protected]>  wrote:
>> Hi,
>>
>> We have downloaded the open library edition data dump from following link
>> http://openlibrary.org/data/ol_dump_authors_latest.txt.gz
>>
>> While parsing the data, we found the subjects fields are corrupted for many
>> editions. eg.
>>
>> For /books/OL7974826M (isbn=0809119935), the subjects field is given the
>> following value
>>
>> "subjects": ["1962-1965)", "Congresses", "Declaratio de libertate religi",
>> "(2nd :", "Vatican Council"]
>>
>> By using isbndb api I got below subjects data
>>
>> <Subjects>
>>       <Subject
>> subject_id="vatican_council_2nd_1962_1965_declaratio_de_libertate_religi">
>>        Vatican Council -- (2nd :1962-1965). -- Declaratio de libertate
>> religiosa -- Congresses
>>        </Subject>
>>        <Subject subject_id="freedom_of_religion_congresses">Freedom of
>> religion -- Congresses</Subject>
>> </Subjects>
>>
>> It's clearly visible that the open library data is corrupted in the subjects
>> field. This is observed in many other editions in the dump as well.
>>
>> Can you please help us to find out the correct data? Can you suggest any
>> solution?
>>
>> Rgds,
>> Sujoy
>>
>>
>>
>> _______________________________________________
>> Ol-tech mailing list
>> [email protected]
>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>> To unsubscribe from this mailing list, send email to
>> [email protected]
>>

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet