On 02-May-2012, at 7:19 PM, Sujoy Ghosh wrote:

> Hi,
> 
> We downloaded the Open Library edition data dump from the following link:
> http://openlibrary.org/data/ol_dump_authors_latest.txt.gz
> 
> While parsing the data, we found that the subjects field is corrupted for 
> many editions. For example:
> 
> For /books/OL7974826M (isbn=0809119935), the subjects field has the 
> following value:
> 
> "subjects": ["1962-1965)", "Congresses", "Declaratio de libertate religi", 
> "(2nd :", "Vatican Council"]
> 
> Using the ISBNdb API, I got the following subjects data:
> 
> <Subjects>
>      <Subject 
> subject_id="vatican_council_2nd_1962_1965_declaratio_de_libertate_religi">
>       Vatican Council -- (2nd :1962-1965). -- Declaratio de libertate 
> religiosa -- Congresses
>       </Subject>
>       <Subject subject_id="freedom_of_religion_congresses">Freedom of 
> religion -- Congresses</Subject>
> </Subjects>
> 
> The Open Library data in the subjects field is clearly corrupted. We see 
> the same problem in many other editions in the dump.
> 
> Can you please help us find the correct data? Can you suggest a 
> solution?

Open Library flattens the subjects, by design.

See the presentation by George Oates, which explains how we convert MARC 
subjects into OL subject pages:

http://archive.org/stream/Kohacon2010OpenLibraryPresentation-GeorgeOates/kohacon-2010#page/n45/mode/2up

We also fold the subjects into work pages. The subject data in the editions 
is legacy and we don't really use it. If you plan to work with subjects in 
the OL data, you should use the works dump.
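
As a rough illustration of reading subjects from a works dump, here is a 
minimal Python sketch. It assumes each dump line has five tab-separated 
columns (type, key, revision, last modified, JSON record) and that work 
records carry a "subjects" list. The function names are made up for this 
example, and the file name in the usage note is an assumption by analogy 
with the link quoted above.

```python
import gzip
import json


def iter_work_subjects(lines):
    """Yield (work_key, subjects) for each work record that has subjects.

    Assumes five tab-separated columns per line:
    type, key, revision, last_modified, JSON record.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        # Skip malformed lines and anything that is not a work record.
        if len(parts) != 5 or parts[0] != "/type/work":
            continue
        record = json.loads(parts[4])
        subjects = record.get("subjects")
        if subjects:
            yield parts[1], subjects


def read_works_dump(path):
    """Stream (work_key, subjects) pairs from a gzipped dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from iter_work_subjects(fh)
```

Usage would look something like 
read_works_dump("ol_dump_works_latest.txt.gz"), iterating over the pairs it 
yields. The dump is read line by line, so the whole file never has to fit 
in memory.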

Anand
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]
