Can you specify which parts you consider to be corrupted? The OL subject 
treatment does break up the LCSH subfields into separate subject terms 
(somewhat like FAST [1]). It looks to me like the date subfield "(2nd 
:1962-1965)" was parsed into two parts, and of course the punctuation in 
the subfield is causing problems. The other issue I see is that the 
subfield ending with "religiosa" got truncated. Are there other problems 
that you see?
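
To make the comparison concrete, here is a minimal Python sketch that splits the isbndb heading quoted below on its " -- " delimiters and checks each resulting term against the values reported from the dump. It is only an illustration, not the Open Library import code: the function names `split_heading` and `classify` are hypothetical, and the choice to strip the trailing period from a subfield is my assumption.

```python
import re
from itertools import permutations


def split_heading(heading):
    """Split a ' -- '-delimited subject heading into its subfields."""
    parts = re.split(r"\s+--\s+", heading.strip())
    # Assumption: drop the period that closes a subfield, e.g. "(2nd :1962-1965)."
    return [p.rstrip(". ") for p in parts if p]


def classify(term, dump_terms):
    """Describe how an expected term shows up among the dump's subject terms."""
    if term in dump_terms:
        return "intact"
    # Two dump entries that concatenate to the term: the subfield was split.
    if any(a + b == term for a, b in permutations(dump_terms, 2)):
        return "split in two"
    # A dump entry that is only the beginning of the term: it was cut short.
    if any(d and term.startswith(d) for d in dump_terms):
        return "truncated"
    return "missing"


# Heading as returned by isbndb for isbn 0809119935 (from the quoted message).
heading = ("Vatican Council -- (2nd :1962-1965). -- "
           "Declaratio de libertate religiosa -- Congresses")

# Subjects reported from the dump for /books/OL7974826M.
dump_terms = ["1962-1965)", "Congresses", "Declaratio de libertate religi",
              "(2nd :", "Vatican Council"]

for term in split_heading(heading):
    print(f"{term!r}: {classify(term, dump_terms)}")
```

On this record the sketch reports the date subfield as split in two and the "religiosa" subfield as truncated, with the other two terms intact. Running the same check over more editions might show whether these are the only two kinds of damage.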

kc

On 5/2/12 6:49 AM, Sujoy Ghosh wrote:
> Hi,
>
> We have downloaded the Open Library edition data dump from the following link
> http://openlibrary.org/data/ol_dump_authors_latest.txt.gz
>
> While parsing the data, we found that the subjects field is corrupted for
> many editions, e.g.
>
> For /books/OL7974826M (isbn=0809119935), the subjects field is given
> the following value
>
> "subjects": ["1962-1965)", "Congresses", "Declaratio de libertate
> religi", "(2nd :", "Vatican Council"]
>
> Using the isbndb API, I got the following subject data
>
> <Subjects>
> <Subject
> subject_id="vatican_council_2nd_1962_1965_declaratio_de_libertate_religi">
>        Vatican Council -- (2nd :1962-1965). -- Declaratio de libertate
> religiosa -- Congresses
> </Subject>
> <Subject subject_id="freedom_of_religion_congresses">Freedom of religion
> -- Congresses</Subject>
> </Subjects>
>
> It's clearly visible that the Open Library data is corrupted in the
> subjects field. This is observed in many other editions in the dump as well.
>
> Can you please help us find the correct data? Can you suggest a
> solution?
>
> Rgds,
> Sujoy
>
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to 
> [email protected]

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet