On 5/2/12 10:22 AM, Ben Companjen wrote:
> What the best solution is for you depends on your goals with the data, I
> think.
> You could write software that tries to combine the "(2nd :" and
> "1962-1965)", but I don't know whether anyone would ever use that in a
> subject search.
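
For what it's worth, the kind of combining Ben describes could be sketched roughly as below. This is only a heuristic under stated assumptions, not anything Open Library itself does: the function name rejoin_split_subjects is made up for illustration, and it assumes a fragment with an unmatched "(" belongs with a fragment that has an unmatched ")". The subjects list in the dump is not in heading order, so adjacency cannot be relied on, and nothing here can restore a truncated string such as "religi".

```python
def rejoin_split_subjects(subjects):
    """Rejoin fragments that were split inside a parenthesised qualifier.

    Hypothetical helper: pairs each fragment that has an unmatched "("
    with the next available fragment that has an unmatched ")".
    """
    def balance(text):
        # Positive: more "(" than ")"; negative: more ")" than "(".
        return text.count("(") - text.count(")")

    openers = [s for s in subjects if balance(s) > 0]
    closers = [s for s in subjects if balance(s) < 0]
    others = [s for s in subjects if balance(s) == 0]

    merged = []
    for opener in openers:
        if closers:
            merged.append(opener.rstrip() + " " + closers.pop(0).lstrip())
        else:
            merged.append(opener)  # nothing to pair with; leave untouched
    # Unpaired closing fragments are kept rather than silently dropped.
    return others + merged + closers


subjects = ["1962-1965)", "Congresses", "Declaratio de libertate religi",
            "(2nd :", "Vatican Council"]
print(rejoin_split_subjects(subjects))
# ['Congresses', 'Declaratio de libertate religi', 'Vatican Council',
#  '(2nd : 1962-1965)']
```

With more than one split qualifier in the same record the pairing becomes a guess, so the output would need checking by hand.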

Remember that OL organizes subjects in subject pages; it doesn't just allow
search. The date-related subjects will gather together books with the same
or similar dates. Unfortunately, LCSH (and probably the Amazon subjects) is
pretty quirky.

I forgot the FAST link in my last message:

http://experimental.worldcat.org/fast/1360394/

That's the one for a similar heading, but not this exact heading. This
heading has five subfields:

  |a Vatican Council |n (2nd : |d 1962-1965). |t Declaratio de libertate religiosa |x Congresses.

and you'll see that "(2nd :" and "1962-1965)" are in separate subfields.
There's no way to know that they are supposed to display together as
"(2nd : 1962-1965)", a single unit, unless you go to the effort of
interpreting the punctuation. In LCSH, the whole heading displays as a
single string, and it's not designed well for being broken up into facets,
as you can see. The "togetherness" of different elements is left up to the
intelligence of the reader. Believe me, it's a mess when you try to do
something algorithmic and rational with it.

That doesn't explain the truncation of "religiosa", so there is still that
problem.

kc

> Your software could discard subjects that are not so
> useful (to you), like these.
> You could manually fix the subjects on openlibrary.org. In this case,
> clicking Edit on the book page would automatically create a work. You
> can add and edit subjects for works.
> If there is no redistribution of data, you can enhance the subject
> information using any other source you like (e.g. the ISBN DB you
> mention). If you know of a subject database that can be reused
> according to its license, even better.
> If you find certain sets of records contain many similar errors (like
> I found in the physical_description field some time ago), you could
> write a bot that automatically improves the live Open Library data.
>
> Since subjects are only editable from the web when they are in works,
> you should look in the work record that the book belongs to for
> possibly updated subject terms. This book has no Work yet (at the time
> of writing), but you can add one by editing the edition.
>
> Does this answer your question?
>
> Regards,
>
> Ben
>
> On 2 May 2012 15:49, Sujoy Ghosh<[email protected]> wrote:
>> Hi,
>>
>> We have downloaded the open library edition data dump from the following link
>> http://openlibrary.org/data/ol_dump_authors_latest.txt.gz
>>
>> While parsing the data, we found the subjects fields are corrupted for many
>> editions, e.g.
>>
>> For /books/OL7974826M (isbn=0809119935), the subjects field is given the
>> following value
>>
>> "subjects": ["1962-1965)", "Congresses", "Declaratio de libertate religi",
>> "(2nd :", "Vatican Council"]
>>
>> By using isbndb api I got below subjects data
>>
>> <Subjects>
>> <Subject
>> subject_id="vatican_council_2nd_1962_1965_declaratio_de_libertate_religi">
>> Vatican Council -- (2nd :1962-1965). -- Declaratio de libertate
>> religiosa -- Congresses
>> </Subject>
>> <Subject subject_id="freedom_of_religion_congresses">Freedom of
>> religion -- Congresses</Subject>
>> </Subjects>
>>
>> It's clearly visible that the open library data is corrupted in the subjects
>> field. This is observed in so many other editions also in the dump.
>>
>> Can you please help us to find out the correct data? Can you suggest any
>> solution?
>>
>> Rgds,
>> Sujoy

--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to
[email protected]
