Re: [ol-discuss] Metadata in author name
George, thanks, that's great. I will try out some role input, just for fun!

kc

Quoting George Oates g...@archive.org:

Just to clarify - It's possible to attach a role to a contributor at the edition level, and the list of available roles is wiki-editable.

Cheers,
george

On 11/24/10 10:32 PM, Karen Coyle wrote:

Yes, you are absolutely right, we should also move those names to the contributor area. At the moment, I don't believe that contributor has a place for role, but that's something else that would be useful.

The other two (from old catalog and the series statements) are ones that have been noted before, and the idea was to handle them algorithmically. From old catalog comes in from Library of Congress records, and the series statements in titles from Amazon.

kc

Quoting Alan Millar amillar...@gmail.com:

On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle kco...@kcoyle.net wrote:

It might be necessary to drop them out of the Amazon data gathering, although it would be a shame because they also contribute some of the long tail books to the database. I wonder if it wouldn't at least be possible to drop all of the instances of (translator) (case insensitive) from the author strings and see how much that clears these up. (I also saw a few cases of [translator] and there may be other patterns as well.)

Personally, I don't think we should automate dropping them; it is good metadata. Rather, I think we should automate moving it into the additional people list. The trick will be coming up with some judicious pattern matching smarts.

(But here is another fun one that probably should be just dropped: http://openlibrary.org/search/authors?q=from+old+catalog :-)

I see quite a few cases where useful metadata could be moved from one field to another. Things such as book titles with series or edition suffixes like (Great Classics Series) or http://openlibrary.org/search?q=large+print+edition etc. These follow fairly regular patterns, so it could be automated with supervision.
I'd like to automate some of that myself, but I haven't come across any references to bulk update tools for users. I've downloaded the dumps and grep'ed through them as information for author merges, but I haven't seen any way for me to do the actual updates besides a real browser. The API docs indicate they are read-only for remote users. Anyone have any techniques they are using currently for mass updates?

- Alan

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

___
Ol-discuss mailing list
Ol-discuss@archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to ol-discuss-unsubscr...@archive.org
Re: [ol-discuss] Metadata in author name
Just to clarify - It's possible to attach a role to a contributor at the edition level, and the list of available roles is wiki-editable.

Cheers,
george

On 11/24/10 10:32 PM, Karen Coyle wrote:

Yes, you are absolutely right, we should also move those names to the contributor area. At the moment, I don't believe that contributor has a place for role, but that's something else that would be useful.

The other two (from old catalog and the series statements) are ones that have been noted before, and the idea was to handle them algorithmically. From old catalog comes in from Library of Congress records, and the series statements in titles from Amazon.

kc

Quoting Alan Millar amillar...@gmail.com:

On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle kco...@kcoyle.net wrote:

It might be necessary to drop them out of the Amazon data gathering, although it would be a shame because they also contribute some of the long tail books to the database. I wonder if it wouldn't at least be possible to drop all of the instances of (translator) (case insensitive) from the author strings and see how much that clears these up. (I also saw a few cases of [translator] and there may be other patterns as well.)

Personally, I don't think we should automate dropping them; it is good metadata. Rather, I think we should automate moving it into the additional people list. The trick will be coming up with some judicious pattern matching smarts.

(But here is another fun one that probably should be just dropped: http://openlibrary.org/search/authors?q=from+old+catalog :-)

I see quite a few cases where useful metadata could be moved from one field to another. Things such as book titles with series or edition suffixes like (Great Classics Series) or http://openlibrary.org/search?q=large+print+edition etc. These follow fairly regular patterns, so it could be automated with supervision.

I'd like to automate some of that myself, but I haven't come across any references to bulk update tools for users.
I've downloaded the dumps and grep'ed through them as information for author merges, but I haven't seen any way for me to do the actual updates besides a real browser. The API docs indicate they are read-only for remote users. Anyone have any techniques they are using currently for mass updates?

- Alan
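
For anyone wanting to experiment with the title suffixes mentioned above, here is a rough Python sketch of the kind of pattern matching that might work. It only proposes a split and leaves the decision to a human, in keeping with the "automated with supervision" idea. The function name `split_title` and the choice to look only at a trailing parenthetical ending in "Series" or "Edition" are assumptions made for illustration, not anything Open Library actually does.

```python
import re

# Assumption: the suffix is a trailing parenthetical whose last word is
# "Series" or "Edition", e.g. "(Great Classics Series)".
SUFFIX = re.compile(r"\s*\(([^()]*\b(series|edition))\)\s*$", re.IGNORECASE)

def split_title(raw):
    """Return (title, suffix_text, kind).

    suffix_text and kind are None when no trailing suffix is found.
    Nothing is written anywhere; the result is a proposal for review.
    """
    match = SUFFIX.search(raw)
    if not match:
        return raw.strip(), None, None
    title = raw[:match.start()].strip()
    return title, match.group(1).strip(), match.group(2).lower()

if __name__ == "__main__":
    for sample in ["Moby Dick (Great Classics Series)",
                   "Emma (Large Print Edition)",
                   "Plain Title"]:
        print(split_title(sample))
```

Titles that carry a parenthetical for some other reason are left alone, which is probably the safer default until the real patterns in the data have been surveyed.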
Re: [ol-discuss] Metadata in author name
On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle kco...@kcoyle.net wrote:

It might be necessary to drop them out of the Amazon data gathering, although it would be a shame because they also contribute some of the long tail books to the database. I wonder if it wouldn't at least be possible to drop all of the instances of (translator) (case insensitive) from the author strings and see how much that clears these up. (I also saw a few cases of [translator] and there may be other patterns as well.)

Personally, I don't think we should automate dropping them; it is good metadata. Rather, I think we should automate moving it into the additional people list. The trick will be coming up with some judicious pattern matching smarts.

(But here is another fun one that probably should be just dropped: http://openlibrary.org/search/authors?q=from+old+catalog :-)

I see quite a few cases where useful metadata could be moved from one field to another. Things such as book titles with series or edition suffixes like (Great Classics Series) or http://openlibrary.org/search?q=large+print+edition etc. These follow fairly regular patterns, so it could be automated with supervision.

I'd like to automate some of that myself, but I haven't come across any references to bulk update tools for users. I've downloaded the dumps and grep'ed through them as information for author merges, but I haven't seen any way for me to do the actual updates besides a real browser. The API docs indicate they are read-only for remote users. Anyone have any techniques they are using currently for mass updates?

- Alan
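
As a starting point for the "pattern matching smarts", here is a minimal Python sketch of pulling a role marker out of an author string so the name could be moved to the contributor list instead of being dropped. It is only an illustration: the function name `split_author` is made up, the role list beyond "translator" is a guess, and the sketch does not touch Open Library itself. It also strips the "from old catalog" marker, which the thread suggests can simply be discarded.

```python
import re

# Assumption: roles appear in round or square brackets, any letter case.
# Only "translator" comes from the thread; the other roles are hypothetical.
ROLE = re.compile(
    r"\s*[\(\[]\s*(translator|editor|illustrator)\s*[\)\]]\s*",
    re.IGNORECASE,
)

# The "from old catalog" marker carries no role, so it is just removed.
OLD_CATALOG = re.compile(
    r"\s*[\(\[]?\s*from old catalog\s*[\)\]]?\s*",
    re.IGNORECASE,
)

def split_author(raw):
    """Return (clean_name, role); role is None when no marker is found."""
    name = OLD_CATALOG.sub(" ", raw)
    role = None
    match = ROLE.search(name)
    if match:
        role = match.group(1).lower()
        name = ROLE.sub(" ", name)
    # Collapse the whitespace left behind by the removals.
    return " ".join(name.split()), role

if __name__ == "__main__":
    for sample in ["Jane Doe (Translator)",
                   "John Smith [translator]",
                   "Smith, John [from old catalog]"]:
        print(split_author(sample))
```

Running something like this over names grep'ed from a dump would give a list of proposed (name, role) pairs to eyeball before any record is changed.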