Re: [ol-discuss] Metadata in author name

2010-11-30 Thread Karen Coyle
George, thanks, that's great. I will try out some role input, just for fun!

kc

Quoting George Oates g...@archive.org:

 Just to clarify - It's possible to attach a role to a contributor at
 the edition level, and the list of available roles is wiki-editable.

 Cheers,
 george



 On 11/24/10 10:32 PM, Karen Coyle wrote:
 Yes, you are absolutely right, we should also move those names to the
 contributor area. At the moment, I don't believe that contributor
 has a place for role, but that's something else that would be useful.

 The other two (from old catalog and the series statements) are ones
 that have been noted before, and the idea was to handle them
 algorithmically. "From old catalog" comes in from Library of Congress
 records, and the series statements in titles from Amazon.

 kc

 Quoting Alan Millar amillar...@gmail.com:

 On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle kco...@kcoyle.net wrote:
 It might be necessary to drop them out of the Amazon data gathering,
 although it would be a shame because they also contribute some of the
 long tail books to the database. I wonder if it wouldn't at least be
 possible to drop all of the instances of
  (translator) (case insensitive)
 from the author strings and see how much that clears these up. (I also
 saw a few cases of [translator] and there may be other patterns as
 well.)

 Personally, I don't think we should automate dropping them; it is good
 metadata.  Rather, I think we should automate moving it into the
 additional people list.  The trick will be coming up with some
 judicious pattern matching smarts.

 (But here is another fun one that probably should be just dropped:
 http://openlibrary.org/search/authors?q=from+old+catalog
 :-)

 I see quite a few cases where useful metadata could be moved from one
 field to another.  Things such as book titles with series or edition
 suffixes like (Great Classics Series) or
 http://openlibrary.org/search?q=large+print+edition
 etc.  These follow fairly regular patterns, so it could be automated
 with supervision.

 I'd like to automate some of that myself, but I haven't come across
 any references to bulk update tools for users.  I've downloaded the
 dumps and grep'ed through them as information for author merges, but I
 haven't seen any way for me to do the actual updates besides a real
 browser.  The API docs indicate they are read-only for remote users.

 Anyone have any techniques they are using currently for mass updates?

 - Alan
 ___
 Ol-discuss mailing list
 Ol-discuss@archive.org
 http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
 To unsubscribe from this mailing list, send email to
 ol-discuss-unsubscr...@archive.org








-- 
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



Re: [ol-discuss] Metadata in author name

2010-11-29 Thread George Oates
Just to clarify - It's possible to attach a role to a contributor at the
edition level, and the list of available roles is wiki-editable.

Cheers,
george



On 11/24/10 10:32 PM, Karen Coyle wrote:
 Yes, you are absolutely right, we should also move those names to the
 contributor area. At the moment, I don't believe that contributor
 has a place for role, but that's something else that would be useful.

 The other two (from old catalog and the series statements) are ones
 that have been noted before, and the idea was to handle them
 algorithmically. "From old catalog" comes in from Library of Congress
 records, and the series statements in titles from Amazon.

 kc

 Quoting Alan Millar amillar...@gmail.com:

 On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle kco...@kcoyle.net wrote:
 It might be necessary to drop them out of the Amazon data gathering,
 although it would be a shame because they also contribute some of the
 long tail books to the database. I wonder if it wouldn't at least be
 possible to drop all of the instances of
  (translator) (case insensitive)
 from the author strings and see how much that clears these up. (I also
 saw a few cases of [translator] and there may be other patterns as
 well.)

 Personally, I don't think we should automate dropping them; it is good
 metadata.  Rather, I think we should automate moving it into the
 additional people list.  The trick will be coming up with some
 judicious pattern matching smarts.

 (But here is another fun one that probably should be just dropped:
 http://openlibrary.org/search/authors?q=from+old+catalog
 :-)

 I see quite a few cases where useful metadata could be moved from one
 field to another.  Things such as book titles with series or edition
 suffixes like (Great Classics Series) or
 http://openlibrary.org/search?q=large+print+edition
 etc.  These follow fairly regular patterns, so it could be automated
 with supervision.

 I'd like to automate some of that myself, but I haven't come across
 any references to bulk update tools for users.  I've downloaded the
 dumps and grep'ed through them as information for author merges, but I
 haven't seen any way for me to do the actual updates besides a real
 browser.  The API docs indicate they are read-only for remote users.

 Anyone have any techniques they are using currently for mass updates?

 - Alan






Re: [ol-discuss] Metadata in author name

2010-11-24 Thread Alan Millar
On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle kco...@kcoyle.net wrote:
 It might be necessary to drop them out of the Amazon data gathering,
 although it would be a shame because they also contribute some of the
 long tail books to the database. I wonder if it wouldn't at least be
 possible to drop all of the instances of
     (translator) (case insensitive)
 from the author strings and see how much that clears these up. (I also
 saw a few cases of [translator] and there may be other patterns as
 well.)

Personally, I don't think we should automate dropping them; it is good
metadata.  Rather, I think we should automate moving it into the
additional people list.  The trick will be coming up with some
judicious pattern matching smarts.
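
As a rough illustration of the kind of pattern matching this would take, here is a minimal Python sketch that splits a trailing role marker off an author string instead of discarding it. The list of role words, the function name split_role, and the sample names are all hypothetical; this is not how Open Library processes its imports, only one way the idea could be tried out.

```python
import re

# Hypothetical list of role words; a real list would be derived from the data.
ROLE_WORDS = ("translator", "editor", "illustrator")

# Matches a trailing "(role)" or "[role]" marker, case-insensitively.
ROLE_SUFFIX = re.compile(
    r"\s*[\(\[]\s*(?P<role>" + "|".join(ROLE_WORDS) + r")\s*[\)\]]\s*$",
    re.IGNORECASE,
)


def split_role(author_string):
    """Return (name, role); role is None when no known marker is found."""
    match = ROLE_SUFFIX.search(author_string)
    if match is None:
        return author_string.strip(), None
    name = author_string[:match.start()].strip()
    return name, match.group("role").lower()


# Made-up author strings, for illustration only.
for raw in ("Jane Doe (Translator)", "John Smith [translator]", "Plain Name"):
    print(split_role(raw))
```

Strings without a recognised marker come back unchanged with a role of None, so anything the pattern does not understand is left alone for a person to look at.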

(But here is another fun one that probably should be just dropped:
http://openlibrary.org/search/authors?q=from+old+catalog
:-)

I see quite a few cases where useful metadata could be moved from one
field to another.  Things such as book titles with series or edition
suffixes like (Great Classics Series) or
http://openlibrary.org/search?q=large+print+edition
etc.  These follow fairly regular patterns, so it could be automated
with supervision.
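
A sketch of what "automated with supervision" might look like for titles, in Python. It assumes the series or edition note sits in a trailing parenthetical, as in the (Great Classics Series) example; the function name split_title_suffix, the two suffix kinds, and the sample titles are assumptions made for illustration. Anything it does not recognise is left untouched so a person can review it.

```python
import re

# A trailing parenthetical such as "(Great Classics Series)".
TRAILING_PAREN = re.compile(r"\s*\((?P<note>[^()]+)\)\s*$")


def split_title_suffix(title):
    """Return (clean_title, note, kind); kind is 'series', 'edition', or None."""
    match = TRAILING_PAREN.search(title)
    if match is None:
        return title.strip(), None, None
    note = match.group("note").strip()
    lowered = note.lower()
    if lowered.endswith("series"):
        kind = "series"
    elif lowered.endswith("edition"):
        kind = "edition"
    else:
        # Unknown parenthetical: keep the title as it is for human review.
        return title.strip(), None, None
    return title[:match.start()].strip(), note, kind


# Made-up titles, for illustration only.
for raw in ("Some Novel (Great Classics Series)",
            "Some Novel (Large Print Edition)",
            "Some Novel (1984)"):
    print(split_title_suffix(raw))
```

Returning the note and its kind separately keeps the metadata rather than dropping it, so it could later be written to a series or edition field.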

I'd like to automate some of that myself, but I haven't come across
any references to bulk update tools for users.  I've downloaded the
dumps and grep'ed through them as information for author merges, but I
haven't seen any way for me to do the actual updates besides a real
browser.  The API docs indicate they are read-only for remote users.
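
For the read-only half of this, the grep over the dumps could be done in Python along the following lines. This is a sketch under assumptions: it supposes each dump line is tab-separated with a JSON record in the last column and that the record has "key" and "name" fields, which should be checked against the actual dump files. The function name find_flagged_records is hypothetical. It only produces a review list; it does not update anything.

```python
import json
import re

# Role marker to look for inside a name, e.g. "(translator)" or "[Translator]".
MARKER = re.compile(r"[\(\[]\s*translator\s*[\)\]]", re.IGNORECASE)


def find_flagged_records(lines):
    """Yield (key, name) for records whose name contains a role marker."""
    for line in lines:
        # Assumed layout: tab-separated columns with the JSON record last.
        record_json = line.rstrip("\n").split("\t")[-1]
        try:
            record = json.loads(record_json)
        except ValueError:
            continue  # not valid JSON, skip the line
        if not isinstance(record, dict):
            continue
        name = record.get("name", "")
        if isinstance(name, str) and MARKER.search(name):
            yield record.get("key"), name
```

Because the function accepts any iterable of lines, it can be fed an open file object for a downloaded dump, or a short list of sample lines when testing the pattern.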

Anyone have any techniques they are using currently for mass updates?

- Alan