Yes, you are absolutely right: we should also move those names to the
"contributor" area. At the moment, I don't believe the contributor
field has a place for a role, but that would also be useful to add.
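
As a rough illustration of the kind of pattern matching this would need, here is a minimal sketch. It assumes each record carries a plain list of author strings and that a contributor is just a (name, role) pair; those shapes, and the names split_role, partition_authors and ROLES, are hypothetical and not Open Library's actual schema or code. It only recognizes the "(translator)" and "[translator]" suffixes mentioned in this thread, case-insensitively.

```python
import re
from dataclasses import dataclass

# Hypothetical role list; extend it as other suffix patterns turn up.
ROLES = ("translator",)

# Matches a trailing "(role)" or "[role]" at the end of an author string.
ROLE_SUFFIX = re.compile(
    r"\s*[(\[]\s*(" + "|".join(map(re.escape, ROLES)) + r")\s*[)\]]\s*$",
    re.IGNORECASE,
)


@dataclass
class Contributor:
    # Hypothetical contributor shape: a name plus a role.
    name: str
    role: str


def split_role(author_string):
    """Return (name, role) if the string ends in a bracketed role, else (name, None)."""
    m = ROLE_SUFFIX.search(author_string)
    if m is None:
        return author_string.strip(), None
    return author_string[:m.start()].strip(), m.group(1).lower()


def partition_authors(author_strings):
    """Keep plain names as authors; move role-tagged names to contributors."""
    authors, contributors = [], []
    for s in author_strings:
        name, role = split_role(s)
        if role is None:
            authors.append(name)
        else:
            contributors.append(Contributor(name, role))
    return authors, contributors
```

For example, partition_authors(["Leo Tolstoy", "Constance Garnett (Translator)"]) keeps Tolstoy as the author and moves Garnett to the contributor list with the role "translator". Anything the pattern doesn't recognize is left where it was, which seems safer than dropping it.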

The other two ("from old catalog" and the series statements) have
been noted before, and the idea was to handle them algorithmically.
"From old catalog" comes in from Library of Congress records, and the
series statements come in with titles from Amazon.
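
A minimal sketch of what "handle them algorithmically" might look like, under assumptions: author names are plain strings, titles may end in a parenthetical such as "(Great Classics Series)" or "(Large Print Edition)", and the output field names "series" and "edition_name" are hypothetical stand-ins for whatever the real record fields are. The function names are also made up for illustration.

```python
import re

# Illustrative patterns only; not an exhaustive list of catalog quirks.
OLD_CATALOG = re.compile(r"[\[(]?\s*from old catalog\s*[\])]?", re.IGNORECASE)
TRAILING_PAREN = re.compile(r"\s*\(([^()]*)\)\s*$")


def clean_author_name(name):
    """Strip a 'from old catalog' note; return None if no real name is left."""
    cleaned = OLD_CATALOG.sub("", name).strip(" ,.;")
    return cleaned or None


def split_title(title):
    """Move a trailing '(... Series)' or '(... Edition)' into its own field."""
    result = {"title": title.strip()}
    m = TRAILING_PAREN.search(title)
    if m:
        inner = m.group(1).strip()
        lowered = inner.lower()
        if lowered.endswith("series"):
            result = {"title": title[:m.start()].strip(), "series": inner}
        elif lowered.endswith("edition"):
            result = {"title": title[:m.start()].strip(), "edition_name": inner}
    return result
```

So "Smith, John, [from old catalog]" becomes "Smith, John", a bare "[from old catalog]" becomes None (i.e. drop it), and "Moby Dick (Great Classics Series)" splits into a title and a series. Other parentheticals, such as "(A Novel)", are left alone, and statements that aren't in trailing parentheses would need more patterns and, as Alan says, some supervision.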

kc

Quoting Alan Millar <amillar...@gmail.com>:

> On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle <kco...@kcoyle.net> wrote:
>> It might be necessary to drop them out of the Amazon data gathering,
>> although it would be a shame because they also contribute some of the
>> "long tail" books to the database. I wonder if it wouldn't at least be
>> possible to drop all of the instances of
>>     "(translator)" (case insensitive)
>> from the author strings and see how much that clears these up. (I also
>> saw a few cases of "[translator]" and there may be other patterns as
>> well.)
>
> Personally, I don't think we should automate dropping them; it is good
> metadata.  Rather, I think we should automate moving it into the
> additional people list.  The trick will be coming up with some
> judicious pattern matching smarts.
>
> (But here is another fun one that probably should be just dropped:
> http://openlibrary.org/search/authors?q=from+old+catalog
> :-)
>
> I see quite a few cases where useful metadata could be moved from one
> field to another.  Things such as book titles with series or edition
> suffixes like "(Great Classics Series)" or
> http://openlibrary.org/search?q=large+print+edition
> etc.  These follow fairly regular patterns, so it could be automated
> with supervision.
>
> I'd like to automate some of that myself, but I haven't come across
> any references to bulk update tools for users.  I've downloaded the
> dumps and grep'ed through them as information for author merges, but I
> haven't seen any way for me to do the actual updates besides a real
> browser.  The API docs indicate they are read-only for remote users.
>
> Anyone have any techniques they are using currently for mass updates?
>
> - Alan
> _______________________________________________
> Ol-discuss mailing list
> Ol-discuss@archive.org
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to  
> ol-discuss-unsubscr...@archive.org
>



-- 
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

