Re: [ol-discuss] Metadata in author name

Alan Millar Wed, 24 Nov 2010 10:30:09 -0800

On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle <kco...@kcoyle.net> wrote:
> It might be necessary to drop them out of the Amazon data gathering,
> although it would be a shame because they also contribute some of the
> "long tail" books to the database. I wonder if it wouldn't at least be
> possible to drop all of the instances of
>     "(translator)" (case insensitive)
> from the author strings and see how much that clears these up. (I also
> saw a few cases of "[translator]" and there may be other patterns as
> well.)


Personally, I don't think we should automate dropping them; the role is
good metadata.  Rather, I think we should automate moving it into the
additional people list.  The trick will be coming up with some
judicious pattern-matching smarts.
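A rough sketch of that pattern matching, in Python, under a few assumptions: the role words in ROLE_WORDS are a hypothetical starting list, and the {"role": ..., "name": ...} shape of the additional-people entries is illustrative, not Open Library's actual record schema. It matches a trailing role wrapped in parentheses or square brackets, case-insensitively, and moves it out of the author list instead of discarding it.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical role words; extend after reviewing real author strings.
ROLE_WORDS = ("translator", "editor", "illustrator", "compiler")

_roles = "|".join(ROLE_WORDS)
# Trailing "(role)" or "[role]", case-insensitive, e.g. "Jane Doe (Translator)".
ROLE_SUFFIX = re.compile(
    rf"\s*(?:\(\s*({_roles})\s*\)|\[\s*({_roles})\s*\])\s*$",
    re.IGNORECASE,
)


def split_role(author: str) -> Tuple[str, Optional[str]]:
    """Return (name, role) if a role suffix is present, else (author, None)."""
    match = ROLE_SUFFIX.search(author)
    if match is None:
        return author, None
    role = match.group(1) or match.group(2)
    return author[: match.start()].strip(), role.lower()


def partition_authors(authors: List[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Keep plain authors; move role-tagged entries to a contributor list."""
    kept: List[str] = []
    contributors: List[Dict[str, str]] = []
    for entry in authors:
        name, role = split_role(entry)
        if role is None:
            kept.append(entry)
        else:
            contributors.append({"role": role, "name": name})
    return kept, contributors
```

For example, partition_authors(["Jane Doe (Translator)", "John Smith"]) would keep "John Smith" as the author and propose Jane Doe as a translator. Proposals like these could be reviewed by a person before anything is written back.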

(But here is another fun one that probably should just be dropped:
http://openlibrary.org/search/authors?q=from+old+catalog
:-)

I see quite a few cases where useful metadata could be moved from one
field to another, such as book titles with series or edition suffixes
like "(Great Classics Series)" or "large print edition" (see
http://openlibrary.org/search?q=large+print+edition).  These follow
fairly regular patterns, so moving them could be automated with human
supervision.
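Here is a minimal Python sketch of the "automated with supervision" idea. It only proposes a change for a person to review and never applies one. The two suffix patterns are hypothetical examples based on the cases above, and the target field names ("series", "edition_name") are illustrative; they should be checked against the real Open Library edition schema.

```python
import re
from typing import Dict, Optional

# Hypothetical suffix patterns, each paired with the field the text would move to.
SUFFIX_PATTERNS = [
    # "(... Series)" at the end of a title, e.g. "(Great Classics Series)".
    ("series", re.compile(r"\s*\(([^()]*\bseries)\)\s*$", re.IGNORECASE)),
    # "large print edition", optionally bracketed, at the end of a title.
    ("edition_name",
     re.compile(r"\s*[(\[]?\s*(large print edition)\s*[)\]]?\s*$", re.IGNORECASE)),
]


def propose_title_fix(title: str) -> Optional[Dict[str, str]]:
    """Return a proposed {title, field: value} split for review, or None."""
    for field, pattern in SUFFIX_PATTERNS:
        match = pattern.search(title)
        # Skip titles that consist of nothing but the suffix.
        if match and match.start() > 0:
            base = title[: match.start()].strip(" ,;:-")
            return {"title": base, field: match.group(1)}
    return None
```

For example, "Moby Dick (Great Classics Series)" yields a proposal to set the title to "Moby Dick" and the series to "Great Classics Series". A reviewer would accept or reject each proposal before any update is made.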

I'd like to automate some of that myself, but I haven't come across
any references to bulk update tools for users.  I've downloaded the
dumps and grepped through them for information to support author
merges, but I haven't found any way to do the actual updates other than
through a web browser.  The API docs indicate that the API is read-only
for remote users.

Anyone have any techniques they are using currently for mass updates?

- Alan
_______________________________________________
Ol-discuss mailing list
Ol-discuss@archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to 
ol-discuss-unsubscr...@archive.org
