Hi Alan,

On 11/24/10 10:29 AM, Alan Millar wrote:
> On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle<kco...@kcoyle.net>  wrote:
>> It might be necessary to drop them out of the Amazon data gathering,
>> although it would be a shame because they also contribute some of the
>> "long tail" books to the database. I wonder if it wouldn't at least be
>> possible to drop all of the instances of
>
> Personally, I don't think we should automate dropping them; it is good
> metadata.  Rather, I think we should automate moving it into the
> additional people list.  The trick will be coming up with some
> judicious pattern matching smarts.

That's right. It's surprisingly hard to catch all the permutations of what you 
perceive to be a pattern.
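To make the "move, don't drop" idea concrete, here is a minimal sketch of the kind of pattern matching Alan describes. It assumes that contributor statements show up in the author field as phrases like "Illustrated by X" or "X (Illustrator)". The phrase list, the role names and the `{"role", "name"}` dict shape are all hypothetical, not taken from the Amazon data or from OL's schema, and a real list would have to be built from the records themselves.

```python
import re

# Hypothetical role phrases: the real patterns in the Amazon-sourced author
# strings are not known here and would need to be collected from the data.
ROLE_PATTERNS = [
    (re.compile(r"^\s*(?:illustrated|illus\.)\s+by\s+(?P<name>.+?)\s*$", re.I), "Illustrator"),
    (re.compile(r"^\s*edited\s+by\s+(?P<name>.+?)\s*$", re.I), "Editor"),
    (re.compile(r"^\s*translated\s+by\s+(?P<name>.+?)\s*$", re.I), "Translator"),
    (re.compile(r"^\s*(?P<name>.+?)\s*\((?:illustrator|ill\.)\)\s*$", re.I), "Illustrator"),
]


def classify_author_string(raw):
    """Return (role, name) if raw looks like a contributor statement, else None."""
    for pattern, role in ROLE_PATTERNS:
        m = pattern.match(raw)
        if m:
            return role, m.group("name")
    return None


def split_authors(author_strings):
    """Keep plain names as authors; move role statements to a contributor list."""
    authors, contributors = [], []
    for s in author_strings:
        hit = classify_author_string(s)
        if hit:
            # Hypothetical record shape for the "additional people" list.
            contributors.append({"role": hit[0], "name": hit[1]})
        else:
            authors.append(s)
    return authors, contributors
```

Anything the patterns don't recognise stays where it is, so a supervised run only ever moves strings a human has a rule for.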

> (But here is another fun one that probably should be just dropped:
> http://openlibrary.org/search/authors?q=from+old+catalog
> :-)

In this example, there's variation in the characters which surround the "from
old catalog" statement. Sometimes [], sometimes () etc.
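One way to cope with the varying brackets is a single regular expression that treats the surrounding bracket as optional. This is only a sketch: the bracket set ([], (), {} or none) and the optional trailing period are assumptions based on the examples above, and the pattern will also accept mismatched pairs such as "[from old catalog)".

```python
import re

# "from old catalog", optionally wrapped in [], () or {}, with an optional
# trailing period and any surrounding whitespace. Case-insensitive.
OLD_CATALOG = re.compile(
    r"\s*[\[\(\{]?\s*from\s+old\s+catalog\.?\s*[\]\)\}]?",
    re.IGNORECASE,
)


def strip_old_catalog(name):
    """Remove the 'from old catalog' note and tidy leftover separators."""
    cleaned = OLD_CATALOG.sub("", name)
    return cleaned.strip(" ,;")
```

For example, "Smith, John [from old catalog]" and "Smith, John (From Old Catalog.)" both come back as "Smith, John".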

> I see quite a few cases where useful metadata could be moved from one
> field to another.  Things such as book titles with series or edition
> suffixes like "(Great Classics Series)" or
> http://openlibrary.org/search?q=large+print+edition
> etc.  These follow fairly regular patterns, so it could be automated
> with supervision.

Absolutely. I've noticed you having a shot with "large print" and given the 
frequency, it looks automated... is that right? (Super awesome!!)

http://openlibrary.org/people/amillar

Example edit:
http://openlibrary.org/books/OL11233153M/In_Spring_Time?b=3&a=2&_compare=Compare&m=diff

Looks like edits to some records were a bit tricksy?
e.g. http://openlibrary.org/recentchanges/2010/11/30/edit-book/42076112
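For the title clean-up Alan mentions ("(Great Classics Series)", "large print edition"), a rough sketch of splitting the suffix out of the title might look like the following. The two patterns are illustrative only, and returning the pieces as (title, series, edition) is an assumption about which fields they would go to, not a description of what Alan's edits actually do.

```python
import re

# Illustrative suffix patterns only; a supervised run would use a reviewed list.
SERIES_SUFFIX = re.compile(r"\s*\((?P<series>[^()]*\bSeries)\)\s*$", re.I)
EDITION_SUFFIX = re.compile(
    r"\s*[\(\[]?(?P<edition>large\s+print(?:\s+edition)?)[\)\]]?\s*$", re.I
)


def split_title(title):
    """Return (title, series, edition) with recognised suffixes moved out."""
    series = edition = None
    m = SERIES_SUFFIX.search(title)
    if m:
        series = m.group("series")
        title = title[:m.start()]
    m = EDITION_SUFFIX.search(title)
    if m:
        edition = m.group("edition")
        title = title[:m.start()]
    # Drop separators left dangling by the removed suffix, e.g. "Emma:".
    return title.rstrip(" :;,-"), series, edition
```

Because the function only returns the proposed split, a person can review each change before it is saved, which fits the "automated with supervision" approach.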

> I'd like to automate some of that myself, but I haven't come across
> any references to bulk update tools for users.  I've downloaded the
> dumps and grep'ed through them as information for author merges, but I
> haven't seen any way for me to do the actual updates besides a real
> browser.  The API docs indicate they are read-only for remote users.

We've certainly talked about how fantastic it would be to allow people out there
to write bots to work on Open Library records. Presumably, each bot would need
to be reviewed by OL staff (or trusted contributors) before being let loose
on the OL dataset...

We could build a page under /developers that lists all the bots people write, 
and provides steps for people to submit a bot for review.

We've just been through that process on Wikipedia, fwiw. Was interesting - 
http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/OpenlibraryBot

Would something like that be worth pursuing?

We're also planning to expand the capacity to write data to OL via the API. You 
can see a list of the APIs we're wanting to document here:

http://openlibrary.org/developers/api

Alan - can you tell us what you're up to?
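On the read-only side Alan describes (grepping the dumps to find author merges), a small script can do a little more than grep. This sketch assumes each dump line is tab-separated as type, key, revision, last_modified, JSON; check that against the dump you actually download. It groups author keys by a crude normalised name (lower-cased, punctuation dropped, tokens sorted), so the output is a list of merge candidates for a human to check, not merges to apply.

```python
import gzip
import json
from collections import defaultdict


def author_name_index(lines):
    """Group /type/author keys by a crude normalised name; keep only duplicates."""
    index = defaultdict(list)
    for line in lines:
        # Assumed layout: type, key, revision, last_modified, JSON.
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5 or parts[0] != "/type/author":
            continue
        record = json.loads(parts[4])
        tokens = record.get("name", "").lower().replace(".", " ").replace(",", " ").split()
        # Sorting tokens makes "Twain, Mark" and "Mark Twain" collide.
        index[" ".join(sorted(tokens))].append(parts[1])
    return {name: keys for name, keys in index.items() if len(keys) > 1}


def scan_dump(path):
    """Read a gzipped dump file and return the merge-candidate groups."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return author_name_index(f)
```

Usage: `scan_dump(path)` with the path to a downloaded, gzipped dump returns a dict mapping each normalised name to two or more author keys.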

> Anyone have any techniques they are using currently for mass updates?

There are a few bots written by OL employees, like ImportBot, WorkBot, 
OpenLibraryBot, StatsBot etc.

Ben Gimpert wrote the bot that stamped matching records with Goodreads IDs, and 
an intern, Daniel, wrote something similar to do the same thing with 
LibraryThing IDs.

http://openlibrary.org/people/bgimpertBot
https://github.com/bgimpert/openlibrary

http://openlibrary.org/people/IdentifierBot
https://github.com/dmontalvo/IdentifierBot

As far as I know, there are no external mass updates happening, but as I say, 
this would be fabulous to try to develop. And please, list peeps, correct me if 
I'm wrong!

Cheers,
george


_______________________________________________
Ol-discuss mailing list
Ol-discuss@archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to 
ol-discuss-unsubscr...@archive.org
