As a result of this thread, we've just spent a little time adding some
info/docs to /developers:

Some more links here:
http://openlibrary.org/developers

And info on how to write a bot here:
http://openlibrary.org/dev/docs/bots

Bring on the Bots!

:)


On 11/30/10 11:03 AM, George Oates wrote:
> Hi Alan,
>
> On 11/24/10 10:29 AM, Alan Millar wrote:
>> On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle <kco...@kcoyle.net> wrote:
>>> It might be necessary to drop them out of the Amazon data gathering,
>>> although it would be a shame because they also contribute some of the
>>> "long tail" books to the database. I wonder if it wouldn't at least be
>>> possible to drop all of the instances of
>>
>> Personally, I don't think we should automate dropping them; it is good
>> metadata. Rather, I think we should automate moving it into the
>> additional people list. The trick will be coming up with some
>> judicious pattern matching smarts.
>
> That's right. It's surprisingly hard to catch all the permutations of what you
> perceive to be a pattern.
>
>> (But here is another fun one that probably should be just dropped:
>> http://openlibrary.org/search/authors?q=from+old+catalog
>> :-)
>
> In this example, there's variation in the characters which surround the
> "from old catalog" statement. Sometimes [], sometimes (), etc.
>
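
To make the bracket variation concrete, here is a minimal sketch of a pattern
that tolerates [], () or no brackets at all around the phrase. The regex and
the function name strip_from_old_catalog are illustrative assumptions, not
anything taken from the Open Library codebase, and real data will likely have
permutations this misses.

```python
import re

# Hypothetical pattern: optional opening bracket or paren, the phrase,
# then an optional closing bracket or paren. Matching is case-insensitive.
FROM_OLD_CATALOG = re.compile(
    r"\s*[\[(]?\s*from old catalog\s*[\])]?\s*",
    re.IGNORECASE,
)


def strip_from_old_catalog(name):
    """Remove a 'from old catalog' marker and tidy leftover punctuation."""
    cleaned = FROM_OLD_CATALOG.sub(" ", name)
    # Collapse any doubled spaces left behind by the substitution.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Drop a dangling separator such as the comma in "Smith, John, ...".
    return cleaned.rstrip(",;").strip()
```

Names without the marker pass through unchanged, so a dry run over a list of
author names can print only the ones that would actually be edited.
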
>> I see quite a few cases where useful metadata could be moved from one
>> field to another. Things such as book titles with series or edition
>> suffixes like "(Great Classics Series)" or
>> http://openlibrary.org/search?q=large+print+edition
>> etc. These follow fairly regular patterns, so it could be automated
>> with supervision.
>
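
As a rough sketch of the "automated with supervision" idea, the snippet below
splits a trailing parenthetical that ends in "Series" or "Edition" off a title
and returns both parts, so a person can review the proposal before anything is
saved. The rule and the name split_title_suffix are assumptions made for
illustration; which field the suffix should move to is left open.

```python
import re

# Hypothetical rule: a parenthetical at the very end of the title whose
# last word is "series" or "edition" (case-insensitive).
TRAILING_SUFFIX = re.compile(
    r"^(?P<title>.*\S)\s*\((?P<suffix>[^()]*\b(?:series|edition))\)\s*$",
    re.IGNORECASE,
)


def split_title_suffix(raw_title):
    """Return (title, suffix); suffix is None when the rule does not match."""
    match = TRAILING_SUFFIX.match(raw_title)
    if match is None:
        return raw_title.strip(), None
    return match.group("title"), match.group("suffix").strip()


if __name__ == "__main__":
    # Print proposals only; nothing is written anywhere.
    for title in ["In Spring Time (Large Print Edition)", "Catch-22 (A Novel)"]:
        print(repr(title), "->", split_title_suffix(title))
```

Parentheticals that do not end in one of the two keywords are left alone,
which keeps the rule conservative at the cost of missing some cases.
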
> Absolutely. I've noticed you having a shot with "large print" and given the
> frequency, it looks automated... is that right? (Super awesome!!)
>
> http://openlibrary.org/people/amillar
>
> Example edit:
> http://openlibrary.org/books/OL11233153M/In_Spring_Time?b=3&a=2&_compare=Compare&m=diff
>
>
> Looks like edits to some stuff were a bit tricksy?
> e.g. http://openlibrary.org/recentchanges/2010/11/30/edit-book/42076112
>
>> I'd like to automate some of that myself, but I haven't come across
>> any references to bulk update tools for users. I've downloaded the
>> dumps and grep'ed through them as information for author merges, but I
>> haven't seen any way for me to do the actual updates besides a real
>> browser. The API docs indicate they are read-only for remote users.
>
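
For the read-only half of this (grepping the dumps), a small script can do
the same job with a bit more structure. The sketch below assumes each dump
line is tab-separated with the JSON record in the last field, and that author
records carry "key" and "name" fields; both are assumptions to check against
the dump you actually downloaded. The function name find_author_names is made
up for this example.

```python
import json


def find_author_names(lines, needle):
    """Yield (key, name) for records whose name contains needle."""
    needle = needle.lower()
    for line in lines:
        # Assumed layout: tab-separated fields, JSON record in the last one.
        record_json = line.rstrip("\n").split("\t")[-1]
        try:
            record = json.loads(record_json)
        except ValueError:
            continue  # skip lines that do not fit the assumed layout
        if not isinstance(record, dict):
            continue
        name = record.get("name", "")
        if isinstance(name, str) and needle in name.lower():
            yield record.get("key"), name
```

It takes any iterable of lines, so it can be fed an open file object and
stopped early while experimenting. It only reads; it does not attempt any
updates.
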
> We've certainly talked about how fantastic it would be to allow people out
> there to write bots to work on Open Library records. Presumably, each bot
> would need to be reviewed by OL staff (or trusted contributors) before they
> are let loose on the OL dataset...
>
> We could build a page under /developers that lists all the bots people write,
> and provides steps for people to submit a bot for review.
>
> We've just been through that process on Wikipedia, fwiw. Was interesting -
> http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/OpenlibraryBot
>
> Would something like that be worth pursuing?
>
> We're also planning to expand the capacity to write data to OL via the API.
> You can see a list of the APIs we're wanting to document here:
>
> http://openlibrary.org/developers/api
>
> Alan - can you tell us what you're up to?
>
>> Anyone have any techniques they are using currently for mass updates?
>
> There are a few bots written by OL employees, like ImportBot, WorkBot,
> OpenLibraryBot, StatsBot etc.
>
> Ben Gimpert wrote the bot that stamped matching records with Goodreads IDs,
> and an intern, Daniel, wrote something similar to do the same thing with
> LibraryThing IDs.
>
> http://openlibrary.org/people/bgimpertBot
> https://github.com/bgimpert/openlibrary
>
> http://openlibrary.org/people/IdentifierBot
> https://github.com/dmontalvo/IdentifierBot
>
> As far as I know, there are no external mass updates happening, but as I
> say, this would be fabulous to try to develop. And please, list peeps,
> correct me if I'm wrong!
>
> Cheers,
> george
>
>