As a result of this thread, we've just spent a little time adding some info/docs to /developers:
Some more links here:
http://openlibrary.org/developers

And info on how to write a bot here:
http://openlibrary.org/dev/docs/bots

Bring on the Bots! :)

On 11/30/10 11:03 AM, George Oates wrote:
> Hi Alan,
>
> On 11/24/10 10:29 AM, Alan Millar wrote:
>> On Wed, Nov 24, 2010 at 10:05 AM, Karen Coyle <kco...@kcoyle.net> wrote:
>>> It might be necessary to drop them out of the Amazon data gathering,
>>> although it would be a shame because they also contribute some of the
>>> "long tail" books to the database. I wonder if it wouldn't at least be
>>> possible to drop all of the instances of
>>
>> Personally, I don't think we should automate dropping them; it is good
>> metadata. Rather, I think we should automate moving it into the
>> additional people list. The trick will be coming up with some
>> judicious pattern matching smarts.
>
> That's right. It's surprisingly hard to catch all the permutations of what you
> perceive to be a pattern.
>
>> (But here is another fun one that probably should be just dropped:
>> http://openlibrary.org/search/authors?q=from+old+catalog
>> :-)
>
> In this example, there's variation in the characters which surround the
> from old catalog statement. Sometimes [], sometimes () etc.
>
>> I see quite a few cases where useful metadata could be moved from one
>> field to another. Things such as book titles with series or edition
>> suffixes like "(Great Classics Series)" or
>> http://openlibrary.org/search?q=large+print+edition
>> etc. These follow fairly regular patterns, so it could be automated
>> with supervision.
>
> Absolutely. I've noticed you having a shot with "large print" and given the
> frequency, it looks automated... is that right? (Super awesome!!)
>
> http://openlibrary.org/people/amillar
>
> Example edit:
> http://openlibrary.org/books/OL11233153M/In_Spring_Time?b=3&a=2&_compare=Compare&m=diff
>
> Looks like edits to some stuff was a bit tricksy?
> e.g. http://openlibrary.org/recentchanges/2010/11/30/edit-book/42076112
>
>> I'd like to automate some of that myself, but I haven't come across
>> any references to bulk update tools for users. I've downloaded the
>> dumps and grep'ed through them as information for author merges, but I
>> haven't seen any way for me to do the actual updates besides a real
>> browser. The API docs indicate they are read-only for remote users.
>
> We've certainly talked about how fantastic it would be to allow people out
> there to write bots to work on Open Library records. Presumably, each bot
> would need to be reviewed by OL staff (or trusted contributors) before they
> are let loose on the OL dataset...
>
> We could build a page under /developers that lists all the bots people write,
> and provides steps for people to submit a bot for review.
>
> We've just been through that process on Wikipedia, fwiw. Was interesting -
> http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/OpenlibraryBot
>
> Would something like that be worth pursuing?
>
> We're also planning to expand the capacity to write data to OL via the API.
> You can see a list of the APIs we're wanting to document here:
>
> http://openlibrary.org/developers/api
>
> Alan - can you tell us what you're up to?
>
>> Anyone have any techniques they are using currently for mass updates?
>
> There are a few bots written by OL employees, like ImportBot, WorkBot,
> OpenLibraryBot, StatsBot etc.
>
> Ben Gimpert wrote the bot that stamped matching records with Goodreads IDs,
> and an intern, Daniel, wrote something similar to do the same thing with
> LibraryThing IDs.
>
> http://openlibrary.org/people/bgimpertBot
> https://github.com/bgimpert/openlibrary
>
> http://openlibrary.org/people/IdentifierBot
> https://github.com/dmontalvo/IdentifierBot
>
> As far as I know, there are no external mass updates happening, but as I say,
> this would be fabulous to try to develop.
> And please, list peeps, correct me if I'm wrong!
>
> Cheers,
> george
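
For anyone wanting to experiment with the "pattern matching smarts" discussed in the quoted thread, here is a minimal sketch in Python. It only covers the two cases named above: the "from old catalog" marker wrapped in either [] or (), and trailing title suffixes such as "(Great Classics Series)" or "(Large Print Edition)". The function names and the exact regular expressions are assumptions made up for illustration; this is not how any existing Open Library bot works, and real records will almost certainly contain permutations it misses.

```python
import re

# "from old catalog" wrapped in [] or (), with any surrounding whitespace.
# Assumption: the marker text itself is always spelled this way.
OLD_CATALOG = re.compile(r"\s*[\[(]\s*from old catalog\s*[\])]\s*", re.IGNORECASE)

# A trailing parenthesised suffix ending in "series" or "edition",
# e.g. "(Great Classics Series)" or "(Large Print Edition)".
TITLE_SUFFIX = re.compile(r"\s*\(([^()]*\b(?:series|edition))\)\s*$", re.IGNORECASE)


def clean_author_name(name):
    """Drop a bracketed or parenthesised 'from old catalog' marker (hypothetical helper)."""
    return OLD_CATALOG.sub(" ", name).strip()


def split_title_suffix(title):
    """Return (title, suffix); suffix is None when no series/edition suffix is found."""
    match = TITLE_SUFFIX.search(title)
    if match is None:
        return title, None
    return title[:match.start()].rstrip(), match.group(1).strip()
```

A bot built on something like this would still want the "with supervision" step mentioned above, for example by writing proposed changes to a file for review rather than applying them directly.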