On Mon, 2012-06-04 at 01:11 +0200, Ben Companjen wrote:
> The bad thing is, well more a glitch, if I'm correct: one has to
> scrape author IDs from these search pages, because there is no
> wildcard search in the API. I noticed AMillarBot was
> replacing/correcting missing Umlauts, so perhaps some of the code is
> already there.

Unfortunately, the AMillarBot work is not a good reference. It does scrape the search pages, but only for a handful of patterns I came across, like "Beitra>ge"="Beiträge" (found by searching on "title:beitra ge"). It is really just a cheap hack, and doesn't scale at all.

I also just discovered the REAL fix for this anyway: OL already has the correct data, it just didn't get imported right. Ack!! Take a look at these:

http://openlibrary.org/authors/OL4459814A/Heinrich_Schro_der
http://openlibrary.org/works/OL10684450W/Tonbandgera_te-Messpraxis
http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:299045317:529

The MARC record shows the proper original data, at least in my browser, while the imported items are mangled. These just need to be re-processed, and I'm not going to re-invent the importer :-( I guess I'd better file a bug report.

- Alan
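
For illustration only, a pattern-based repair of the kind described in this message might look roughly like the sketch below. It is not AMillarBot's actual code: the table name KNOWN_FIXES and the function repair_title are hypothetical, and the single entry simply copies the pair quoted in the message.

```python
# Hypothetical lookup table: mangled form -> corrected form.
# Only the pair quoted in the message is listed; the bot's real
# pattern list is not shown in the source.
KNOWN_FIXES = {
    "Beitra>ge": "Beiträge",
}


def repair_title(title: str) -> str:
    """Replace each known mangled substring with its corrected spelling."""
    for mangled, fixed in KNOWN_FIXES.items():
        title = title.replace(mangled, fixed)
    return title


if __name__ == "__main__":
    print(repair_title("Beitra>ge zur Geschichte"))
```

Every mangled word needs its own hand-written entry, which is why this approach does not scale and why re-processing the original MARC records is the better fix.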
