Hey there. Thank you for the heads-up. Could you please provide links to a couple of examples so that we can investigate further?
Thanks!
-jcg

John Gonzalez
Director, Engineering and Service Availability
[email protected]

> On Sep 25, 2015, at 1:18 PM, Jon Leech <[email protected]> wrote:
>
> On Fri, Sep 25, 2015 at 03:58:40PM -0400, Tom Morris wrote:
>> Someone asked me off-list what types of OpenLibrary data cleanups I'd
>> suggest. Below is the list that I came up with off the top of my head.
>> What others would folks suggest? What do you think is more important?
>>
>> Possible data cleanup targets:
> ...
>> - clean OCR'd texts (actually an IA task, not OL)
>
> There seem to be quite a lot of ebooks in OL which are simply
> missing pages and pages. It appears to be a systematic problem in the
> scan -> ebook step, since the PDFs have all been OK in the cases I've
> looked at.
>
> A library which is offering books that don't have all their pages is
> not actually providing a useful service, so this is most important in
> your list, IMO. If not fixing the ebooks, at least some automated way to
> attempt to tag the broken ones and remove them. Since the broken ebooks
> I came across were all broken in structurally similar ways (missing
> pages at the start of the first chapter, and I think often at the start
> of other chapters as well), perhaps that's amenable to automated
> detection by comparing ebook and PDF.
>
> Jon
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> Archives: http://www.mail-archive.com/[email protected]/
> To unsubscribe from this mailing list, send email to
> [email protected]
