Hey there. Thank you for the heads-up. Can you please provide links to a couple of examples so that we can investigate further?

Thanks!

-jcg

John Gonzalez
Director, Engineering and Service Availability
j...@archive.org

> On Sep 25, 2015, at 1:18 PM, Jon Leech <oddh...@sonic.net> wrote:
>
> On Fri, Sep 25, 2015 at 03:58:40PM -0400, Tom Morris wrote:
>> Someone asked me off-list what types of OpenLibrary data cleanups I'd
>> suggest. Below is the list that I came up with off the top of my head.
>> What others would folks suggest? What do you think is more important?
>>
>> Possible data cleanup targets:
> ...
>> - clean OCR'd texts (actually an IA task, not OL)
>
> There seem to be quite a lot of ebooks in OL which are simply
> missing pages and pages. It appears to be a systematic problem in the
> scan -> ebook step, since the PDFs have all been OK in the cases I've
> looked at.
>
> A library which is offering books that don't have all their pages is
> not actually providing a useful service, so this is most important in
> your list, IMO. If not fixing the ebooks, at least some automated way to
> attempt to tag the broken ones and remove them. Since the broken ebooks
> I came across were all broken in structurally similar ways (missing
> pages at the start of the first chapter, and I think often at the start
> of other chapters as well), perhaps that's amenable to automated
> detection by comparing ebook and PDF.
>
> Jon
> _______________________________________________
> Ol-tech mailing list
> ol-tech@archive.org
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> Archives: http://www.mail-archive.com/ol-tech@archive.org/
> To unsubscribe from this mailing list, send email to
> ol-tech-unsubscr...@archive.org