Ah. Thanks!

-jcg

John Gonzalez
Director, Engineering and Service Availability
j...@archive.org

> On Sep 25, 2015, at 1:37 PM, Hank Bromley <h...@archive.org> wrote:
>
> The "missing pages in EPUB" problem is a known one. See:
>
> https://webarchive.jira.com/browse/WEBDEV-3
>
> and my comment on it a month ago, confirming Jeff's observations:
>
> https://webarchive.jira.com/browse/WEBDEV-3?focusedCommentId=60093#comment-60093
>
> -- Hank
>
> On Fri, 25 Sep 2015, John Gonzalez wrote:
>
>> Hey there. Thank you for the heads-up. Can you please provide links to a
>> couple of examples so that we can investigate further?
>>
>> Thanks!
>>
>> -jcg
>>
>> John Gonzalez
>> Director, Engineering and Service Availability
>> j...@archive.org
>>
>>> On Sep 25, 2015, at 1:18 PM, Jon Leech <oddh...@sonic.net> wrote:
>>>
>>> On Fri, Sep 25, 2015 at 03:58:40PM -0400, Tom Morris wrote:
>>>> Someone asked me off-list what types of OpenLibrary data cleanups I'd
>>>> suggest. Below is the list that I came up with off the top of my head.
>>>> What others would folks suggest? What do you think is more important?
>>>>
>>>> Possible data cleanup targets:
>>> ...
>>>> - clean OCR'd texts (actually an IA task, not OL)
>>>
>>> There seem to be quite a lot of ebooks in OL which are simply
>>> missing pages and pages. It appears to be a systematic problem in the
>>> scan -> ebook step, since the PDFs have all been OK in the cases I've
>>> looked at.
>>>
>>> A library which is offering books that don't have all their pages is
>>> not actually providing a useful service, so this is most important in
>>> your list, IMO. If not fixing the ebooks, at least some automated way to
>>> attempt to tag the broken ones and remove them. Since the broken ebooks
>>> I came across were all broken in structurally similar ways (missing
>>> pages at the start of the first chapter, and I think often at the start
>>> of other chapters as well), perhaps that's amenable to automated
>>> detection by comparing ebook and PDF.
>>>
>>> Jon
>>> _______________________________________________
>>> Ol-tech mailing list
>>> ol-tech@archive.org
>>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>>> Archives: http://www.mail-archive.com/ol-tech@archive.org/
>>> To unsubscribe from this mailing list, send email to
>>> ol-tech-unsubscr...@archive.org