Hey there.  Thank you for the heads-up.  Can you please provide links to a 
couple of examples so that we can investigate further?

Thanks!

-jcg

John Gonzalez
Director, Engineering and Service Availability
j...@archive.org



> On Sep 25, 2015, at 1:18 PM, Jon Leech <oddh...@sonic.net> wrote:
> 
> On Fri, Sep 25, 2015 at 03:58:40PM -0400, Tom Morris wrote:
>> Someone asked me off-list what types of OpenLibrary data cleanups I'd
>> suggest.  Below is the list that I came up with off the top of my head.
>> What others would folks suggest?  What do you think is more important?
>> 
>> Possible data cleanup targets:
> ...
>> - clean OCR'd texts (actually an IA task, not OL)
> 
>    There seem to be quite a lot of ebooks in OL which are simply
> missing pages and pages. It appears to be a systematic problem in the
> scan -> ebook step, since the PDFs have all been OK in the cases I've
> looked at.
> 
>    A library which is offering books that don't have all their pages is
> not actually providing a useful service, so this is most important in
> your list, IMO. If not fixing the ebooks, at least some automated way to
> attempt to tag the broken ones and remove them. Since the broken ebooks
> I came across were all broken in structurally similar ways (missing
> pages at the start of the first chapter, and I think often at the start
> of other chapters as well), perhaps that's amenable to automated
> detection by comparing ebook and PDF.
> 
>    Jon
> _______________________________________________
> Ol-tech mailing list
> ol-tech@archive.org
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> Archives: http://www.mail-archive.com/ol-tech@archive.org/
> To unsubscribe from this mailing list, send email to 
> ol-tech-unsubscr...@archive.org