Ah.  Thanks!

-jcg

John Gonzalez
Director, Engineering and Service Availability
j...@archive.org



> On Sep 25, 2015, at 1:37 PM, Hank Bromley <h...@archive.org> wrote:
> 
> The "missing pages in EPUB" problem is a known one. See:
> 
> https://webarchive.jira.com/browse/WEBDEV-3
> 
> and my comment on it a month ago, confirming Jeff's observations:
> 
> https://webarchive.jira.com/browse/WEBDEV-3?focusedCommentId=60093#comment-60093
> 
> -- Hank
> 
> On Fri, 25 Sep 2015, John Gonzalez wrote:
> 
>> Hey there.  Thank you for the heads-up.  Can you please provide links to a 
>> couple of examples so that we can investigate further?
>> 
>> Thanks!
>> 
>> -jcg
>> 
>> John Gonzalez
>> Director, Engineering and Service Availability
>> j...@archive.org
>> 
>> 
>> 
>>> On Sep 25, 2015, at 1:18 PM, Jon Leech <oddh...@sonic.net> wrote:
>>> 
>>> On Fri, Sep 25, 2015 at 03:58:40PM -0400, Tom Morris wrote:
>>>> Someone asked me off-list what types of OpenLibrary data cleanups I'd
>>>> suggest.  Below is the list that I came up with off the top of my head.
>>>> What others would folks suggest?  What do you think is more important?
>>>> 
>>>> Possible data cleanup targets:
>>> ...
>>>> - clean OCR'd texts (actually an IA task, not OL)
>>> 
>>>   There seem to be quite a lot of ebooks in OL which are simply
>>> missing pages and pages. It appears to be a systematic problem in the
>>> scan -> ebook step, since the PDFs have all been OK in the cases I've
>>> looked at.
>>> 
>>>   A library which is offering books that don't have all their pages is
>>> not actually providing a useful service, so this is most important in
>>> your list, IMO. If not fixing the ebooks, at least some automated way to
>>> attempt to tag the broken ones and remove them. Since the broken ebooks
>>> I came across were all broken in structurally similar ways (missing
>>> pages at the start of the first chapter, and I think often at the start
>>> of other chapters as well), perhaps that's amenable to automated
>>> detection by comparing ebook and PDF.
>>> 
>>>   Jon
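
For what it's worth, the "comparing ebook and PDF" idea above could be
sketched roughly like this. It is only an illustration under stated
assumptions: it assumes the text of each PDF page and the full text of the
ebook have already been extracted by some other tool, and the names
missing_pages, pdf_pages and ebook_text are made up for the example. They
are not part of any IA or OL code. The sketch flags a page when fewer than
half of its three-word sequences appear anywhere in the ebook text.

```python
import re


def words(text):
    """Lowercase word tokens, ignoring punctuation and layout."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, n=3):
    """Set of consecutive n-word tuples from a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def missing_pages(pdf_pages, ebook_text, n=3, threshold=0.5):
    """Return 1-based numbers of PDF pages mostly absent from the ebook.

    pdf_pages  -- list of strings, one per PDF page (assumed pre-extracted)
    ebook_text -- full text of the ebook as one string (assumed pre-extracted)
    threshold  -- hypothetical cutoff: flag a page when less than this
                  fraction of its n-word shingles occurs in the ebook
    """
    ebook_shingles = shingles(words(ebook_text), n)
    missing = []
    for number, page in enumerate(pdf_pages, start=1):
        page_shingles = shingles(words(page), n)
        if not page_shingles:
            continue  # blank or very short page: nothing to compare
        found = len(page_shingles & ebook_shingles) / len(page_shingles)
        if found < threshold:
            missing.append(number)
    return missing


# Made-up sample data: the ebook lacks the second page.
pdf_pages = [
    "Chapter one begins here with an opening scene.",
    "The second page continues the story in detail.",
    "The third page ends the first chapter quietly.",
]
ebook_text = pdf_pages[0] + " " + pdf_pages[2]

print(missing_pages(pdf_pages, ebook_text))
```

Comparing word sequences rather than exact strings is meant to tolerate
differences in line breaks and hyphenation between the two formats. Real
OCR noise would probably call for a fuzzier match and a tuned threshold.
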
>>> _______________________________________________
>>> Ol-tech mailing list
>>> ol-tech@archive.org
>>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>>> Archives: http://www.mail-archive.com/ol-tech@archive.org/
>>> To unsubscribe from this mailing list, send email to 
>>> ol-tech-unsubscr...@archive.org