On Thu, Jul 11, 2019 at 11:22 PM Alex Brollo <[email protected]> wrote:
>
> I don't understand fully your statement "Right now, I'm going to convert them
> to DjVu and upload them, without any text information.". Don't you feel any
> need of an excellent OCR layer when proofreading it into wikisource?

I reuploaded the first issue of Weird Tales in DjVu because the PDF was significantly fuzzier than the DjVu, and looking at the PDF OCR, it's slightly better than what I can get from the interface. Given the choice between better images and better OCR, I go with the first one.

> Do you feel fully satisfied by mediawiki OCR of images?

I can't even get the MediaWiki OCR to work. I use the Google OCR gadget.

> I don't know how to get xml data about mapping of words into page image.

It's a pretty distant concern for me, somewhat tangential to producing transcriptions of the works.

-- 
Kie ekzistas vivo, ekzistas espero.
