The steps I outlined will produce ONE hOCR file in the end, containing all your pages. Did you try it out? Afterwards you can use something like hocr-eval-lines (see https://github.com/tmbdev/hocr-tools#hocr-eval-lines) for the comparison.
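If you also want a single plain-text file to put next to your single ground-truth txt, the per-line outputs can be joined in page order. This is only a rough sketch: it assumes the recognized lines end up as page-N/NNNNNN.txt (the layout I would expect from the commands quoted below), and the names natural_key and join_line_texts are made up for this example. The natural sort is there because a plain alphabetical sort would put page-10 before page-2.

```python
import re
from pathlib import Path


def natural_key(path):
    # Split the path into digit and non-digit runs so that
    # "page-10" sorts after "page-2" (hypothetical helper).
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", str(path))]


def join_line_texts(root, pattern="page*/*.txt"):
    # Assumed layout: one directory per page, one .txt file per text line.
    files = sorted(Path(root).glob(pattern), key=natural_key)
    # Read the lines in page/line order and join them with newlines.
    return "\n".join(f.read_text(encoding="utf-8").strip() for f in files)
```

Calling join_line_texts(".") in the working directory would return the whole book as one string. The line order inside a page is whatever the segmentation produced, so it may not match the ground truth exactly.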
BTW, you can find some of the old versions linked at https://github.com/tmbdev/ocropy/wiki/Older-versions but I don't think you have to use 7-year-old versions for your task.

2017-02-22 14:43 GMT+01:00 Pedro Correia <[email protected]>:

> Dear Philipp,
> as I said, I can't afford to convert the multipage into several images,
> because my groundtruth is a single txt file. Is there really no way to
> apply ocropus to a multipage?
>
> Check Tom's post on the 0.4.1 release:
> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!searchin/ocropus/multipage|sort:relevance/ocropus/KDV0sa8FUOU/y5-1eXo07roJ
>
> Here, Tom refers to the "Subversion version of ocropus", which could
> supposedly work on multipages:
> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!searchin/ocropus/multipage|sort:relevance/ocropus/OcvP0Z2tFj4/IW_3Wt3WFpoJ
> However, I couldn't find it in order to download it.
>
> On Thursday, 16 February 2017 17:37:30 UTC-2, Philipp Zumstein wrote:
>>
>> I think this option is not yet supported anymore. BTW where did you read
>> that?
>>
>> However, it should be possible to achieve your goals with commands like
>> these:
>>
>> convert multipage.tiff page.png
>> ocropus-nlbin page*.png
>> ocropus-gpageseg page*.bin.png
>> ocropus-rpred page*/*.bin.png
>> ocropus-hocr page*/*.txt
>>
>> 2017-02-16 17:20 GMT+01:00 Pedro Correia <[email protected]>:
>>
>>> PS: I can't afford to split the multipage tiff into several tiff files,
>>> because my groundtruth is a single txt file.
>>>
>>> On Thursday, 16 February 2017 14:18:39 UTC-2, Pedro Correia wrote:
>>>>
>>>> Hi there, I've read that multipage tiff support is available since
>>>> v0.4.1.
>>>> Currently, I need OCRopus to run on a multipage TIFF (a book) and
>>>> output a single hocr containing the whole book's text. However, I've
>>>> noticed that when I run it, the output provided is the OCR of the first
>>>> page only, the others are simply ignored.
>>>> Is there any argument or something that I can use in order to tell
>>>> OCRopus that the input is a multipage TIFF and not a regular TIFF file?
>>>> Thanks in advance,
>>>> Pedro

--
You received this message because you are subscribed to the Google Groups "ocropus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/ocropus/CAAjpKCSS24_5W5bnFygxxLDUuhcK3aOoFuZTTwiSASdMfJ8b3w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
