The steps I outlined will produce ONE hOCR file in the end, containing all
your pages. Did you try it out? Afterwards you can, for example, use
something like hocr-eval-lines (see
https://github.com/tmbdev/hocr-tools#hocr-eval-lines) for the comparison.
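
Since the ground truth is a single txt file, a plain-text comparison may
also be an option. The following is only a minimal sketch under stated
assumptions: the quoted commands are assumed to leave one directory per
page (page-0/, page-1/, ...) with one .txt file per recognized line, and
GNU `sort -V` is assumed to be available. The function name
merge_ocr_text and the output name book.txt are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: merge per-line OCR text into one file, in page order.
# Assumed layout: <prefix>-0/, <prefix>-1/, ... each holding one .txt file
# per recognized line, with zero-padded names that sort in reading order.
merge_ocr_text() {
    prefix=$1   # e.g. "page" (assumed naming of the page directories)
    out=$2      # e.g. "book.txt"
    : > "$out"  # start with an empty output file
    # sort -V (GNU coreutils) puts page-2 before page-10.
    printf '%s\n' "$prefix"-*/ | sort -V | while IFS= read -r dir; do
        for f in "$dir"*.txt; do
            # "awk 1" copies the file and guarantees a trailing newline.
            [ -f "$f" ] && awk 1 "$f" >> "$out"
        done
    done
}
```

Calling `merge_ocr_text page book.txt` in the working directory would then
give one text file that can be compared against the ground truth, e.g.
with diff.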

BTW, you can find some of the old versions linked here:
https://github.com/tmbdev/ocropy/wiki/Older-versions , but I don't think
you have to use 7-year-old versions for your task.

2017-02-22 14:43 GMT+01:00 Pedro Correia <[email protected]>:

> Dear Philipp,
> as I said, I can't afford to convert the multipage TIFF into several images,
> because my ground truth is a single txt file. Is there really no way to
> apply OCRopus to a multipage TIFF?
>
> Check Tom's post on 0.4.1 release:
> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!searchin/ocropus/multipage|sort:relevance/ocropus/KDV0sa8FUOU/y5-1eXo07roJ
>
> Here, Tom refers to the "Subversion version of ocropus", which could
> supposedly work on multipage files:
> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!searchin/ocropus/multipage|sort:relevance/ocropus/OcvP0Z2tFj4/IW_3Wt3WFpoJ
> However, I couldn't find it anywhere to download.
>
>
> On Thursday, 16 February 2017 17:37:30 UTC-2, Philipp Zumstein
> wrote:
>>
>> I think this option is no longer supported. BTW, where did you read
>> that?
>>
>> However, it should be possible to achieve your goals with commands like
>> these:
>>
>> convert multipage.tiff page.png
>> ocropus-nlbin page*.png
>> ocropus-gpageseg page*.bin.png
>> ocropus-rpred page*/*.bin.png
>> ocropus-hocr page*.bin.png -o book.html
>>
>>
>>
>> 2017-02-16 17:20 GMT+01:00 Pedro Correia <[email protected]>:
>>
>>> PS: I can't afford to split the multipage tiff into several tiff files,
>>> because my groundtruth is a single txt file.
>>>
>>> On Thursday, 16 February 2017 14:18:39 UTC-2, Pedro Correia
>>> wrote:
>>>>
>>>> Hi there, I've read that multipage TIFF support has been available since
>>>> v0.4.1.
>>>> Currently, I need OCRopus to run on a multipage TIFF (a book) and
>>>> output a single hocr containing the whole book's text. However, I've
>>>> noticed that when I run it, the output provided is the OCR of the first
>>>> page only, the others are simply ignored.
>>>> Is there any argument or something that I can use in order to tell
>>>> OCRopus that the input is a multipage TIFF and not a regular TIFF file?
>>>> Thanks in advance,
>>>> Pedro
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "ocropus" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/ocropus/ba79be37-9332-4ce8-b6f8-16821bd47e32%40googlegroups.com
>>> <https://groups.google.com/d/msgid/ocropus/ba79be37-9332-4ce8-b6f8-16821bd47e32%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
