Check out the hOCR (HTML), TSV, or PDF output formats. See the documentation.
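
As a rough illustration of the TSV route (a sketch added for clarity, not part of the original reply): Tesseract's TSV output has one row per layout element, with the columns level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf and text. Assuming the file was produced with something like `tesseract scan.png out tsv`, the word rows can be grouped by paragraph and joined with blank lines between them. The function name `tsv_to_paragraphs` and the file names are hypothetical, and the paragraph boundaries are only as good as Tesseract's layout analysis.

```python
import csv
import io


def tsv_to_paragraphs(tsv_text):
    """Rebuild paragraph-separated text from Tesseract TSV output (sketch)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # OCR text may contain stray quote characters
    )
    paragraphs = {}  # insertion-ordered: (page, block, paragraph) -> words
    for row in reader:
        # Level 5 rows are individual words; other levels describe
        # pages, blocks, paragraphs and lines and carry no text.
        if row["level"] != "5":
            continue
        word = (row["text"] or "").strip()
        if not word:
            continue
        key = (int(row["page_num"]), int(row["block_num"]), int(row["par_num"]))
        paragraphs.setdefault(key, []).append(word)
    # One line per paragraph, with a blank line between paragraphs.
    return "\n\n".join(" ".join(words) for words in paragraphs.values())


# Hypothetical usage, assuming out.tsv exists:
# with open("out.tsv", encoding="utf-8") as f:
#     print(tsv_to_paragraphs(f.read()))
```

The result can be pasted into Word, where each blank line marks a paragraph break. Grouping by line_num as well would keep the original line breaks if those are wanted.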

Zdenko


On Thu, 19 Mar 2020 at 8:04, Dayton <[email protected]> wrote:

> Hi All,
>
> I'm using Tesseract for Windows to OCR scanned documents, and then I format
> the layout in Word at a later stage.
>
> The text extraction that I get in the .TXT output does not add any hard
> return or any separation between paragraphs, so I have to spend a lot of
> time guessing where each line ends.
>
> Is there a command-line parameter that adds separations
> between paragraphs?
>
> Should I use another output format instead of TXT to make
> formatting in Word easier?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a6e27031-89a2-4800-a574-48f738b439a0%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/a6e27031-89a2-4800-a574-48f738b439a0%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
