Check out the hOCR output (which is HTML), TSV, or PDF output instead of plain text. See the docs.

Zdenko
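As a rough illustration of why TSV can help here: Tesseract's TSV output carries `block_num`, `par_num`, and `line_num` columns for every word, so paragraphs can be rebuilt from it. The sketch below assumes the TSV was produced with something like `tesseract scan.png out tsv` (the file names are placeholders) and that the result has been read into a string. The function name `paragraphs_from_tsv` is made up for this example, and it does not try to rejoin words hyphenated across lines.

```python
import csv
import io


def paragraphs_from_tsv(tsv_text):
    """Group the words of a Tesseract TSV dump into one string per paragraph.

    Hypothetical helper for illustration; it relies only on the standard
    TSV columns: level, page_num, block_num, par_num, ..., text.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    paragraphs = {}
    for row in reader:
        # Level 5 rows are individual words; lower levels describe
        # pages, blocks, paragraphs and lines and carry no text.
        if row["level"] != "5":
            continue
        word = (row["text"] or "").strip()
        if not word:
            continue
        key = (int(row["page_num"]), int(row["block_num"]), int(row["par_num"]))
        paragraphs.setdefault(key, []).append(word)
    # Sort by (page, block, paragraph) so the output follows reading order
    # as Tesseract numbered it.
    return [" ".join(words) for _, words in sorted(paragraphs.items())]


# Usage sketch (path is a placeholder):
# with open("out.tsv", encoding="utf-8") as f:
#     print("\n\n".join(paragraphs_from_tsv(f.read())))
```

Joining the returned strings with a blank line between them gives text with explicit paragraph breaks that can be pasted into Word.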
On Thu, 19 Mar 2020 at 8:04, Dayton <[email protected]> wrote:

> Hi All,
>
> I'm using Tesseract for Windows to OCR scanned documents and then format
> the layout in Word at a later stage.
>
> The text extraction that I get in the .TXT output does not add any hard
> return or any separation between paragraphs, so I have to spend a lot of
> time guessing where each line ends.
>
> Is there any way to add a parameter on the command line to add separations
> between paragraphs?
>
> Should I use another output format instead of TXT to make the
> formatting in Word easier?
>
> Thanks!

