Same problem here with Tesseract 4.1.0: it is removing whitespace. I've tried a lot of parameters.
On Tuesday, August 27, 2019, 15:14:27 (UTC+2), Timothy Snyder wrote:
>
> Yes, any info would be very useful. I've tried modifying a large number of
> config variables to no effect with Tesseract 4.0+. Having some control over
> line/word/character segmentation would be a very useful feature.
>
> On Tue, Aug 27, 2019 at 5:11 AM Stephane Charette <[email protected]> wrote:
>
>> I joined for similar/opposite reasons: in my case Tesseract is removing
>> critical whitespace from between non-dictionary words, and I was looking
>> for tips/hints as to what to tweak in Tesseract's configuration to get it
>> to treat whitespace differently.
>>
>> Anyone know?
>>
>> Stéphane
>>
>>
>> On Wednesday, July 10, 2019 at 8:16:55 AM UTC-7, Timothy Snyder wrote:
>>>
>>> Hello all,
>>>
>>> Does anyone know of any config parameters that will increase the
>>> tolerance of whitespace between characters, i.e., increase the amount of
>>> whitespace needed to trigger word segmentation?
>>>
>>> I have many cases in my text where there is extra whitespace between
>>> characters, resulting in the segmentation of a single word into multiple
>>> words.
>>>
>>> Any suggestions would be appreciated!
>>>
>>> -Tim
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/8e6878a9-655d-41c7-9d4d-bcb7dcfb6419%40googlegroups.com.

