Yes, any info would be very useful. I've tried modifying a large number of config variables to no effect with Tesseract 4.0+. Having some control over line/word/character segmentation would be a very useful feature.
On Tue, Aug 27, 2019 at 5:11 AM Stephane Charette wrote:
> I joined for similar/opposite reasons: In my case Tesseract is removing
> critical whitespace from between non-dictionary words, and I was looking
> for tips/hints as to what to tweak in Tesseract's configuration to get it
> to treat whitespace differently.
>
> Anyone know?
>
> Stéphane
>
> On Wednesday, July 10, 2019 at 8:16:55 AM UTC-7, Timothy Snyder wrote:
>>
>> Hello all,
>>
>> Does anyone know of any config parameters that will increase the
>> tolerance of whitespace between characters, i.e., increase the amount of
>> whitespace needed to trigger word segmentation?
>>
>> I have many cases in my text where there are extra whitespace between
>> characters resulting in the segmentation of a single word into multiple
>> words.
>>
>> Any suggestions would be appreciated!
>>
>> -Tim

