Yes, any info would be very useful. I've tried modifying a large number of
config variables to no effect with Tesseract 4.0+. Having some control over
line/word/character segmentation would be a very useful feature.
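
Until something like that exists, one possible workaround is to fix the segmentation after recognition instead of inside Tesseract. The sketch below is an assumption on my part, not a Tesseract feature: it takes word bounding boxes for a single text line (for example, as parsed from Tesseract's TSV or hOCR output) and joins neighbours whose horizontal gap is at or below a threshold. The names WordBox, merge_close_words and max_gap are hypothetical, and a suitable threshold would have to be tuned per font size and scan resolution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    text: str
    left: int   # x-coordinate of the left edge, in pixels
    right: int  # x-coordinate of the right edge, in pixels


def merge_close_words(words: List[WordBox], max_gap: int) -> List[WordBox]:
    """Join neighbouring words on one text line when the gap is <= max_gap pixels."""
    merged: List[WordBox] = []
    # Process boxes left to right so each box is compared with its predecessor.
    for word in sorted(words, key=lambda w: w.left):
        if merged and word.left - merged[-1].right <= max_gap:
            prev = merged[-1]
            # Small gap: treat it as spacing inside one word and concatenate.
            merged[-1] = WordBox(prev.text + word.text,
                                 prev.left,
                                 max(prev.right, word.right))
        else:
            # Large gap (or first box): start a new word.
            merged.append(word)
    return merged


if __name__ == "__main__":
    line = [
        WordBox("Tes", 0, 30),
        WordBox("ser", 34, 60),
        WordBox("act", 64, 90),
        WordBox("OCR", 120, 150),
    ]
    for box in merge_close_words(line, max_gap=6):
        print(box.text, box.left, box.right)
```

Lowering max_gap keeps more of the original splits, which may suit the opposite problem described below, although it cannot bring back spaces that Tesseract has already dropped.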

On Tue, Aug 27, 2019 at 5:11 AM Stephane Charette <
[email protected]> wrote:

> I joined for similar/opposite reasons: in my case, Tesseract is removing
> critical whitespace from between non-dictionary words, and I was looking
> for tips/hints as to what to tweak in Tesseract's configuration to get it
> to treat whitespace differently.
>
> Anyone know?
>
> Stéphane
>
>
> On Wednesday, July 10, 2019 at 8:16:55 AM UTC-7, Timothy Snyder wrote:
>>
>> Hello all,
>>
>> Does anyone know of any config parameters that will increase the
>> tolerance of whitespace between characters, i.e., increase the amount of
>> whitespace needed to trigger word segmentation?
>>
>> I have many cases in my text where there is extra whitespace between
>> characters, resulting in the segmentation of a single word into multiple
>> words.
>>
>> Any suggestions would be appreciated!
>>
>> -Tim
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8e6878a9-655d-41c7-9d4d-bcb7dcfb6419%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/8e6878a9-655d-41c7-9d4d-bcb7dcfb6419%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

