I found the code Ray referred to back in '09. It is now in GetUTF8Text().
In baseapi.cpp, in TessBaseAPI::GetUTF8Text(), I changed:
    *ptr++ = ' ';
to
    {
      int i;
      for (i = 0; i < word->word->space(); i++)
        *ptr++ = ' ';
    }
This added back in the multiple spaces as advertised. The results are a bit
unpredictable (as Ray warned back in '09).
I'll keep poking at it.
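
For anyone trying the same change, here is a minimal standalone sketch of the idea, outside of Tesseract. The Word struct and both function names are hypothetical stand-ins, not Tesseract types. It assumes that space() reports the blanks in front of a word, which is not confirmed in this thread, so the spaces are written before each word rather than after it. It also sizes the output for the extra spaces, on the assumption that a buffer sized for one separator per word would be too small once several spaces are written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one recognized word; not a Tesseract type.
struct Word {
  std::string text;
  int spaces_before;  // plays the role of word->word->space() (assumed meaning)
  bool end_of_line;   // plays the role of an end-of-line flag on the word
};

// Upper bound on the output size: every word's text, its leading spaces,
// one possible newline per word, plus a terminating NUL.
std::size_t RequiredSize(const std::vector<Word>& words) {
  std::size_t n = 1;
  for (const Word& w : words) {
    assert(w.spaces_before >= 0);
    n += w.text.size() + static_cast<std::size_t>(w.spaces_before) + 1;
  }
  return n;
}

// Builds the text, writing each word's own space count in front of it so
// that indentation and inter-word gaps are both kept.
std::string JoinPreservingSpaces(const std::vector<Word>& words) {
  std::string out;
  out.reserve(RequiredSize(words));
  for (const Word& w : words) {
    out.append(static_cast<std::size_t>(w.spaces_before), ' ');
    out += w.text;
    if (w.end_of_line) out += '\n';
  }
  return out;
}
```

If that assumption about space() holds, writing the count after the word (as in the patch above) would shift every gap by one word, which may be one source of the unpredictable results.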
Eric
On Saturday, June 8, 2013 10:37:20 AM UTC-4, [email protected] wrote:
>
> I need to maintain the (multiple) spaces in my output document. About 5
> years ago someone asked how to do this and Ray posted a suggestion. That
> suggestion does not appear to correspond to the current source code.
>
> Can anyone suggest how I can maintain word spacing both before the first
> word on a line (indentation) as well as between words within a line?
>
> I can force the text in the input image to have fixed spacing.
>
> Ideally, there is a command line switch or a config item that will do what
> I need, but I am not averse to modifying the code if necessary.
>
> Thanks,
> Eric
>
>