Hi Eric,

Thanks for posting this. Out of curiosity, why do you need to
preserve multiple spaces?

Do you think you could update the code to allow a new configuration
variable? If you did, and posted the patch to the issues page, I
expect it would be accepted, as this sounds like the sort of thing
that is useful to be able to do.
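
To make the idea concrete, here is a rough, self-contained sketch of how
such a switch could gate the behaviour of your patch. Everything in it is
made up for illustration: Word, Options, preserve_spaces and JoinWords are
hypothetical names, not part of Tesseract, and I am assuming the count you
loop over is the number of blanks in front of each word.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a recognized word: its text plus the number of
// blanks detected in front of it (assumed to be what the patch loops over).
struct Word {
  std::string text;
  int leading_spaces;
};

// Hypothetical configuration; the variable name is an assumption.
struct Options {
  bool preserve_spaces = false;
};

// Joins the words of one text line into a single string.
std::string JoinWords(const std::vector<Word>& words, const Options& opts) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    assert(words[i].leading_spaces >= 0);
    int n = words[i].leading_spaces;
    if (!opts.preserve_spaces) {
      // Default: one separator between words, none before the first.
      n = (i == 0) ? 0 : 1;
    } else if (i > 0 && n < 1) {
      // Even when preserving, never glue two words together.
      n = 1;
    }
    out.append(static_cast<std::size_t>(n), ' ');
    out += words[i].text;
  }
  return out;
}
```

With the flag off the output is unchanged from today, so existing users
would not be affected; with it on, both indentation and inter-word gaps
are written out as counted.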

Nick

On Sat, Jun 08, 2013 at 01:50:11PM -0700, [email protected] wrote:
> I found the code Ray referred to back in '09. It is now in GetUTF8Text(). In
> baseapi.cpp in TessBaseAPI::GetUTF8Text I changed:
> 
>     *ptr++ = ' ';
> 
> to
> 
>     {
>       int i ;
>       for ( i = 0 ; i < word->word->space() ; i++ )
>         *ptr++ = ' ';
>     }
> 
> This added back in the multiple spaces as advertised. The results are a bit
> unpredictable (as Ray warned back in '09).
> 
> I'll keep poking at it.
> 
> Eric
>      
> 
> On Saturday, June 8, 2013 10:37:20 AM UTC-4, [email protected] wrote:
> 
>     I need to maintain the (multiple) spaces in my output document. About 5
>     years ago someone asked how to do this and Ray posted a suggestion. That
>     suggestion does not appear to correspond to the current source code.
> 
>     Can anyone suggest how I can maintain word spacing both before the first
>     word on a line (indentation) as well as between words within a line?
> 
>     I can force the text in the input image to have fixed spacing.
> 
>     Ideally, there is a command line switch or a config item that will do what
>     I need, but I am not averse to modifying the code if necessary.
> 
>     Thanks,
>     Eric
> 
> 
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>  
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email
> to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  

