Hi, thanks for your answer, but where can I find the baseapi.cpp file?

On Monday, June 10, 2013 at 2:37:28 PM UTC+2, Nick White wrote:
>
> Hi Eric, 
>
> Thanks for this posting. Out of curiosity, why do you need to
> preserve multiple spaces?
>
> Do you think you could update the code to allow a new configuration 
> variable? If you did, and posted the patch to the issues page, I 
> expect it would be accepted, as this sounds like the sort of thing 
> that is useful to be able to do. 
>
> Nick 
>
> On Sat, Jun 08, 2013 at 01:50:11PM -0700, [email protected] wrote:
> > I found the code Ray referred to back in '09. It is now in GetUTF8Text(). In
> > baseapi.cpp in TessBaseAPI::GetUTF8Text I changed:
> > 
> >     *ptr++ = ' '; 
> > 
> > to 
> > 
> >     { 
> >       int i ; 
> >       for ( i = 0 ; i < word->word->space() ; i++ ) 
> >         *ptr++ = ' '; 
> >     } 
> > 
> > This added back in the multiple spaces as advertised. The results are a bit
> > unpredictable (as Ray warned back in '09).
> > 
> > I'll keep poking at it. 
> > 
> > Eric 
> >       
> > 
> > On Saturday, June 8, 2013 10:37:20 AM UTC-4, [email protected] wrote: 
> > 
> >     I need to maintain the (multiple) spaces in my output document. About 5
> >     years ago someone asked how to do this and Ray posted a suggestion. That
> >     suggestion does not appear to correspond to the current source code.
> > 
> >     Can anyone suggest how I can maintain word spacing both before the first
> >     word on a line (indentation) as well as between words within a line?
> > 
> >     I can force the text in the input image to have fixed spacing. 
> > 
> >     Ideally, there is a command line switch or a config item that will do what
> >     I need, but I am not averse to modifying the code if necessary.
> > 
> >     Thanks, 
> >     Eric 
>
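
For anyone reading the quoted patch without the Tesseract source at hand, here is a minimal standalone sketch of the idea: emit one blank per detected space instead of a single separator. It is not Tesseract code. The Word struct, the JoinLine function and the preserve_spaces flag are hypothetical names made up for illustration; in the real code the blank count comes from word->word->space(), and the flag only stands in for the kind of configuration variable Nick suggests. Whether the blank count of the first word on a line reflects indentation is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one recognized word. In the quoted patch the
// blank count is read from word->word->space().
struct Word {
  std::string text;
  int leading_blanks;  // number of blanks detected before this word
};

// Joins the words of one text line.
// preserve_spaces == true : emit every detected blank (the patched behaviour).
// preserve_spaces == false: emit a single space between words.
std::string JoinLine(const std::vector<Word>& words, bool preserve_spaces) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    const Word& w = words[i];
    assert(w.leading_blanks >= 0);
    if (preserve_spaces) {
      // Equivalent of the loop in the patch: one ' ' per detected blank.
      out.append(static_cast<std::size_t>(w.leading_blanks), ' ');
    } else if (i > 0) {
      out += ' ';
    }
    out += w.text;
  }
  return out;
}
```

With the flag off, the words "a" (2 blanks) and "b" (3 blanks) join as "a b"; with it on, the blanks before each word are kept, including those before the first word.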

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1e86187c-0e74-4f50-9b12-fc67321acd43%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
