Hi, thanks for your answer, but where can I find the baseapi.cpp file?

On Monday, June 10, 2013 at 2:37:28 PM UTC+2, Nick White wrote:
> Hi Eric,
>
> Thanks for this posting. Out of curiosity, why do you need to
> preserve multiple spaces?
>
> Do you think you could update the code to allow a new configuration
> variable? If you did, and posted the patch to the issues page, I
> expect it would be accepted, as this sounds like the sort of thing
> that is useful to be able to do.
>
> Nick
>
> On Sat, Jun 08, 2013 at 01:50:11PM -0700, [email protected] wrote:
> > I found the code Ray referred to back in '09. It is now in
> > GetUTF8Text(). In baseapi.cpp, in TessBaseAPI::GetUTF8Text, I changed:
> >
> >     *ptr++ = ' ';
> >
> > to
> >
> >     {
> >       int i;
> >       for (i = 0; i < word->word->space(); i++)
> >         *ptr++ = ' ';
> >     }
> >
> > This added back in the multiple spaces as advertised. The results are
> > a bit unpredictable (as Ray warned back in '09).
> >
> > I'll keep poking at it.
> >
> > Eric
> >
> > On Saturday, June 8, 2013 10:37:20 AM UTC-4, [email protected] wrote:
> > > I need to maintain the (multiple) spaces in my output document.
> > > About 5 years ago someone asked how to do this and Ray posted a
> > > suggestion. That suggestion does not appear to correspond to the
> > > current source code.
> > >
> > > Can anyone suggest how I can maintain word spacing both before the
> > > first word on a line (indentation) as well as between words within
> > > a line?
> > >
> > > I can force the text in the input image to have fixed spacing.
> > >
> > > Ideally, there is a command line switch or a config item that will
> > > do what I need, but I am not averse to modifying the code if
> > > necessary.
> > >
> > > Thanks,
> > > Eric
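
For readers following the thread, the idea behind Eric's change can be sketched outside of Tesseract. The snippet below is only an illustration under stated assumptions: `Word`, `blanks_before`, and `JoinLine` are hypothetical names invented here, not part of the Tesseract API, and `blanks_before` merely stands in for the count that `word->word->space()` returns in the quoted patch. The `preserve_spaces` flag mimics the kind of configuration variable Nick suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a recognized word; NOT Tesseract's real word class.
struct Word {
  std::string text;
  int blanks_before;  // assumed analogue of the value from word->word->space()
};

// Joins the words of one text line and terminates it with a newline.
// preserve_spaces == true : each word is preceded by its recorded number of
//                           blanks, so indentation and wide gaps survive.
// preserve_spaces == false: words are separated by exactly one space.
std::string JoinLine(const std::vector<Word>& words, bool preserve_spaces) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    const Word& w = words[i];
    assert(w.blanks_before >= 0);  // a negative blank count would be a bug
    if (preserve_spaces) {
      out.append(static_cast<std::size_t>(w.blanks_before), ' ');
    } else if (i > 0) {
      out.push_back(' ');
    }
    out += w.text;
  }
  out.push_back('\n');
  return out;
}
```

With a line such as `{{"Name", 2}, {"Qty", 5}, {"Price", 3}}`, the preserving mode keeps the two-blank indent and the wider gaps, while the default mode collapses everything to single spaces. As Eric notes, the blank counts reported by the real engine can be unpredictable, so the output is only as good as those counts.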

