The GetUTF8Text function in baseapi.cpp has since been changed to the following:
/** Make a text string from the internal data structures. */
char* TessBaseAPI::GetUTF8Text() {
  if (tesseract_ == NULL ||
      (!recognition_done_ && Recognize(NULL) < 0))
    return NULL;
  STRING text("");
  ResultIterator *it = GetIterator();
  do {
    if (it->Empty(RIL_PARA)) continue;
    const std::unique_ptr<const char[]>
        para_text(it->GetUTF8Text(RIL_PARA));
    text += para_text.get();
  } while (it->Next(RIL_PARA));
  char* result = new char[text.length() + 1];
  strncpy(result, text.string(), text.length() + 1);
  delete it;
  return result;
}
So there is no *ptr++ = ' '; statement left to replace. It would be great if
anyone could tell me how to preserve multiple spaces with the current code.
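
For reference, the 2013 change quoted below boils down to writing one blank per counted space before each word instead of a single ' '. Here is a minimal standalone sketch of that idea. The Word struct, its leading_spaces field, and the JoinWords function are hypothetical names made up for illustration; they are not Tesseract types or API, and this is not a patch against the current baseapi.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a recognized word; NOT a Tesseract type.
struct Word {
  std::string text;
  int leading_spaces;  // blanks counted before this word (assumed to be known)
};

// Joins the words of one line, emitting leading_spaces blanks before each
// word rather than a single ' ', so indentation and wide gaps survive.
std::string JoinWords(const std::vector<Word>& words) {
  std::string out;
  for (const Word& w : words) {
    for (int i = 0; i < w.leading_spaces; ++i) {
      out += ' ';
    }
    out += w.text;
  }
  return out;
}
```

With this sketch, a line made of {"Name", 2} and {"Value", 4} comes out with two blanks of indentation and four blanks between the words. Whether the per-word space count is reliable is a separate matter, as the older messages in this thread warn.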
On Friday, October 26, 2018 at 7:47:43 PM UTC+5:30,
[email protected] wrote:
>
> Hi, thanks for your answer, but where can I find the baseapi.cpp file?
>
> On Monday, June 10, 2013 at 2:37:28 PM UTC+2, Nick White wrote:
>>
>> Hi Eric,
>>
>> Thanks for this posting. Out of curiosity, why do you need to
>> preserve multiple spaces?
>>
>> Do you think you could update the code to allow a new configuration
>> variable? If you did, and posted the patch to the issues page, I
>> expect it would be accepted, as this sounds like the sort of thing
>> that is useful to be able to do.
>>
>> Nick
>>
>> On Sat, Jun 08, 2013 at 01:50:11PM -0700, [email protected] wrote:
>> > I found the code Ray referred to back in '09. It is now in
>> > GetUTF8Text(). In baseapi.cpp, in TessBaseAPI::GetUTF8Text, I changed:
>> >
>> > *ptr++ = ' ';
>> >
>> > to
>> >
>> > {
>> >   int i;
>> >   for (i = 0; i < word->word->space(); i++)
>> >     *ptr++ = ' ';
>> > }
>> >
>> > This added back in the multiple spaces as advertised. The results are a
>> > bit unpredictable (as Ray warned back in '09).
>> >
>> > I'll keep poking at it.
>> >
>> > Eric
>> >
>> >
>> > On Saturday, June 8, 2013 10:37:20 AM UTC-4, [email protected] wrote:
>> >
>> > I need to maintain the (multiple) spaces in my output document. About 5
>> > years ago someone asked how to do this and Ray posted a suggestion. That
>> > suggestion does not appear to correspond to the current source code.
>> >
>> > Can anyone suggest how I can maintain word spacing both before the first
>> > word on a line (indentation) as well as between words within a line?
>> >
>> > I can force the text in the input image to have fixed spacing.
>> >
>> > Ideally, there is a command line switch or a config item that will do what
>> > I need, but I am not averse to modifying the code if necessary.
>> >
>> > Thanks,
>> > Eric
>> >
>> >
>> > --
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en
>> >
>> >
>>
>
--
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/84195bd3-d984-4155-8511-fb86c01914f3%40googlegroups.com.