The ResultIterator has a method IsAtBeginningOf which can be used for this.
Try putting this code at the very start of the do loop body, before the symbol is printed:
if (ri->IsAtBeginningOf(tesseract::RIL_WORD)) { printf("======= Start word\n"); }
John
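Building on that suggestion, here is a minimal sketch of the whole loop that marks word boundaries. `JoinSymbolsMarkingWords` is a hypothetical helper, not part of Tesseract. It only assumes the iterator methods already used in this thread (`GetUTF8Text`, `Next`) plus `IsAtBeginningOf`. It is templated on the iterator type so the same loop accepts a `tesseract::ResultIterator` or a stand-in object for testing.

```cpp
#include <string>

// Hypothetical helper: walks a result iterator symbol by symbol and returns
// the recognized text with `delim` inserted wherever a new word begins.
// Iter is expected to be tesseract::ResultIterator (Level then deduces to
// tesseract::PageIteratorLevel), but any type with the same three methods works.
// Precondition: the iterator is non-null and positioned at the first symbol.
template <typename Iter, typename Level>
std::string JoinSymbolsMarkingWords(Iter& ri, Level symbol_level,
                                    Level word_level, const std::string& delim) {
  std::string out;
  do {
    // IsAtBeginningOf(word_level) is true on the first symbol of each word;
    // skip it for the very first word so the output does not start with delim.
    if (!out.empty() && ri.IsAtBeginningOf(word_level)) {
      out += delim;
    }
    // GetUTF8Text returns a new[]-allocated string the caller must delete[].
    char* symbol = ri.GetUTF8Text(symbol_level);
    if (symbol != nullptr) {
      out += symbol;
    }
    delete[] symbol;
  } while (ri.Next(symbol_level));
  return out;
}
```

With the real API this would be called after `api->Recognize(NULL)` as `JoinSymbolsMarkingWords(*ri, tesseract::RIL_SYMBOL, tesseract::RIL_WORD, " | ")`, guarded by `if (ri != 0)`, followed by `delete ri`. The delimiter then lands between the last symbol of one word and the first symbol of the next.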
On Thursday, 8 October 2015 10:33:23 UTC+1, RK wrote:
>
> Hi,
>
> I have this code, which gives me the confidence level of each character in
> a word. When my image contains multiple words, it still gives me the
> confidence level of each character.
>
> The problem is that it prints everything as one sequence (it does not
> report the space between words). How do I tell where one word ends and the
> next word's character probabilities start? Any help on this?
>
>
> For example, my image contains "Book Now" (see the attached image), and my
> current output is shown below. I want to insert a delimiter after the
> symbol "K", so that I know everything up to "K" is one word and everything
> after it is another word. How should I do that?
>
> Sample output:
>
> symbol B, conf: 95.727936 - B conf: 95.727936
> - 3 conf: 83.558624
> - E conf: 81.664284
> ---------------------------------------------
> symbol O, conf: 90.067154 - O conf: 90.067154
> - 0 conf: 87.427773
> - Q conf: 83.844460
> - C conf: 82.962616
> - G conf: 79.682472
> ---------------------------------------------
> symbol O, conf: 90.468826 - O conf: 90.468826
> - 0 conf: 87.815132
> - C conf: 86.248314
> - Q conf: 82.877472
> ---------------------------------------------
> symbol K, conf: 93.121216 - K conf: 93.121216
> ---------------------------------------------
> symbol N, conf: 91.598183 - N conf: 91.598183
> ---------------------------------------------
> symbol O, conf: 89.931847 - O conf: 89.931847
> - 0 conf: 87.237823
> - Q conf: 84.576927
> - C conf: 82.600273
> - G conf: 80.553169
> - D conf: 79.337044
> ---------------------------------------------
> symbol W, conf: 96.001007 - W conf: 96.001007
> - w conf: 86.990593
> ---------------------------------------------
>
>
> #include <tesseract/baseapi.h>
> #include <tesseract/pageiterator.h>
> #include <tesseract/resultiterator.h>
> #include <leptonica/allheaders.h>
> #include <cstdio>
>
> int main()
> {
>     Pix *image = pixRead("sample.png");
>     if (image == NULL) {
>         fprintf(stderr, "Could not read sample.png\n");
>         return 1;
>     }
>     tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>     if (api->Init(NULL, "eng")) {  // non-zero return means failure
>         fprintf(stderr, "Could not initialize tesseract\n");
>         delete api;
>         pixDestroy(&image);
>         return 1;
>     }
>     api->SetImage(image);
>     api->SetVariable("save_blob_choices", "T");  // keep alternative choices per symbol
>     api->SetRectangle(37, 228, 548, 31);         // left, top, width, height
>     api->Recognize(NULL);
>
>     tesseract::ResultIterator* ri = api->GetIterator();
>     tesseract::PageIteratorLevel level = tesseract::RIL_SYMBOL;
>     if (ri != 0) {
>         do {
>             const char* symbol = ri->GetUTF8Text(level);
>             float conf = ri->Confidence(level);
>             if (symbol != 0) {
>                 printf("symbol %s, conf: %f", symbol, conf);
>                 // Print every alternative the classifier considered for this symbol.
>                 bool indent = false;
>                 tesseract::ChoiceIterator ci(*ri);
>                 do {
>                     if (indent) printf("\t\t ");
>                     printf("\t- ");
>                     const char* choice = ci.GetUTF8Text();  // owned by ci, do not delete
>                     printf("%s conf: %f\n", choice, ci.Confidence());
>                     indent = true;
>                 } while (ci.Next());
>             }
>             printf("$\n");
>             delete[] symbol;  // GetUTF8Text result is owned by the caller
>         } while (ri->Next(level));
>         delete ri;  // the iterator returned by GetIterator must be deleted
>     }
>
>     api->End();
>     delete api;
>     pixDestroy(&image);
>     return 0;
> }
>
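Since the question asks for a delimiter *after* the last symbol of a word, another option is `IsAtFinalElement(tesseract::RIL_WORD, tesseract::RIL_SYMBOL)`, which the page/result iterators provide to report whether the current symbol is the last one in its word. The sketch below groups symbols and their confidences per word using that check. `GroupSymbolsByWord` and `SymbolConf` are hypothetical names, and the template parameter stands in for `tesseract::ResultIterator`.

```cpp
#include <string>
#include <utility>
#include <vector>

// One recognized symbol paired with its confidence.
using SymbolConf = std::pair<std::string, float>;

// Hypothetical helper: groups the symbols from a result iterator into words.
// IsAtFinalElement(word_level, symbol_level) is true when the current symbol
// is the last one in its word, so the current word is closed right there.
// Iter is expected to be tesseract::ResultIterator; any type with the same
// four methods works. Precondition: non-null iterator at the first symbol.
template <typename Iter, typename Level>
std::vector<std::vector<SymbolConf>> GroupSymbolsByWord(Iter& ri,
                                                        Level symbol_level,
                                                        Level word_level) {
  std::vector<std::vector<SymbolConf>> words;
  std::vector<SymbolConf> current;
  do {
    // The caller owns the returned string and must free it with delete[].
    char* symbol = ri.GetUTF8Text(symbol_level);
    if (symbol != nullptr) {
      current.emplace_back(symbol, ri.Confidence(symbol_level));
    }
    delete[] symbol;
    // Last symbol of the word (e.g. the "K" of "BOOK"): close the group.
    if (ri.IsAtFinalElement(word_level, symbol_level) && !current.empty()) {
      words.push_back(current);
      current.clear();
    }
  } while (ri.Next(symbol_level));
  // Defensive: keep any symbols that were not closed by IsAtFinalElement.
  if (!current.empty()) {
    words.push_back(current);
  }
  return words;
}
```

With the real API, `GroupSymbolsByWord(*ri, tesseract::RIL_SYMBOL, tesseract::RIL_WORD)` gives one inner vector per word, so printing a delimiter line after each inner vector puts it right after "K".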
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/ac6dad1a-ee41-447f-9849-ff95c530f9ab%40googlegroups.com.