Hi,
Can we retrain Tesseract for the English language with all the unwanted
symbols and characters removed?
If so, could someone please share how to do it?


Thanks,
Purushotham


On Thu, Oct 31, 2019, 5:57 PM 'Yuliana Zigangirova' via tesseract-ocr <
[email protected]> wrote:

> Hi everyone,
>
> I am trying to train Tesseract for some funny-looking fonts, such as
> Palace.
> I tried a simple approach: I produced traineddata with
> http://trainyourtesseract.com/
> and then made calls like these:
>
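> // Load the stock English model plus the custom Palace traineddata
> api->Init(".\\tessdata", "eng+Palace", OEM_TESSERACT_ONLY);
> // Treat the image as a single line of text
> api->SetPageSegMode(PSM_SINGLE_LINE);
> api->SetImage(image);
> // Get OCR result
> outText = api->GetUTF8Text();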
>
> The result for a line like
>
> M P S T a o e h i l n p r s t u w y
>
> is below; no glyph is recognized correctly:
>
> .MDXXXo,XkX.n.mX.XnoX
>
> Does trainyourtesseract produce bad traineddata, or am I making the wrong
> calls? How does one handle such cases?
>
> Actually, I have tried the same with less funny fonts,
> and the recognition barely improves there either.
>
> I am attaching the TIFF file and my traineddata for Palace.
>
> Thank you all in advance for your help,
> Yuliana
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/1b574457-5418-46f7-93fb-f2849b232f10%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/1b574457-5418-46f7-93fb-f2849b232f10%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

