It looks to me like you'll have to train Tesseract. Training for connected
scripts is much harder, and there are some tricks involved with right-to-left
text, but you might first want to try Tesseract with the Arabic language data
and see how good the accuracy is. It might be possible to fix the remaining
errors by post-processing the results (pattern matching). There has also been
a project to recognize Farsi, and its training data might give even better
results.
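As a rough illustration of "try the Arabic data, then clean up with pattern matching", here is a minimal Python sketch. It assumes the `tesseract` binary is on your PATH and that the Arabic language data (`ara`) is installed. It uses the documented command-line form `tesseract <image> <outputbase> -l <lang>`, which writes the text to `<outputbase>.txt`. The function names and both rules in the correction table are hypothetical placeholders and are not part of Tesseract. Real Sindhi-specific rules would have to come from comparing Tesseract's output with correctly typed text.

```python
import re
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="ara"):
    """Build the CLI call: tesseract <image> <outputbase> -l <lang>.

    "ara" is the Arabic language data; it must already be installed.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="ara"):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


# HYPOTHETICAL rule table: (compiled regex, replacement), applied in order.
# These two generic clean-ups are placeholders only; real Sindhi-specific
# rules would be collected from your own OCR errors.
CORRECTIONS = [
    (re.compile(r"[ \t]{2,}"), " "),     # collapse runs of spaces/tabs
    (re.compile(r"\s+([.،؟])"), r"\1"),  # drop whitespace before . ، or ؟
]


def post_process(text, corrections=CORRECTIONS):
    """Apply each (pattern, replacement) rule to the OCR text, in order."""
    for pattern, replacement in corrections:
        text = pattern.sub(replacement, text)
    return text
```

Typical use would be `post_process(run_ocr("page.png", "page"))`, where `page.png` is a placeholder file name. If you install the Farsi training data, pass its language code as `lang` instead. The rules are kept as an ordered list so you can add a new substitution each time you spot a recurring misrecognition.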
--Sven


On Sun, Jan 5, 2014 at 3:39 PM, Meena Rajani <[email protected]> wrote:

> Hi
>
>
>  I am new to OCR and Tesseract. I have to work with Sindhi script, which
> is a little different from Arabic. Arabic is supported by Tesseract, but I
> am not sure whether images of Sindhi script can be processed with Tesseract.
> I have attached the Sindhi alphabet. It has 52 characters: some are the same
> as Arabic, some are like Persian, and there are a few more. My question is:
> are these Sindhi characters recognized by the OCR? If not, what should I do
> so that Tesseract can recognize the characters? Do I just need to train
> Tesseract on the new characters, or do I need to extend the Tesseract API?
>
> Please find the image of the Sindhi alphabet attached.
>
>
> Thanks
>
> Meena
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>



-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
