Re: [Wikisource-l] OCR for Persian

Amir Ladsgroup Wed, 25 Jun 2014 01:07:09 -0700

I tried ABBYY before and the quality was low.
I will try Tesseract and see what happens.


Best


On Tue, Jun 24, 2014 at 7:08 PM, Aleksey Chalabyan <[email protected]>
wrote:

> ABBYY FineReader has supported Hebrew and Arabic since v. 11. But I'm afraid
> sharing the same script is not enough. For example, FineReader has 3 versions
> for Armenian. All three use the same script, with different orthography and
> slightly different vocabulary, but if you set the wrong language the drop in
> quality is dramatic. So I'm not sure whether Arabic OCR would work well for
> text in Farsi (Persian).
> FineReader provides a 30-day full trial, and I think it's worth giving it
> a try.
>
> You may try to approach ABBYY and check if there are any plans for full
> support of Persian in the near future.
>
> And trying to train Tesseract seems like a good idea to get a free/open source
> OCR for Persian, if you can get enough resources for that. But I can't
> comment on how well it will work with RTL scripts, especially with
> Nastaliq/Naskh, where letters and words are not separated from each other.
>
>
> On Tue, Jun 24, 2014 at 6:13 PM, Federico Leva (Nemo) <[email protected]>
> wrote:
>
>> Amir Ladsgroup, 24/06/2014 15:37:
>>
>>> I have access to huge resources of old books in Persian (some of them
>>> are even typed) and almost all of them can be imported to Wikisource, but
>>> the problem is I don't have (or know of) any OCR for Persian. Do you know
>>> which OCR software supports Persian texts (supporting Arabic is not enough;
>>> I checked several programs)?
>>>
>>
>> The only result for "Persian" and OCR on the ABBYY website is
>> <http://www.abbyy.com/CaseStudies/SISU-Reveals-Its-Multilingual-Content-to-Academic-Community-Thanks-to-ABBYY-Recognition-Server/>,
>> weird! Worth asking them for some details; they
>> might have some additional plugins.
>>
>> On the FLOSS side, maybe some library in Iran has invested in
>> Tesseract? If there's any big digital library of Persian content, you should
>> ask them as well.
>>
>> Reminder: archive.org is still in need of people willing to compare 8.0
>> vs. 9.0 OCR results of some books in their language. :)
>> http://thread.gmane.org/gmane.org.wikimedia.wikisource/1552
>>
>> Nemo
>>
>> _______________________________________________
>> Wikisource-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>>
>
>
>
>


-- 
Amir

