Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

V S Rawat Wed, 27 Nov 2013 02:00:51 -0800

I am also unable to open a PDF file directly in VietOCR. The same
Ghostscript problem appears.

If you want a workaround, use a PDF reader (PDF-XChange, for example) to
"export" all pages as TIFF image files and keep them in one folder. Then
use VietOCR's batch processing option: it will OCR all images in that
folder and create a corresponding txt file for each one. You then combine
all the txt files into a single text file and bring that into MS Word.


That is the long route I have been using as an alternative.
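
The last step (combining the per-page txt files into one) can be scripted.
Below is a minimal Python sketch, not part of VietOCR. It assumes the txt
files are UTF-8 and that their names sort in page order (for example
page_001.txt, page_002.txt). The function name and the output file name
are made up for illustration.

```python
from pathlib import Path


def combine_text_files(folder, output_name="combined.txt"):
    """Concatenate every .txt file in `folder` into one file, in name order."""
    folder = Path(folder)
    output_path = folder / output_name
    # Sort by file name so the pages stay in order. This relies on the
    # assumption that the names are zero-padded (page_001.txt, ...).
    # The output file itself is skipped so a rerun does not include it.
    parts = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with output_path.open("w", encoding="utf-8") as out:
        for part in parts:
            # Assumed encoding: UTF-8, which keeps the diacritics intact.
            out.write(part.read_text(encoding="utf-8").rstrip("\n"))
            out.write("\n\n")  # blank line between pages
    return output_path
```

Calling combine_text_files on the folder that holds the batch output writes
combined.txt next to the page files. That single file can then be opened in
MS Word.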

Thanks.
--
Rawat

On 11/27/2013 9:20 AM, Srivas wrote:

Thanks, I have almost solved my problem, but I want to try this out too.
I'm quite sure I will need it, since I have some scanned Vedic texts
that I would also like to get recognized.

I'm encountering the following problem: after installing VietOCR and
trying to open a PDF file, the following error comes up: "The gsdll32.dll
wasn't found in default DLL search path. Please install GPL Ghostscript
and/or set the appropriate environment variable."

I downloaded and installed Ghostscript, but the error remains. What
should I do next?
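
One way to narrow this down is to check whether the DLL named in the error
is visible from the PATH environment variable. Below is a minimal Python
sketch under stated assumptions: it only inspects PATH (Windows also
searches other locations, such as the application folder), and the function
name is made up for illustration. It also looks for gsdll64.dll, on the
assumption that a 64-bit Ghostscript install provides that file rather than
the 32-bit gsdll32.dll the error asks for.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return the directories listed in PATH that contain `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # Skip empty entries left by stray separators.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # gsdll32.dll is the name taken from the error message above.
    for dll in ("gsdll32.dll", "gsdll64.dll"):
        print(dll, find_on_path(dll))
```

An empty list for gsdll32.dll would be consistent with the error message.
In that case, adding the Ghostscript folder that contains the DLL to PATH
is one thing to try.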

On Tuesday, November 26, 2013 6:53:03 PM UTC+7, shree wrote:

For a GUI, you can try VietOCR:
http://sourceforge.net/projects/vietocr/files/vietocr/

For language data for Sanskrit transliteration, try:
http://sourceforge.net/projects/tesseracthindi/files/Tesseract-3-02-SanskritTransliteration/

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 26, 2013 at 12:40 PM, Srivas wrote:

Hi!
I have a bunch of journals as PDF files and I need to get the text
out of them. They contain a lot of romanized Sanskrit diacritical
marks, and that creates a difficulty. I tried FineReader and
OmniPage, but they cannot be trained to recognize those symbols.
I just need an OCR program that I can train to recognize any symbol
required, and the above programs cannot do that.

Where should I start? I feel that Tesseract can do the job, but
can you help me get started? I downloaded Tesseract and installed
it (Windows). There are different GUIs available, and I think one
would make it easier to work. Can you suggest a good one? I tried
gImageReader, but it's too primitive and leaves a lot of cleanup
work to be done on the resulting text.

I don't think a language pack of this kind is available. How do I
create one?

I will attach one PDF and the fonts that were used to create it.
Maybe someone would like to try it and let me know how to do it?

Thank you for any help!

Regards,
Srivas


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
