I am also unable to open PDF files directly in VietOCR. The same Ghostscript problem appears.

If you want a workaround, use a PDF reader (such as PDF-XChange) to "export" all pages as TIFF image files and keep them in one folder. Then use VietOCR's batch processing option: it will OCR every image in that folder and create a corresponding .txt file containing the recognized text. Finally, combine all the text files into a single one and open that in MS Word.

That is the long route I have been using as an alternative.
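
The last step, combining the per-page text files, can be scripted. Here is a minimal sketch in Python. It assumes the batch output is UTF-8 text and that the page files are named so that a numeric-aware sort gives the right page order (for example page1.txt, page2.txt, page10.txt). The function name and the output file name combined.txt are my own choices for illustration, not anything VietOCR provides.

```python
import re
from pathlib import Path


def natural_key(path):
    # Split the file name into alternating text/number runs so that
    # "page10.txt" sorts after "page2.txt" rather than before it.
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def combine_text_files(folder, output_name="combined.txt"):
    """Concatenate every .txt file in `folder` into one file, in page order."""
    folder = Path(folder)
    output_path = folder / output_name
    # Skip the output file itself so re-running does not duplicate text.
    pages = sorted(
        (p for p in folder.glob("*.txt") if p.name != output_name),
        key=natural_key,
    )
    with output_path.open("w", encoding="utf-8") as out:
        for page in pages:
            out.write(page.read_text(encoding="utf-8"))
            out.write("\n\n")  # blank line between pages
    return output_path
```

Calling combine_text_files on the folder that holds the batch output (the path is whatever folder you exported the images to) writes combined.txt there, which can then be opened in MS Word.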

Thanks.
--
Rawat

On 11/27/2013 9:20 AM, Srivas wrote:
Thanks, I almost got my problem solved, but I want to try this out too.
I'm quite sure I will need it, since I have some scanned Vedic texts
that I would also like to get recognized.

I'm encountering the following problem: After installing VietOCR and
trying to open a pdf file, the following error comes up: The gsdll32.dll
wasn't found in default DLL search path. Please install GPL Ghostscript
and/or set the appropriate environment variable.

I did download and install Ghostscript but the error remains. What to do
next?

On Tuesday, November 26, 2013 6:53:03 PM UTC+7, shree wrote:

    For GUI
    you can try VietOCR -
    http://sourceforge.net/projects/vietocr/files/vietocr/

    For language data for Sanskrit transliteration, try
    http://sourceforge.net/projects/tesseracthindi/files/Tesseract-3-02-SanskritTransliteration/




    Shree Devi Kumar
    ____________________________________________________________
    भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


    On Tue, Nov 26, 2013 at 12:40 PM, Srivas <[email protected]> wrote:

        Hi!
        I have a bunch of PDF journal files and I need to get the text
        out of them. They contain a lot of romanized Sanskrit diacritical
        marks, and that creates a difficulty. I tried FineReader and
        OmniPage, but they cannot be trained to recognize those symbols.
        I just need an OCR program that I can train to recognize any
        symbol required, and the above programs cannot do that.

        Where should I start from? I feel like this program can do the
        job but can you help me to get started? I downloaded tesseract
        and installed it (windows). There are different GUIs available
        and I think it will make it easier to work. Can you suggest a
        good one? I tried gimagereader but it's too primitive and leaves
        a lot of work to be done afterwards with the overall text.

        I don't think this kind of language pack is available. How
        would I create one?

        I will add one pdf and fonts that were used to create it. Maybe
        someone would like to try and let me know how to do it?

        Thank you for any help!

        Regards,
        Srivas

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
