Dear Sir Srivas ji,
Firstly, you should not have sent a 2.2 MB, 68-page PDF file and a 181 KB
zip to all the list members unasked. You could have uploaded them somewhere
and sent the link, so that only those who can contribute would download
them. Receiving such huge messages wastes time and bandwidth.
Secondly, I couldn't really understand your issue. I looked at your PDF
file. It is pure English. You can open it in any PDF reader, copy the
entire text from there, and paste it into a text or Word file. So what
else exactly are you looking for? Please elaborate.
You don't even need to OCR it. The file already contains ASCII text.
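If you want to confirm what the copied text actually contains, here is a
rough Python sketch. It assumes you have already pasted the text into a
string; the function name non_ascii_report and the sample string are only
illustrations, not anything taken from your file. It lists every character
outside the ASCII range together with its Unicode name, so an empty result
means the text really is plain ASCII.

```python
import unicodedata


def non_ascii_report(text):
    """Map each non-ASCII character in `text` to its Unicode name."""
    # Compose base letter + combining mark pairs into single characters
    # where possible, so the same letter is not reported in two forms.
    text = unicodedata.normalize("NFC", text)
    report = {}
    for ch in text:
        if ord(ch) > 127 and ch not in report:
            report[ch] = unicodedata.name(ch, "UNKNOWN")
    return report


# Hypothetical sample; replace it with text copied from the PDF reader.
sample = "Bhagavad-gītā and Kṛṣṇa"
for ch, name in non_ascii_report(sample).items():
    print(f"U+{ord(ch):04X} {name}")
```

If the report is not empty, the listed characters are the ones to check
after copying, since those are the ones most likely to be lost or changed.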
Thanks.
--
Rawat
On 11/26/2013 12:40 PM, Srivas wrote:
Hi!
I have a bunch of PDF journal files and I need to get the text out of
them. They contain a lot of romanized Sanskrit diacritical marks, and that
creates a difficulty. I tried FineReader and OmniPage, but they cannot be
trained to recognize those symbols. I just need an OCR program I can
train to recognize any symbol required, and the above programs cannot do
that.
Where should I start? I feel that Tesseract can do the job, but can you
help me get started? I downloaded Tesseract and installed it (Windows).
There are different GUIs available, and I think one would make it easier
to work. Can you suggest a good one? I tried gImageReader, but it's too
primitive and leaves a lot of work to be done afterwards on the overall
text.
I don't think this kind of language pack is available, so how do I create
one? I will attach one PDF and the fonts that were used to create it.
Maybe someone would like to try it and let me know how to do it?
Thank you for any help!
Regards,
Srivas
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en