Re: [tesseract-ocr] Getting started with tesseract-ocr in a web app.

2019-09-15 Thread Clint William Theron
>> Thanks @Lorenzo for making it clear that tesseract is not a server but a library, and for the link. I still want to know how to start calling tesseract. I know this is not the place to ask about gitpot.io, but do I need to set the environment variable because in gitpot.io is a

[tesseract-ocr] Re: Replacing contrast-enhanced image in PDF with low-contrast original , post-Tesseract

2019-09-15 Thread IGM
I'm on Ubuntu 18.04 and Tesseract 4 now, and can confirm the *-c textonly_pdf=1* hack works.
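
For readers who want to try the option mentioned above, here is a minimal sketch of how the command line might be assembled from Python. It assumes the tesseract executable is on PATH; the file names and the helper names (build_textonly_pdf_command, run_textonly_pdf) are hypothetical and not from the thread. It only produces the text-only PDF; merging that text layer back over the original low-contrast image is a separate step not shown here.

```python
import shutil
import subprocess


def build_textonly_pdf_command(image_path, output_base, lang="eng"):
    """Return the argument list for a text-only PDF run.

    The "pdf" config at the end selects PDF output; "-c textonly_pdf=1"
    asks Tesseract to write only the invisible text layer, without the image.
    """
    return [
        "tesseract", image_path, output_base,
        "-l", lang,
        "-c", "textonly_pdf=1",
        "pdf",
    ]


def run_textonly_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_textonly_pdf_command(image_path, output_base, lang),
                   check=True)
    return output_base + ".pdf"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_textonly_pdf_command("enhanced.png", "textlayer")))
```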

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Ravi Annaswamy
OK, thanks for sharing; very nice interface. As Alex highlights, Tesseract allows further customization. I myself want to learn how to train Tesseract for Tamil and Sanskrit using your scripts and guides, but haven't found a good starting point yet.

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Shree Devi Kumar
Don't know the details. On Sun, Sep 15, 2019, 21:36 Ravi Annaswamy wrote: > That is a beautiful app. Shree Devi Kumar, what service does the 'google' selection hit? Is it free?

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Alexander Gribanov
Yeah, it looks great, but Google still makes some mistakes... As far as I have heard, Tesseract can be trained, but I have never heard of that Google service, so I'm not sure: is it possible to reduce such problems with more training? I mean, is the Google service trainable

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Ravi Annaswamy
I split the pages into left and right pages and submitted them to the OCR site with the Google option; here are the results. I have not compared them yet, but a couple of observations: 1. Yes, Google OCR captures diacritics! 2. Tesseract retains line breaks, but Google OCR provides flowing text, which is great. 3. Tesseract is
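
On the line-break observation above: if flowing text is wanted from output that keeps line breaks, one possible post-processing step is to join hard-wrapped lines within each paragraph. The sketch below is plain Python, not anything from the thread, and it assumes that paragraphs are separated by blank lines. The function name reflow_paragraphs is hypothetical, and words hyphenated across lines are not rejoined.

```python
import re


def reflow_paragraphs(ocr_text):
    """Join hard-wrapped lines into flowing paragraphs.

    Assumption: a blank line marks a paragraph boundary; every other
    line break is treated as a soft wrap and replaced by a space.
    """
    paragraphs = re.split(r"\n\s*\n", ocr_text.strip())
    flowed = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        if lines:
            flowed.append(" ".join(lines))
    return "\n\n".join(flowed)


if __name__ == "__main__":
    sample = "first line of a\nwrapped paragraph\n\nsecond paragraph\nalso wrapped\n"
    print(reflow_paragraphs(sample))
```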

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Ravi Annaswamy
That is a beautiful app. Shree Devi Kumar, what service does the 'google' selection hit? Is it free? Ravi On Sun, Sep 15, 2019 at 11:34 AM Shree Devi Kumar wrote: > Try http://ocr.sanskritdictionary.com/ for OCR of Devanagari + Diacritics + English. Its Google option gives better

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Ravi Annaswamy
Alex, here are the results, and linked below is an example notebook to get you started. The code is self-explanatory, and you can adapt it. You will need to improve on many things, but here is a start. Please let me know if you have any questions.

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Shree Devi Kumar
Try http://ocr.sanskritdictionary.com/ for OCR of Devanagari + Diacritics + English. Its Google option gives better results than Tesseract. On Sun, Sep 15, 2019, 19:43 Alexander Gribanov wrote: > Hello! Finally got a real project for OCR. Could anybody please give some advice in the process

Re: [tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Ravi Annaswamy
I was recently able to write a notebook that reads a page of single-column Sanskrit and English and runs it through Tesseract to OCR both languages. I will take a look at your file and create a Colab notebook sometime today or tomorrow.
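
The notebook itself is not reproduced in this listing. As a rough illustration only, running Tesseract on one page with two languages might look like the sketch below. It assumes the tesseract executable is on PATH and that the san and eng traineddata files are installed; the helper names (build_multilang_command, ocr_multilang) and the file name are hypothetical.

```python
import shutil
import subprocess


def build_multilang_command(image_path, languages=("san", "eng")):
    """Return the argument list for OCR with several languages at once.

    Tesseract accepts multiple languages joined with "+", and the special
    output base "stdout" prints the recognized text instead of writing a file.
    """
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr_multilang(image_path, languages=("san", "eng")):
    """Run Tesseract and return the recognized text as a string."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(build_multilang_command(image_path, languages),
                            check=True, capture_output=True)
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(" ".join(build_multilang_command("page_left.png")))
```

The order of the language codes may affect results, so it can be worth trying both orders on a sample page.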

[tesseract-ocr] OCR of Devanagari + Diacritics + English

2019-09-15 Thread Alexander Gribanov
Hello! I finally got a real project for OCR. Could anybody please give some step-by-step advice on how to OCR pages like these? https://drive.google.com/file/d/1Wdm4_tZHWHeVFlF7ND83xtvL29OCdTY_/view?usp=sharing Do I need to split the pages manually before OCR into different types of