Alright, this does give me a starting point. I am on my R&D way :) Thank you once again.

On Thursday, 16 June 2016 12:49:43 UTC+5:30, Allistair C wrote:
>
> Apologies, missed that! :)
>
> Can't see why you couldn't start with tesseract as-is for movie poster OCR
> and focus instead on image preprocessing, i.e. how you send tesseract
> the image to interpret.
>
> I would actually first have a go at trying Google Cloud Vision API as that
> seems very good at picking out text from more complex scenes. Else you
> should read previous posts here on detection of text areas in natural world
> scenes so you can first extract text rectangles cleanly to send to
> tesseract rather than one big image. I guess it depends which part of the
> poster is most important (title of movie or everything like actors etc.) as
> titles often use very specialised fonts (not always but often) and I think
> those you will find very challenging without perhaps additional training
> too (see tesseract training resources).
>
> Good luck
>
> Sent from my iPhone
>
> On 16 Jun 2016, at 06:18, ravi katiyar <[email protected]> wrote:
>
> Hi
>
> Really appreciate your prompt response, thank you for showing me some
> direction.
> I understand that modifying tesseract will be an uphill task, and
> especially now, given that the source code has been developed entirely in
> C and C++, it seems even tougher.
>
> I did mention my use case is to be able to identify text out of movie
> posters printed in newspapers.
> Is someone aware of something similar to tesseract which can do this job?
>
> Thanks
> Ravi Katiyar
>
> On Thursday, 16 June 2016 03:41:36 UTC+5:30, Allistair C wrote:
>>
>> Hi,
>>
>> Your question is a little difficult to understand - it sounds like you
>> are saying on the one hand you have no OCR or image processing background,
>> know Java, and want to modify Tesseract toward some aim that you do not
>> specify?
>>
>> Tesseract as far as I understand is developed using C/C++ and not Java.
>> Only the Android JNI bindings would be Java.
>>
>> You can find the Tesseract source code at:
>>
>> https://github.com/tesseract-ocr/tesseract
>>
>> In terms of concepts you should read "An Overview of the Tesseract OCR
>> Engine" written by Tesseract's lead Ray Smith as it will give you insight
>> into the algorithms that are employed for its OCR.
>>
>> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf
>>
>> Further concepts for algorithms can be found in the "Techniques" section
>> at:
>>
>> https://en.wikipedia.org/wiki/Optical_character_recognition
>>
>> Sounds like an uphill struggle to me but I wish you luck!
>>
>> Cheers
>>
>>
>> On 15 June 2016 at 07:28, ravi katiyar <[email protected]> wrote:
>>
>>> Hello All,
>>>
>>> I am new to the world of OCR and image processing as well. I come
>>> from a Java background.
>>> Can someone tell me what the prerequisites are to understand the
>>> tesseract code?
>>> Like the java.awt.image package, digital image processing concepts? What
>>> would I need to be thorough with so that I am able to understand
>>> the tesseract code?
>>>
>>> I want this understanding because I am aiming to make modifications to
>>> this code, so that tesseract is able to extract text from a movie poster
>>> printed in a newspaper.
>>> Tesseract cannot do this currently.
>>>
>>> Thanks
>>> Ravi Katiyar
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/9a488786-ac4d-4d2e-a047-ebe329df1ea8%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ca181040-8998-4564-86ef-cc08d8f0b587%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

