Hi, I really appreciate your prompt response, and thank you for pointing me in the right direction. I understand that modifying Tesseract will be an uphill task, especially now that I know the source code is written entirely in C and C++.
As I mentioned, my use case is identifying text in movie posters printed in newspapers. Is anyone aware of a tool similar to Tesseract that can do this job?

Thanks,
Ravi Katiyar

On Thursday, 16 June 2016 03:41:36 UTC+5:30, Allistair C wrote:
> Hi,
>
> Your question is a little difficult to understand - it sounds like you are
> saying on the one hand you have no OCR or image processing background, know
> Java, and want to modify Tesseract toward some aim that you do not specify?
>
> Tesseract as far as I understand is developed using C/C++ and not Java.
> Only the Android JNI bindings would be Java.
>
> You can find the Tesseract source code at:
>
> https://github.com/tesseract-ocr/tesseract
>
> In terms of concepts you should read "An Overview of the Tesseract OCR
> Engine" written by Tesseract's lead Ray Smith as it will give you insight
> into the algorithms that are employed for its OCR.
>
> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf
>
> Further concepts for algorithms can be found in the "Techniques" section
> at:
>
> https://en.wikipedia.org/wiki/Optical_character_recognition
>
> Sounds like an uphill struggle to me but I wish you luck!
>
> Cheers
>
> On 15 June 2016 at 07:28, ravi katiyar <[email protected]> wrote:
>
>> Hello All,
>>
>> I am new to the world of OCR and image processing as well. I come from
>> a Java background.
>> Can someone tell me what the prerequisites are for understanding the
>> Tesseract code? For example, the java.awt.image package, or digital image
>> processing concepts? What would I need to be thorough with so that I am
>> able to understand the Tesseract code?
>>
>> I want this understanding because I am aiming to make modifications to
>> this code, so that Tesseract is able to extract text from a movie poster
>> printed in a newspaper. Tesseract cannot do this currently.
>>
>> Thanks
>> Ravi Katiyar

