one... pixel... difference

2012-10-25 Thread Phlip
Tesseractors: We are using Tesseract for an outside-of-the-box situation - not scanning neatly typed documents. Our situation is a fuzzy, low-contrast picture. But - even when I use many image enhancements, such as leveling the colors, blurring them, improving the contrast, shrinking the image,
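"Leveling the colors" is usually a linear contrast stretch: the darkest level of interest becomes black, the brightest becomes white, and everything in between is spread evenly. The sketch below shows that idea in pure Python. It assumes an 8-bit grayscale image held as a flat list of pixel values, which is a hypothetical representation; a real pipeline would do this with an imaging library.

```python
def stretch_levels(pixels, low=None, high=None):
    """Linearly map the range [low, high] onto [0, 255], clamping values outside it.

    `pixels` is a flat list of 8-bit grayscale values (hypothetical input format);
    `low`/`high` default to the darkest and brightest values present.
    Integer arithmetic with round-half-up avoids floating-point surprises.
    """
    lo = min(pixels) if low is None else low
    hi = max(pixels) if high is None else high
    span = hi - lo
    if span <= 0:
        return list(pixels)  # flat image or inverted range: nothing to stretch
    out = []
    for p in pixels:
        v = ((p - lo) * 255 + span // 2) // span
        out.append(max(0, min(255, v)))  # clamp values that fell outside [lo, hi]
    return out
```

For example, `stretch_levels([100, 150, 200])` spreads a washed-out mid-gray range to `[0, 128, 255]`.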

Re: Page versus region OCR

2012-10-25 Thread Gaara Sabaku
Again, processing the box data will help you identify such an occurrence; you can then filter out the hard words and reprocess them separately. On Mon, Oct 22, 2012 at 10:12 AM, GeorgeS sxoutt...@gmail.com wrote: With the current Tesseract engine I've noticed that if I perform a full-page OCR and
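To reprocess a hard word separately, you first need a crop rectangle around it. A minimal sketch, assuming the box-file layout `<glyph> <left> <bottom> <right> <top> <page>` with the y-axis measured from the bottom of the image: it takes the box lines for one word and returns a padded rectangle in the top-left-origin coordinates most imaging libraries use. The function name and the 2-pixel padding are hypothetical choices, not something from the thread.

```python
def word_crop(box_lines, image_height, pad=2):
    """Return a padded (x0, y0, x1, y1) crop rectangle covering several box-file lines.

    Assumes the box format "<glyph> <left> <bottom> <right> <top> <page>", whose
    y-axis starts at the bottom of the image; the result uses a top-left origin.
    `pad` is a hypothetical margin in pixels.
    """
    lefts, bottoms, rights, tops = [], [], [], []
    for line in box_lines:
        # The last five fields are numeric; take the four coordinates before the page number.
        left, bottom, right, top = (int(v) for v in line.split()[-5:-1])
        lefts.append(left)
        bottoms.append(bottom)
        rights.append(right)
        tops.append(top)
    x0 = max(0, min(lefts) - pad)
    y0 = max(0, image_height - max(tops) - pad)
    x1 = max(rights) + pad  # not clamped: the image width is unknown here
    y1 = min(image_height, image_height - min(bottoms) + pad)
    return (x0, y0, x1, y1)
```

With a 100-pixel-high image, `word_crop(["H 10 20 18 40 0", "i 19 20 22 40 0"], 100)` gives `(8, 58, 24, 82)`, which can then be cut out and fed back to the engine on its own.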

Re: one... pixel... difference

2012-10-25 Thread Gaara Sabaku
Tesseract can improve its accuracy if it can scan the document multiple times. The only way to achieve this benefit is to include the repetition in a single TIFF with multiple pages. You can identify single instances of hard words when parsing the confidence of the output. On Wed, Oct 24, 2012 at
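Once per-word confidences have been pulled out of the output, finding the hard words is a simple filter, and the retried readings can be merged back only where they beat the original. The sketch below assumes the results are already a list of `(word, confidence)` pairs on a 0 to 100 scale; that input format, the threshold of 70, and both function names are illustrative assumptions.

```python
def hard_words(results, threshold=70.0):
    """Return (position, word) for each word whose confidence falls below `threshold`.

    `results` is a list of (word, confidence) pairs; the 0-100 scale and the
    default threshold of 70 are assumptions for illustration.
    """
    return [(i, word) for i, (word, conf) in enumerate(results) if conf < threshold]


def merge_retries(results, retries):
    """Replace a word with its re-OCRed reading only when the retry is more confident.

    `retries` maps a position in `results` to a new (word, confidence) pair.
    """
    merged = list(results)
    for pos, (word, conf) in retries.items():
        if conf > merged[pos][1]:
            merged[pos] = (word, conf)
    return merged
```

For instance, with `[("one", 91.0), ("plxel", 42.0), ("difference", 88.0)]`, only position 1 is flagged, and a retry of `("pixel", 80.0)` replaces it.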

Re: Help on the SIGSEGV error in tesseract baseApi.getUTF8Text() function

2012-10-25 Thread Gaara Sabaku
There are safe and unsafe ways of calling functions like that in Tesseract. How do you call the functions at the call-stack locations where the failure occurs? On Tue, Oct 23, 2012 at 7:12 PM, Thilina Yapa Bandara tyband...@gmail.com wrote: Hi fabriciano, My image set is always the

Re: Ligature detection

2012-10-25 Thread Gaara Sabaku
You can analyze the box data and easily detect that the boxes of these letters overlap. On Wed, Oct 24, 2012 at 4:59 PM, Ryan rb...@pdftron.com wrote: I am using GetUTF8() to get OCR results from the attached image. As you can see from the image there is a ligature (0x00E6 æ) and tesseract-ocr
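The overlap check can be sketched directly on box-format text, one `<glyph> <left> <bottom> <right> <top> <page>` line per character, which is the layout assumed here. The sketch flags consecutive boxes on the same page whose vertical extents intersect (same text line) and whose horizontal extents overlap; the function names and the 1-pixel minimum overlap are hypothetical.

```python
from collections import namedtuple

Box = namedtuple("Box", "glyph left bottom right top page")


def parse_box_text(box_text):
    """Parse box-format text: one "<glyph> <left> <bottom> <right> <top> <page>" per line."""
    boxes = []
    for line in box_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        left, bottom, right, top, page = (int(v) for v in fields[-5:])
        boxes.append(Box(" ".join(fields[:-5]), left, bottom, right, top, page))
    return boxes


def overlapping_pairs(box_text, min_overlap=1):
    """Return (glyph_a, glyph_b, overlap_px) for consecutive boxes that overlap
    horizontally while sitting on the same page and the same text line."""
    boxes = parse_box_text(box_text)
    hits = []
    for a, b in zip(boxes, boxes[1:]):
        if a.page != b.page:
            continue
        vertical = min(a.top, b.top) - max(a.bottom, b.bottom)
        horizontal = min(a.right, b.right) - max(a.left, b.left)
        if vertical > 0 and horizontal >= min_overlap:
            hits.append((a.glyph, b.glyph, horizontal))
    return hits
```

For `"a 10 10 20 30 0\ne 18 10 28 30 0\nx 40 10 50 30 0"`, only the `a`/`e` pair is reported, with a 2-pixel overlap: a candidate for a merged glyph such as æ.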

Re: one... pixel... difference

2012-10-25 Thread Tom Morris
On Wednesday, October 24, 2012 11:37:18 PM UTC-4, Phlip wrote: Tesseractors: We are using Tesseract for an outside-of-the-box situation - not scanning neatly typed documents. Our situation is a fuzzy, low-contrast picture. But - even when I use many image enhancements, such as leveling

Error when executed

2012-10-25 Thread Manohar
Step 1: Created a JPEG file using MS Paint, wrote just "Hello World" in it, and saved it as *.JPEG. Step 2: Opened the command prompt. Step 3: Ran tesseract.exe Test.JPG Out.txt, which gave the following error: Cannot create output file Out.txt. Kindly let me know how to resolve this issue.
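Two things are worth checking here. These are assumptions, not an answer given in the thread. First, the Tesseract 3.x command line takes an output *base* name and appends `.txt` itself, so `Out.txt` is written as `Out.txt.txt`. Second, the directory it writes into has to be writable. The hypothetical helpers below build the command with a clean base name and test the target directory before running anything.

```python
import os


def tesseract_command(image_path, output_base, exe="tesseract"):
    """Build a Tesseract 3.x command line and return it with the file it will write.

    Tesseract appends ".txt" to the output base itself, so a trailing ".txt"
    typed by the user is stripped here to avoid producing "Out.txt.txt".
    """
    root, ext = os.path.splitext(output_base)
    if ext.lower() == ".txt":
        output_base = root
    return [exe, image_path, output_base], output_base + ".txt"


def output_dir_writable(output_file):
    """Check whether the directory that will receive the output file is writable."""
    directory = os.path.dirname(os.path.abspath(output_file))
    return os.access(directory, os.W_OK)
```

Usage: `cmd, out = tesseract_command("Test.JPG", "Out.txt")` yields `["tesseract", "Test.JPG", "Out"]` and `"Out.txt"`. If `output_dir_writable(out)` is true, the command can be run with `subprocess.run(cmd, check=True)`.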

Re: Ligature detection

2012-10-25 Thread Ryan
Thanks again! GetBoxText() was the ticket. -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-ocr@googlegroups.com To unsubscribe from this group, send email to

Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-25 Thread Gaara Sabaku
For your purposes, a simple approach will yield the best results. Repeating letters is recommended because Tesseract does not train or read well with small samples, due to its approximation/heuristic methods. As Tesseract processes the image, it improves upon itself and then takes a
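Here is one way to produce training text that repeats every character without forming dictionary words. It is a sketch, assuming you supply the character set yourself. Each character appears a fixed number of times, the sequence is shuffled with a fixed seed so it is reproducible, and the result is cut into lines. The default values (10 repeats, 40 characters per line) are hypothetical, not figures from the thread.

```python
import random


def training_lines(charset, repeats=10, line_length=40, seed=0):
    """Return text lines in which every character of `charset` appears exactly `repeats` times.

    Characters are shuffled with a seeded RNG so the text is non-dictionary but
    reproducible; `line_length` controls how the text is wrapped.
    """
    chars = [c for c in charset for _ in range(repeats)]
    rng = random.Random(seed)
    rng.shuffle(chars)
    return ["".join(chars[i:i + line_length]) for i in range(0, len(chars), line_length)]
```

For example, `training_lines("ABC123", repeats=5, line_length=8)` returns four lines (8, 8, 8 and 6 characters) containing each of the six characters exactly five times.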

Re: one... pixel... difference

2012-10-25 Thread Phlip
On Thu, Oct 25, 2012 at 8:21 AM, Gaara Sabaku kage.sabaku.no.ga...@gmail.com wrote: Tesseract can improve its accuracy if it can scan the document multiple times. The only way to achieve this benefit is to include the repetition in a single TIFF with multiple pages. You can identify single

Numbers recognition

2012-10-25 Thread Germán Cuéllar
I have tried to OCR a table with very poor results especially for columns containing time data. No language file helped.