Tesseractors:
We are using Tesseract for an outside-of-the-box situation, not
scanning neatly typed documents.
Our situation is a fuzzy, low-contrast picture. But even when I use
many image enhancements, such as leveling the colors, blurring them,
improving the contrast, shrinking the image,
Again, processing the box data will help you identify such an occurrence;
you can then filter out the hard words and reprocess them separately.
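To reprocess hard words separately you first need their image regions. Box-file coordinates put the origin at the bottom-left of the image, while most imaging libraries crop from the top-left, so the y axis has to be flipped. The Python sketch below is a hypothetical helper (the function names, the `(word, left, bottom, right, top)` input tuples, and the default padding are illustrative assumptions, not Tesseract API). It turns such boxes into padded, clamped `(left, upper, right, lower)` rectangles:

```python
def box_to_crop(left, bottom, right, top, image_width, image_height, pad=4):
    """Convert one box in Tesseract box-file coordinates (origin at the
    bottom-left of the image) into a padded (left, upper, right, lower)
    rectangle with the origin at the top-left, clamped to the image."""
    return (
        max(0, left - pad),                               # widen to the left
        max(0, image_height - top - pad),                 # flip y, widen upward
        min(image_width, right + pad),                    # widen to the right
        min(image_height, image_height - bottom + pad),   # flip y, widen downward
    )


def crops_for_hard_words(word_boxes, image_width, image_height, pad=4):
    """word_boxes: iterable of (word, left, bottom, right, top) tuples for the
    words chosen for reprocessing (hypothetical input format).
    Returns a list of (word, crop_rectangle) pairs."""
    return [
        (word, box_to_crop(l, b, r, t, image_width, image_height, pad))
        for word, l, b, r, t in word_boxes
    ]
```

The rectangle order matches what Pillow's `Image.crop` expects, so if you use Pillow each crop can be enhanced on its own and fed back to Tesseract as a separate image.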
On Mon, Oct 22, 2012 at 10:12 AM, GeorgeS sxoutt...@gmail.com wrote:
With the current Tesseract engine I've noticed that if I perform a
full-page OCR and
Tesseract can improve its accuracy if it can scan the document multiple
times. The only way to achieve this benefit is to include the repetition in
a single TIFF with multiple pages. You can identify single instances of
hard words by parsing the confidence values in the output.
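As a sketch of parsing per-word confidence: newer Tesseract releases can write a tab-separated report (`tesseract image out tsv`) with one row per page, block, paragraph, line and word; word rows carry a `conf` value, and the other rows carry -1. Assuming that format (2012-era releases did not produce it), the Python helper below lists words under a confidence threshold together with their boxes. The function name and the default threshold of 60 are illustrative choices.

```python
import csv
import io


def low_confidence_words(tsv_text, threshold=60.0):
    """Return (text, conf, (left, top, width, height)) for every word row in
    Tesseract's TSV output whose confidence is below `threshold`."""
    # QUOTE_NONE: recognized text may contain stray quote characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    hard = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # page/block/paragraph/line rows have no text
        conf = float(row["conf"])
        if 0 <= conf < threshold:  # -1 marks rows without a confidence
            box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            hard.append((text, conf, box))
    return hard
```

Note that TSV boxes use a top-left origin, unlike box files.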
On Wed, Oct 24, 2012 at
There are safe and unsafe ways of calling functions like that in Tesseract.
By what means do you call the functions at the call-stack locations where
the failure occurs?
On Tue, Oct 23, 2012 at 7:12 PM, Thilina Yapa Bandara
tyband...@gmail.com wrote:
Hi fabriciano,
My image set is always the
You can analyze the box data and easily detect that these letters overlap
their boxes.
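A minimal sketch of that check in Python. It assumes the six-field box format that `GetBoxText()` produces (`char left bottom right top page`, origin at the bottom-left). It flags neighbouring characters whose rectangles intersect, which is one plausible way to spot touching or overlapping glyphs. `Box`, `parse_box_text` and `overlapping_neighbours` are hypothetical names, not Tesseract API.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_text(box_text: str) -> List[Box]:
    """Parse lines of the form 'char left bottom right top page'."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, *coords = parts
        boxes.append(Box(char, *(int(c) for c in coords)))
    return boxes


def overlapping_neighbours(boxes: List[Box]) -> List[Tuple[str, str]]:
    """Return consecutive pairs of boxes on the same page whose rectangles intersect."""
    pairs = []
    for a, b in zip(boxes, boxes[1:]):
        same_page = a.page == b.page
        x_overlap = a.left < b.right and b.left < a.right
        y_overlap = a.bottom < b.top and b.bottom < a.top
        if same_page and x_overlap and y_overlap:
            pairs.append((a.char, b.char))
    return pairs
```

Comparing only consecutive boxes keeps the check linear and matches reading order. A full pairwise comparison would also catch overlaps between non-adjacent boxes, at quadratic cost.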
On Wed, Oct 24, 2012 at 4:59 PM, Ryan rb...@pdftron.com wrote:
I am using GetUTF8() to get OCR results from the attached image. As you can
see from the image, there is a ligature (0x00E6 æ) and tesseract-ocr
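For reference when debugging this character: U+00E6 (æ) is two bytes in UTF-8 (C3 A6). `GetUTF8Text()` in Tesseract's C++ API returns UTF-8. If those bytes are later read as Latin-1, the ligature appears as "Ã¦". That is only one possible symptom, assumed here for illustration. The hypothetical Python helpers below show a character's code point and bytes, and undo that particular mis-decoding:

```python
def code_points(text: str):
    """List each character with its Unicode code point and UTF-8 bytes (hex)."""
    return [(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex()) for ch in text]


def repair_latin1_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1 (e.g. 'Ã¦' -> 'æ').
    Text that is not mojibake of this kind is returned unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:  # covers both encode and decode failures
        return text
```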
On Wednesday, October 24, 2012 11:37:18 PM UTC-4, Phlip wrote:
Step 1: Created a JPEG file using MS Paint, wrote Hello World in it, and
saved it as *.JPEG.
Step 2: Opened the command prompt.
Step 3: tesseract.exe Test.JPG Out.txt
It gave the following error: Cannot create output file Out.txt. Kindly let
me know how to resolve this issue.
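The command-line form is `tesseract imagename outputbase`. Tesseract appends `.txt` to the output base itself, so `Out.txt` asks for `Out.txt.txt`, and `tesseract.exe Test.JPG Out` is the usual way to get `Out.txt`. Whether that alone explains the error is not certain; checking that the current directory is writable is also worth doing. The Python sketch below is a hypothetical wrapper, assuming the `tesseract` executable is on PATH. It strips a trailing `.txt` before invoking the CLI:

```python
import subprocess
from pathlib import Path


def output_base(requested: str) -> str:
    """Tesseract appends '.txt' to the output name itself, so strip it here."""
    path = Path(requested)
    if path.suffix.lower() == ".txt":
        return str(path.with_suffix(""))
    return requested


def run_tesseract(image: str, requested_output: str) -> str:
    """Run the tesseract CLI (assumed to be on PATH) and return the recognized text."""
    base = output_base(requested_output)
    subprocess.run(["tesseract", image, base], check=True)
    return Path(base + ".txt").read_text(encoding="utf-8")
```

With this, `run_tesseract("Test.JPG", "Out.txt")` runs `tesseract Test.JPG Out` and reads back `Out.txt`.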
Thanks again! GetBoxText() was the ticket.
--
You received this message because you are subscribed to the Google
Groups tesseract-ocr group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
For your purposes, a simple approach will yield the best results. Repeating
letters is recommended because Tesseract does not train or read well from
small samples, owing to its approximation/heuristic methods. As Tesseract
processes the image, it improves upon itself and then takes a
On Thu, Oct 25, 2012 at 8:21 AM, Gaara Sabaku
kage.sabaku.no.ga...@gmail.com wrote:
I have tried to OCR a table, with very poor results, especially for columns
containing time data. No language file helped.
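Because a time column has a very narrow format, one option is to validate and repair the OCR text after recognition rather than rely on a language file. This is a swapped-in post-processing technique, not something proposed in the thread. Restricting the character set through the `tessedit_char_whitelist` variable is another route, though whether it is honoured depends on the Tesseract version and engine. The Python sketch below maps a few common letter/digit confusions and accepts only valid HH:MM values. The confusion table is an illustrative assumption that should be tuned to the errors you actually see.

```python
import re

# Hypothetical table of common OCR confusions inside numeric time fields.
CONFUSIONS = str.maketrans({
    "O": "0", "o": "0", "D": "0",
    "l": "1", "I": "1", "|": "1",
    "S": "5", "B": "8",
    ".": ":",  # a misread colon
})
TIME_RE = re.compile(r"^([01]?\d|2[0-3]):([0-5]\d)$")


def normalize_time(token: str):
    """Return the token as zero-padded HH:MM, or None if it is not a valid time."""
    candidate = token.strip().translate(CONFUSIONS)
    match = TIME_RE.match(candidate)
    if not match:
        return None
    return f"{int(match.group(1)):02d}:{match.group(2)}"
```

Cells for which this returns None are good candidates for the crop-and-reprocess approach discussed earlier in the thread.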