Hi Vicky,
Can you tell me more about this paper? It looks like this is not a
free document so I can't just read it to see if it would solve the
problem I have.
My problem is that I have grey-scale image data (tif/jpg/etc) that
contains text within a table format, i.e. cells on the page. The
doc
Hello,
I used this paper (for pre-processing):
Parameter-Free Geometric Document Layout Analysis, by Lee and Ryu, IEEE
Trans. Pattern Analysis and Machine Intelligence, Nov. 2001, Volume 23,
Issue 11, Pages 1240-1256
Best Regards,
Vicky
-Original Message-
From: tesseract-ocr@googlegroups.com [mail
What would you recommend for splitting the columns?
I think I will need to run Tesseract column by column and then merge the
results to rebuild the correct rows.
Can you point me in a direction that would help?
What tools (unix compatible tools) can I use to tell tesseract to scan a
specif
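A rough Python sketch of the merge step described above, in case it helps. It assumes each column has already been recognized separately and that every recognized line comes with a vertical pixel position. The `(y, text)` tuples, the `merge_columns` name, and the `tolerance` value are all made up for illustration and are not anything Tesseract outputs directly.

```python
def merge_columns(columns, tolerance=10):
    """Merge per-column OCR lines into table rows.

    columns: list of columns; each column is a list of (y, text) tuples,
    where y is the vertical position of the line in pixels. This input
    format is an assumption for the sketch, not Tesseract output.
    """
    # Flatten to (y, column_index, text) and sort top to bottom.
    entries = sorted(
        (y, col, text)
        for col, lines in enumerate(columns)
        for y, text in lines
    )
    rows = []     # each row is a list with one cell string per column
    row_y = None  # y position of the first line in the current row
    for y, col, text in entries:
        # Start a new row once a line sits clearly below the current one.
        if row_y is None or y - row_y > tolerance:
            rows.append([""] * len(columns))
            row_y = y
        # Join text if two lines of one column land in the same row.
        rows[-1][col] = (rows[-1][col] + " " + text).strip()
    return rows
```

Cells missing from a column simply stay empty, so the rows keep their width. The tolerance has to be tuned to the line spacing of the scan.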
There are some GUI frontends that you can use, such as VietOCR, which
is available as both a Java and a .NET app.
http://vietocr.sf.net
On Mar 13, 6:13 pm, Onion wrote:
> Ok, thanks. That will be too complicated for me to use. Will have to
> uninstall it.
--
You received this message because you are
Ok, thanks. That will be too complicated for me to use. Will have to
uninstall it.
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com.
To unsubscribe from this group, send email
Jose,
I ran Tesseract revision 549 from the command line under Windows with
no special config and got a segmentation that is almost correct.
What language file do you use? I used the following command line:
tesseract 3.tiff test3 -l eng
with no pageseg_mode (-psm argument) as well as with it,
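For anyone who wants to script the call above, here is a minimal Python sketch that builds the same command line, with or without the `-psm` argument. The function names are hypothetical, and the sketch assumes a Tesseract 3.x binary named `tesseract` is on the PATH.

```python
import subprocess


def build_tesseract_command(image, output_base, lang="eng", psm=None):
    """Build the argument list for a Tesseract 3.x command-line call."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if psm is not None:
        # Tesseract 3.x spells the page segmentation option "-psm".
        cmd += ["-psm", str(psm)]
    return cmd


def run_tesseract(image, output_base, lang="eng", psm=None):
    """Run Tesseract, which writes the recognized text to <output_base>.txt."""
    cmd = build_tesseract_command(image, output_base, lang, psm)
    subprocess.run(cmd, check=True)  # raises if Tesseract exits with an error
    return output_base + ".txt"
```

Calling `build_tesseract_command("3.tiff", "test3")` reproduces the command shown above. Passing `psm=...` adds the page segmentation mode.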
Although the Tesseract team is working to make it more user-friendly, many
obvious user issues are still opaque or hard to find an answer to...
Tesseract is a console application; it has no GUI. You should open a
Windows command prompt and type a command. Read more at
http://code.google.com/p/tesseract-oc
I installed Tesseract 3.00 and the German and Czech languages as well as
English.
Now how do I run it? Are there directions somewhere?
When I click Start > Tesseract OCR, a DOS screen flashes for a split second,
then nothing happens.
Thanks
You expect way too much from Tesseract: it's not Tesseract's job to
slice and dice the text according to various organizational
requirements of applications - that's for the application to handle.
You can get the coordinates of all characters and easily determine
which ones are in what you consi
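To illustrate the point about working from character coordinates, here is a small Python sketch that groups character boxes into columns by looking for wide horizontal gaps. The `(char, left, bottom, right, top)` tuple layout is loosely modeled on the field order of Tesseract box files, but the function name, the input format, and the `min_gap` value are assumptions for this example.

```python
def split_into_columns(boxes, min_gap=30):
    """Group character boxes into columns separated by wide horizontal gaps.

    boxes: list of (char, left, bottom, right, top) tuples in pixels.
    Returns a list of columns, each a list of boxes, ordered left to right.
    """
    if not boxes:
        return []
    # Sweep from left to right, tracking the rightmost edge seen so far.
    ordered = sorted(boxes, key=lambda box: box[1])
    columns = [[ordered[0]]]
    right_edge = ordered[0][3]
    for box in ordered[1:]:
        # A gap wider than min_gap between this box and everything
        # to its left is treated as a column boundary.
        if box[1] - right_edge > min_gap:
            columns.append([])
        columns[-1].append(box)
        right_edge = max(right_edge, box[3])
    return columns
```

This only works when the page is not noticeably skewed and the column gaps are wider than the word gaps, so `min_gap` has to be chosen per document.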
Hi Patrick,
Yes, the results are correct, but the format of the results is not!
That's my trouble.
Tesseract 3.00 gets this text 100% correct, including the smudged
numbers at the bottom. See:
http://www.scanbizcards.com/plate1.jpg
http://www.scanbizcards.com/plate2.jpg
(scanning was done with ScanBizCards on an iPhone - if you try it
yourself with the app on Android or iPhone, please disable i
Hi Dmitry,
Sorry for the delay... I produced some samples. Could you take a look
at them?
regards,
jose
Running via ports can cause various errors. Try to compile Tesseract
natively. I use revision 549 and, as I said, it works fine.
Tables like yours present a challenge for simple layout
processing algorithms because the text is sparsely located. A minimal
skew, which is almost inevitable, could break al
Manuel,
The sample you provided definitely has insufficient resolution. You
can only expect part of the heading to be recognized, and that is
what happened when I ran recognition on your image. But I
did not get any error or warning messages with my "por.traineddata" at
all!
However al
The first step in this technique is to threshold the image using a
manually selected threshold value. In the text of the article this
step gets a single line of code (pix1 = pixThresholdToBinary(pixs,
150)) but not a word of explanation. However, the fact that such a convenient
threshold luckily exis
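To make the thresholding step concrete, here is a tiny pure-Python sketch of fixed-threshold binarization. It is not Leptonica code. It assumes the image is a list of rows of 0-255 grey values and follows my understanding of pixThresholdToBinary, where pixels below the threshold become 1 (foreground) and the rest become 0.

```python
def threshold_to_binary(gray, threshold=150):
    """Binarize a greyscale image with one fixed, manually chosen threshold.

    gray: list of rows, each a list of integer grey values in 0..255.
    Returns rows of 1 (darker than threshold) and 0 (everything else).
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]
```

With the value 150 from the article, a pixel of 149 counts as text and a pixel of 150 counts as background, which shows how much the result depends on picking a threshold that suits the particular scan.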