On Sat, Oct 13, 2012 at 10:47 PM, JVIyer wrote:
"A lot of times I have seen fairly good number plate images being OCRed
inaccurately. This could possibly be due to the word recognition stage. Has
anyone found a way to disable the dictionary / word recognition?"
Saurabh, have you been able to accomplish this? Could you kindly share
your insi
by seeing the output I got. My image has 6 rows and 12 columns, but in
my output I got 12 rows and 6 columns, and everything was read from the
right first (it should have started from the left).
On Feb 17, 6:24 pm, Andres wrote:
Just out of curiosity, how did you find that?
2012/2/17 Aruna Devi
I also wanted to know how to make Tesseract read my image horizontally.
I have an image consisting of 6 rows. After training I found that my image
is read from the right side (it should be from the left), and it also goes
down by column and not by row. How do I solve this issue?
--
How do I set all these variables?
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For mor
In the future that will be my preferred approach! For the time being I just
need a fast and easy solution. I know it's not the most beautiful approach...
but I haven't touched much of the Tesseract framework, so as not to break
anything! I was just short of time and it was easier for me to modify the sourc
Why don't you consider making your own project and statically linking
Tesseract into it, or using Tesseract as a dynamic link library? That
way you can implement any formatting and other special logic you
wish...
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 2:13 PM, Jose wrote:
I launch Tesseract from the command line and I didn't find
a way to format the results with more info.
--
Ehmm... I don't get it. If you've succeeded in using iterators, it's
at your full disposal to format the output in any way you want
programmatically, isn't it?
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 1:56 PM, Jose wrote:
I only modify how the result is printed, nothing else... I grab all the
info from the word and its bounding box! That is OK, right?
--
Yes, I got the information from the result! I only modified how the result
method prints the result, nothing more, of course! I got the information
from the bounding box of the result! I'm not modifying it any deeper than that.
--
I think the best approach would be to stay as far as possible from
modifying the 3rd party code. Take a closer look at the ResultIterator and
PageIterator classes. Often they suffice for getting all the information
you need about Tess's recognition results.
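As a rough sketch of what the application side could then do with that
information: assuming the words and their bounding boxes have already been
collected (for example through ResultIterator), the C++ below groups them
into rows by vertical position and orders each row left to right. WordBox,
GroupIntoRows, JoinRows and the row_tolerance parameter are hypothetical
names made up for illustration; none of them is part of Tesseract.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one recognized word and its bounding box
// (pixel coordinates, y growing downwards). Not a Tesseract type.
struct WordBox {
    std::string text;
    int left, top, right, bottom;
};

// Groups words into rows and orders each row left to right.
// A word joins the current row when its vertical centre lies within
// row_tolerance pixels of the centre of the first word of that row.
std::vector<std::vector<WordBox> > GroupIntoRows(std::vector<WordBox> words,
                                                  int row_tolerance) {
    // Sort by doubled vertical centre (top + bottom) to avoid halving.
    std::stable_sort(words.begin(), words.end(),
                     [](const WordBox& a, const WordBox& b) {
                         return a.top + a.bottom < b.top + b.bottom;
                     });

    std::vector<std::vector<WordBox> > rows;
    int row_centre2 = 0;  // doubled centre of the first word in the current row
    for (const WordBox& w : words) {
        const int centre2 = w.top + w.bottom;
        if (rows.empty() || centre2 - row_centre2 > 2 * row_tolerance) {
            rows.emplace_back();  // start a new row
            row_centre2 = centre2;
        }
        rows.back().push_back(w);
    }

    // Within each row, order the words by their left edge.
    for (std::vector<WordBox>& row : rows) {
        std::sort(row.begin(), row.end(),
                  [](const WordBox& a, const WordBox& b) {
                      return a.left < b.left;
                  });
    }
    return rows;
}

// Joins each row with single spaces and ends every row with a newline.
std::string JoinRows(const std::vector<std::vector<WordBox> >& rows) {
    std::string out;
    for (const std::vector<WordBox>& row : rows) {
        for (std::size_t i = 0; i < row.size(); ++i) {
            if (i > 0) out += ' ';
            out += row[i].text;
        }
        out += '\n';
    }
    return out;
}
```

The tolerance is a guess that would have to be tuned to the line height of
the actual images; words arriving column by column should still come out
one row per line.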
Warm regards,
Dmitry Silaev
On Mon, Mar 14,
Hi Dmitry,
thanks for the help!
In the end, what I did was modify the function that returns the result to
include the top location of the bounding box. Then I have the following result:
xy
x1y1
x2y2
x3y3
x4y4
x5y5
x6y6
x7y7
then I parse
Jose,
I run Tesseract revision 549 from the command line under Windows with
no special config and get the segmentation which is almost correct.
What language file do you use? I used the following command line:
    tesseract 3.tiff test3 -l eng
with no pageseg_mode (-psm argument) as well as with it,
You expect way too much from Tesseract: it's not Tesseract's job to
slice and dice the text according to various organizational
requirements of applications - that's for the application to handle.
You can get the coordinates of all characters and easily determine
which ones are in what you consi
Hi Patrick,
Yes, the results are correct, but the format of the results is not! That's
my trouble.
--
Tesseract 3.00 gets this text 100% correct, including the smudged
numbers at the bottom. See:
http://www.scanbizcards.com/plate1.jpg
http://www.scanbizcards.com/plate2.jpg
(scanning was done with ScanBizCards on an iPhone - if you try it
yourself with the app on Android or iPhone, please disable i
Hi Dmitry,
Sorry for the delay... I produced some samples; please see if you can give
them a look!
regards,
jose
--
Ok I'll try to do that this afternoon.
thank you for the help
regards,
jose
--
The best way to explain everything would be just to send your source
image examples, describe what information you want to get from them
and provide the community with the code snippets you use to interface
with Tess. And please be as detailed as possible.
Warm regards,
Dmitry Silaev
On Thu,
In my particular case it is just that the first word of each column is
in one font and the other is in another, so instead of reading column by
column it reads all the columns of the first row and then all the columns of
the second row! My god, it is really hard to explain in English. I get an
acc
Unfortunately not only text output order can suffer from Tess's
segmentation, but also extents of some text fragments can be
identified incorrectly (say one "segmented" row can span over two
"real" rows, probably in partial way), and that in turn can lead to
*completely* irrelevant recognition resu
Dmitry, the recognition works; the only problem is the way it is parsing it...
:S I think segmenting the images would be too painful! I only
want to change the order in which it is displayed, or get the bounding boxes,
so I could know the x and y of each recognized word and thereby organise the
results
b
I don't know if it's affordable for you, but imho decent results can
only be achieved if you do segmentation yourself and then pass image
fragments to Tesseract on a word-by-word basis. Problems may appear
when you have words that are too short; however, as far as I can see,
that's not your case.
Long time
Hi, (as you know, Saurabh, because we talked in private the other day) I tried
PSM_SINGLE_COLUMN and the accuracy drops dramatically... I can't afford
to lose that accuracy. Is it possible to change the way the output is
displayed? Looking at the code it seems rather hard to change it... perhaps I
c
Hi, do you know how to force the page layout to be recognised as horizontal?
My issue is with that! You'll make me the happiest person on earth.
On 17 Feb 2011, at 04:48, Saurabh Gandhi wrote:
> Hello everyone,
>
> I am currently using tesseract 3.x for license plate recognition.
> I have an algo
Saurabh, by setting these (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR), are you
forcing the page to be read horizontally? My problem is that I have a column
of two words separated by white space (each word is in a different font), and
instead of seeing one column of two words the OCR sees two columns of one
w
Ok I'm recompiling now... I'll let you know when it's done! thanks for the
help anyway :)
--
You know, Saurabh, that was EXACTLY what I was looking for! I couldn't be more
thankful to you! That line of code changed my life :D
Thank you again :)
--
This is what the JPG looks like:
WORD1     WORD2   (the white space is quite "big")
WORD1     WORD2
WORD1     WORD2
WORD1     WORD2
WORD1     WORD2
WORD1     WORD2
WORD1     WORD2
and it reads like:
WORD1
WORD1
WORD1
WORD1
WORD1
WORD1
WORD2
WORD2
WORD2
WORD2
WORD2
WORD2
WORD2
any help would be r
OK, I'll try that! I have to modify this in tesseractmain.cpp, right? (I'm
using command line execution.)
I replace this line:
    api.SetPageSegMode(tesseract::PSM_AUTO);
with
    api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN);
and then recompile, right?
Thanks for the help
--
Is there no other workaround? If I reduce the size of the white space between
WORD1 and WORD2 then it all works fine! This space is making the OCR think
it's another column! Is there no other way? Splitting the image into as many
rows does not look really efficient.
--
great...
--
Regards,
Saurabh Gandhi
On Fri, Feb 18, 2011 at 5:16 PM, Jose wrote:
> You know, Saurabh, that was EXACTLY what I was looking for! I couldn't be more
> thankful to you! That line of code changed my life :D
>
> thank you again :)
>
--
Yes, that's right.
--
Regards,
Saurabh Gandhi
On Fri, Feb 18, 2011 at 4:57 PM, Jose wrote:
> ok I'll try that! I have to modify this on the tesseractmain.cpp right?
> (I'm using command line execution)
>
> I replace this line : api.SetPageSegMode(tesseract::PSM_AUTO);
> for api.SetPageSegMode
Did you try PSM_SINGLE_COLUMN? I think that is what you need. Could you try
this and let us know how it behaves, please?
    PSM_SINGLE_COLUMN, ///< Assume a single column of text of variable sizes.
--
Regards,
Saurabh Gandhi
On Fri, Feb 18, 2011 at 4:29 PM, Jose wrote:
> Is there no other work
Hello Jose,
Setting the mode to PSM_SINGLE_BLOCK or PSM_SINGLE_LINE will not force
horizontal reading. These modes will just assume that your input image
itself is segmented and consists of just a single line. So, if you want
horizontal reading you will have to segment your image and provide it to
You can simply use this in your program, just after Init, to set the
whitelist / blacklist:
    api.Init(argv[0], lang, &(argv[arg]), argc-arg, false);
    api.SetVariable("tessedit_char_whitelist",
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789 ");
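Independently of the engine setting, the application could also run a small
sanity check on the recognized string against the same character set. This
is only a sketch in plain C++ that does not call Tesseract at all;
kPlateWhitelist, IsWhitelisted and KeepWhitelisted are hypothetical names
chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Same character set as the tessedit_char_whitelist value above
// (hypothetical constant name, not a Tesseract symbol).
const std::string kPlateWhitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789 ";

// Returns true when every character of text appears in whitelist.
bool IsWhitelisted(const std::string& text, const std::string& whitelist) {
    return text.find_first_not_of(whitelist) == std::string::npos;
}

// Returns a copy of text with all characters outside whitelist dropped.
std::string KeepWhitelisted(const std::string& text,
                            const std::string& whitelist) {
    std::string out;
    for (char c : text) {
        if (whitelist.find(c) != std::string::npos) {
            out.push_back(c);
        }
    }
    return out;
}
```

Note that a newline is not in this character set, so any trailing newline
in the OCR output would be rejected by IsWhitelisted and stripped by
KeepWhitelisted.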
--
Regards,
Saurabh Gandhi
"Customise the tesseract engine to recognize only the characters from
A-Z, 0-9, . (dot), (space) by setting the character white-list."
Kindly tell me the name of the folder in which the whitelist and the
blacklist are located. I want to use the same for Kannada scripts.
-sriranga(78yrs)
On Fr
From all this, I have identified the following ways of improving the
results:
1. Customise the tesseract engine to recognize only the characters from
A-Z,0-9,.(dot), (space) by setting the character white-list. My
understanding is that the white-list is the list of characters that are
Hello everyone,
I am currently using tesseract 3.x for license plate recognition.
I have an algorithm which does a good job in pre-processing the input image
to localize the plate.
However, when I use the Tesseract OCR engine to classify the plate number,
the recognition is not that accurate. I