Most of these errors look harmless. You could try adding the compiler define
EMBEDDED or, alternatively, delete all the code that refers to sigmenu, as
it is not used any more.
That gets rid of most of them. I am not sure off-hand whether the remaining
errors can be removed so easily, but it is
It is possible, and there are broken bits of code that support that kind of
training, but it hasn't been used for years and no longer works, so it would
take quite a lot of effort to get it working.
Ray.
On Thu, Nov 20, 2008 at 4:29 AM, Philipp Lenssen
[EMAIL PROTECTED] wrote:
Hi! I read through
You can upload files to groups. It would help to diagnose your problem.
BTW are you using English, or your own training data?
Ray.
Sent from my G1 Android Phone.
On Dec 2, 2008 12:58 AM, udippel [EMAIL PROTECTED] wrote:
For me, Tesseract does a good job. The recognition rate is
comparatively
This problem has not been attempted before with tesseract.
The biggest thing to watch out for is to skip the text line and word
finding. You might have significant success just running the classifier on
the connected components.
Training might be a bit tricky too, since it relies on the text line
to the previous line or an extra line in
between. I've also observed that sometimes, the same symbol can be
recognized easily when it occurs in a subscript position, but is often
mistaken when it occurs in a superscript position.
lab.
On Dec 12, 8:51 am, Ray Smith theraysm...@gmail.com wrote:
See http://code.google.com/p/tesseract-ocr/issues/detail?id=160
Ray.
On Fri, Dec 19, 2008 at 5:37 PM, lab la...@lbreyer.com wrote:
In my experience, TIFF files sometimes have an alpha layer. The
easiest way to ensure a usable image for tesseract is to do these two
steps (on Debian)
convert
http://code.google.com/p/tesseract-ocr/issues/detail?id=160
On Mon, Dec 22, 2008 at 11:35 PM, ABB abbhoos...@gmail.com wrote:
Link not found :-(
On Dec 21, 12:02 am, Ray Smith theraysm...@gmail.com wrote:
See http://code.google.com/p/tesseract-ocr/issues/detail?id=160
Ray.
On Fri
Hi Brucey,
It would be helpful if you could post a patch, either on the issues list or
here. I haven't yet completed my scan of the issues list, but I am trying to
incorporate all the portability patches into 2.04 before we go up to 3.0
next year. I have just uploaded some of them to svn. I know
As with any C to C++ interface, you just have to write a C layer on top of
the C++ interface. It isn't very difficult, just ugly. The existing dll api
already provides such a C layer.
Ray.
On Tue, Dec 30, 2008 at 4:11 AM, HK herve75...@yahoo.fr wrote:
In tesseract package there is an example of
Tesseract uses very little memory. The most optimal way of using multiple
cores is simply to have multiple processes running simultaneously,
processing different pages. If you want to get more sophisticated than that,
you will have to wait for the completion of the thread-safety project, as
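A rough sketch of the "one process per page" approach, in Python. It assumes the command-line form `tesseract <image> <outputbase>`; the file names, the output directory, and the worker count are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image_path, output_dir):
    """Build the argv for one page: tesseract <image> <outputbase>."""
    image = Path(image_path)
    # Assumption: tesseract appends the .txt extension to the output base.
    output_base = Path(output_dir) / image.stem
    return ["tesseract", str(image), str(output_base)]


def ocr_pages(image_paths, output_dir, workers=4):
    """Run one tesseract process per page, at most `workers` at a time.

    The threads only wait on the child processes; the OCR work itself
    happens in the separate tesseract processes, one per page.
    """
    commands = [build_command(p, output_dir) for p in image_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Something like `ocr_pages(["page1.tif", "page2.tif"], "out")` would then return one exit code per page. Setting `workers` to the number of cores is a reasonable starting point.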
The output is indeed utf-8.
Ray.
On Thu, Jan 8, 2009 at 11:00 AM, Michael Moore stuporg...@gmail.com wrote:
On Thu, Jan 8, 2009 at 11:30 AM, Darren Govoni dar...@ontrenet.com
wrote:
Hey Michael,
I really appreciate the tips. I'm developing an automated batch
ocr'ing system and there
There will be explicit support for single line mode in 3.00, mostly for the
benefit of ocropus.
Ray.
On Sat, Jan 10, 2009 at 11:41 PM, federico.boschetti
federico.boschetti...@gmail.com wrote:
I'm using tesseract in conjunction with ocropus to recognize ancient
Greek.
Ocropus makes the
You need additional dlls and there is too much disagreement in the Windows
user community over which version of the developer platforms to use. Making
Windows executables is fraught with problems, which is why I am looking for
someone to write a Windows installer...
Ray.
On Mon, Jan 12, 2009 at
Also you might need to scale up. See the FAQ.
Ray.
On Fri, Jan 16, 2009 at 12:41 PM, spackmann
richardspackm...@greenfieldfd.org wrote:
Try using ImageMagick and cropping the image down to just the white
bottom border which contains the text you want to OCR.
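The cropping suggestion could look roughly like this, driving ImageMagick from Python. It is only a sketch: the file names and strip size are placeholders, and it assumes the `convert` program is on the path.

```python
import subprocess


def crop_bottom_command(src, dst, width, height):
    """Build an ImageMagick command that keeps a width x height strip at the bottom."""
    return [
        "convert", src,
        "-gravity", "South",                     # measure the crop from the bottom edge
        "-crop", "%dx%d+0+0" % (width, height),  # strip size in pixels
        "+repage",                               # discard the old canvas offset
        dst,
    ]


def crop_bottom(src, dst, width, height):
    """Run the crop; raises CalledProcessError if convert fails."""
    subprocess.run(crop_bottom_command(src, dst, width, height), check=True)
```

With `width` set to the full image width, `crop_bottom("photo.png", "caption.png", 640, 40)` would keep only the bottom 40 pixel rows, which can then be passed to tesseract.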
Unfortunately the TessBaseAPI isn't included in the tessdll, as the dll is a
separate api.
It can be fixed by adding the appropriately defined DLLSYM to the class
definition. You also have to include the appropriate definition of DLLSYM so
that it is defined to be DLLEXPORT when building the DLL,
See this page for compiling on Windows:
http://code.google.com/p/tesseract-ocr/wiki/ReadMe
Ray.
On Fri, Jan 23, 2009 at 11:49 AM, Design Software
marasco.marco.designsoftw...@gmail.com wrote:
Greetings to all users of the Group!
I wanted to ask you something. I have started using
Yup, it was the batch.nochop that was making the difference.
Ray.
On Fri, Jan 23, 2009 at 12:21 AM, Alatius johan.wi...@gmail.com wrote:
Oh! I just realised that if I create the box file with tesseract
boxtxtdiff.tif boxtxtdiff batch makebox (i.e. without '.nochop'),
the same characters are
: 'ocrclass.h': No such file or directory
1main.cpp
The tessdll.dll indeed contains the TessDllAPI, but why can't the .h files
above be found?
On Jan 22, 12:54 AM, Ray Smith theraysm...@gmail.com wrote:
Unfortunately the TessBaseAPI isn't included in the tessdll, as the dll
is a
separate api
Windows isn't completely stupid. The file system accepts / so there is no
need to change.
Ray.
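For what it's worth, a small Python sketch of setting TESSDATA_PREFIX with forward slashes. The install location is hypothetical, and the trailing slash is added on the assumption that the value is used as a plain prefix in front of the tessdata directory name.

```python
import os
from pathlib import PureWindowsPath


def forward_slash(path):
    """Rewrite a Windows path using forward slashes."""
    return PureWindowsPath(path).as_posix()


# Hypothetical install location; substitute the real one.
os.environ["TESSDATA_PREFIX"] = forward_slash(r"C:\Tesseract-OCR") + "/"
```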
On Wed, Jan 21, 2009 at 1:40 PM, Israel calhei...@gmail.com wrote:
ok, i have seen the code.
i need to configure the path under the system variable TESSDATA_PREFIX
but the back slash is not
If you get the new platform.h from this location:
http://code.google.com/p/tesseract-ocr/source/browse/trunk/ccutil/platform.h
the problem of _vsnprintf should be solved.
Ray.
On Thu, Jan 15, 2009 at 8:34 AM, SteveP spohor...@sjm.com wrote:
Did you see my post from Jan 13, 2009? This might be
Get a new platform.h from here:
http://code.google.com/p/tesseract-ocr/source/browse/trunk/ccutil/platform.h
to fix the vsnprintf problem.
Ray
On Tue, Jan 13, 2009 at 5:28 PM, SteveP spohor...@sjm.com wrote:
There is a solution for the compile issues in VS.NET 2008, at least if
you got the
A lot depends on your application and the type of image that you want to
OCR. Tesseract still lacks page layout analysis.
Its character error rate is probably about 2x worse than the best commercial
engines, but that will vary according to the image quality.
If your image quality and fonts are very
You can use \u notation, e.g. \u20ac\u00a3 gives you the Euro sign and
the pound sign. The compiler converts the Unicode escapes to utf8 strings.
Not sure if old compilers like vc6 support it. You might need to use \xhh to
specify utf8 byte codes.
Ray.
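To see which \xhh byte codes correspond to a given \u escape, here is a small Python check. Python is used only to show the byte-level equivalence; the helper `to_xhh` is a made-up name, and the resulting escape string is what one would paste into a C or C++ literal for a compiler without \u support.

```python
# Code points from the message: U+20AC (Euro sign) and U+00A3 (pound sign).
text = "\u20ac\u00a3"

# The same two characters written as explicit UTF-8 byte codes.
utf8_bytes = b"\xe2\x82\xac\xc2\xa3"


def to_xhh(s):
    """Return the UTF-8 bytes of s as a C-style \\xhh escape string."""
    return "".join("\\x%02x" % b for b in s.encode("utf-8"))
```

The Euro sign takes three bytes and the pound sign two, so the two-character string becomes five \xhh escapes.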
On Fri, Feb 6, 2009 at 7:55 AM,
2.04 is likely to appear just before 3.00. Its purpose is to incorporate
patches that have been provided to the group and to fix as many
bugs/compilation issues as possible so there is a stable base version prior
to the release of 3.00, which is likely to introduce a new pile of such
issues.
Ray.
You might also like to check out the FAQ
http://code.google.com/p/tesseract-ocr/wiki/FAQ on color images.
Ray.
On Fri, Mar 6, 2009 at 6:45 AM, Albert Law a...@snowbound.com wrote:
ps: You haven't stated why big TIFF files cause problems. Is it a HD thing
or a main memory thing?
-
Albert
See "where is the documentation" in the FAQ:
http://code.google.com/p/tesseract-ocr/wiki/FAQ
Ray.
On Thu, Mar 5, 2009 at 3:33 AM, mynickmynick mynickmyn...@yahoo.com wrote:
The tesseract source code is so large that it's a pretty long journey
having to read it all. Could you suggest some tutorial
FAQ: Minimum text size.
Ray.
On Mon, Mar 9, 2009 at 3:20 AM, Thomasyi thomasyi2...@yahoo.com wrote:
Forgot to write down my system information:
Windows XP SP1
Tesseract-OCR 2.03 with windows executables
Unfortunately this just trains incorrect outlines. The problem is that
applybox doesn't do forced chopping of touching outlines, but it needs to.
You need to render your training text with a small amount of inter-character
spacing so that the samples don't touch in the first place.
Ray.
On Thu,
I added it to the training wiki. Looks like there is a long list of comments
there too...
Ray.
On Thu, Mar 12, 2009 at 2:40 AM, Ondra stradasi...@gmail.com wrote:
Hi all,
here http://www.ospilka.com/dl/tessboxer.zip is recoded tessboxer for
windows which works with large files without
Probably. It builds on Android.
Ray.
On Tue, Mar 10, 2009 at 2:40 AM, mynickmynick mynickmyn...@yahoo.com wrote:
Thank you for help
Do you think it's feasible to port it to a linux embedded platform
using buildroot and uclibc uclibc++ instead of glibc?
Yes, there is a circular dependency. You get round it by using all the 8 stock
files for english while you make your new set.
Ray.
On Tue, Mar 24, 2009 at 6:09 AM, Ray Renteria r...@robotcentral.com wrote:
BTW, I'm on Windows XP and I'm running the command-line version.
--Ray
Such small amounts of text may confuse the textline/baseline/word finder. You
don't need a large number of samples, but it does add some randomized noise,
so multiple samples are desirable.
Ray.
On Thu, Mar 26, 2009 at 9:05 AM, SteveP spohor...@sjm.com wrote:
What I have noticed about tesseract
I have begun a wiki page
http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging?ts=1238525607&updated=ViewerDebugging
on this subject. This question has not come up much so far. The page will be
completed in due course.
Ray.
On Tue, Mar 24, 2009 at 5:22 PM, Fuad Jamour fjam...@gmail.com wrote:
Tesseract relies on having a significant number of characters to get decent
statistics on the baseline position, x-height etc., so a few scattered
characters will put it in danger of making a lot of stupid errors. It even
struggles with a single line of text. BTW this is the reason the training
?
On Mar 31, 9:21 pm, Ray Smith theraysm...@gmail.com wrote:
Tesseract relies on having a significant number of characters to get
decent
statistics on the baseline position, x-height etc., so a few scattered
characters will put it in danger of making a lot of stupid errors. It
even
I have a question about how the Tesseract OCR works: what are all the
steps performed to extract the text from an image? Please explain.
Thank you.
On 4/2/09, Ray Smith theraysm...@gmail.com wrote:
If you have the advantage of working
Interesting result. The problem is that the value of DangAmbigs varies
according to the size of the document being OCRed.
Very small documents don't benefit from the adaptive classifier at all, so
DangAmbigs has very little effect.
Very large (eg multipage) documents benefit greatly from the
The command-line tesseract can read unlv zone files. Take a look at the code
in ccstruct/blread.cpp or visit www.isri.unlv.edu for documentation.
Programmatically, TesseractRect allows you to recognize an arbitrary rectangle.
Ray.
On Thu, Apr 9, 2009 at 5:40 AM, 74yrs old withblessi...@gmail.com
This is a known problem. It has never really been tuned for small amounts of
text, but does need some work.
Ray.
On Apr 16, 2009 6:53 AM, George zor...@163.com wrote:
We found the reason. As you said: you don't give enough samples to
Tess. Thank you.
When some words are not complete, Tesseract
I don't think there is much that cannot be run on wince if it can be
modified to run on android...
It should be relatively easy. The debug code is about the only code that
uses many os functions other than basic file io.
Ray.
On Apr 16, 2009 7:13 AM, George zor...@163.com wrote:
How about it
You can get to my icdar paper by searching groups for ray's paper.
There is another description of the debug/control variables if you follow the
documentation link on the tesseract home page.
There is a fundamental problem with the tesseract features for very small
text. To allow it to recognize
Screen text has some unique problems, in addition to the small text problem.
The anti-aliasing used to get more virtual pixels out of a screen makes it
really difficult to get useful images out of the screen. In addition to
this, tesseract's use of a polygonal approximation makes it difficult for
This is a problem with the applybox code that needs to be fixed.
In 2.03/4 deleting the extra character from the unicharset is not a problem
beyond the fact that it won't recognize them. In 3.00 it gets more serious,
as it will be using the order in the unicharset to determine what to output
for
You also need -Iccmain in your compiler options and #include "baseapi.h" in
your code.
Ray.
On Apr 26, 2009 10:06 PM, sai firetr...@gmail.com wrote:
I've installed the tesseract 2.01/2.03 and even the read-only edition
on mac, successfully, I guess.
However, after setting the linker flags,
Sorry, it is still in incubation. The latest news is that it will not build
on VC++6. It is about time it went away. It will hopefully build on
VC++ express 8 (downloading and installing it now).
Ray.
On Thu, Apr 30, 2009 at 5:46 PM, Rob H. hksny...@gmail.com wrote:
Is Tesseract 3.0 available
The best documentation for these is still here:
http://tesseract-ocr.repairfaq.org/tess_variables_all.html
Ray.
On Tue, Apr 28, 2009 at 11:53 PM, g.getsov georgi.get...@googlemail.com wrote:
Hello
Does anyone have a list of options that could improve (or at least
change) the performance of the
Looks like the input image was of poor quality or otherwise damaged.
Ray.
On May 19, 2009 1:02 PM, collimarco collimarc...@gmail.com wrote:
I have successfully installed Tesseract through MacPorts along with
Italian language package. Tesseract seems to work properly, but when I
open the output
This kind of variability is a bit of a problem, and it seems to occur when
the image is of insufficient quality, or the font is far from the training
data. At some point, we may find a solution, but for now, the best solution
is to retrain on the data you want to recognize.
Ray.
On Thu, May 28,
With single characters, it loses the ability to find the baseline, xheight
etc, so certain sets of characters will all look alike. Having said that,
3.00 will have a single character mode that enables you to at least attempt
to recognize them.
Ray.
On Thu, May 28, 2009 at 2:55 AM, paulfeakins
I think you need the more recent code from svn.
Ray.
On Wed, May 27, 2009 at 8:23 AM, Adrian adrian03...@gmail.com wrote:
Hi, I was trying to install tesseract on my Debian Lenny (Intel 32
bits) and I got the ./configure ok (it tells I can run make), the
lines with no, missing and the config
RTFM. See the FAQ on small text.
Ray.
On Tue, May 19, 2009 at 1:33 PM, denis56 denis.ergashb...@gmail.com wrote:
Here is the link to three files that I mentioned (original, converted
with java imageio package, and with Image Converted utility)
http://www.speedyshare.com/732780799.html
The problem is probably that the textline finder is splitting your
characters over multiple lines. While it is not supposed to do this, it does
it sometimes. A fix to applybox is needed so it can still work in this
situation.
Ray.
On Thu, May 14, 2009 at 11:26 PM, Raj mail2sun@gmail.com wrote:
have to dig into the
tesseract source code and remove the un-supported code. :(
Any hints would be greatly appreciated, thanks.
Best Regards
Liutao
On May 30, 1:31 AM, Ray Smith theraysm...@gmail.com wrote:
Yes, it needs a bit of work to properly compile it, but the EMBEDDED
Should be in 3.00.
Ray.
2009/4/5 Arno Teigseth arnot...@gmail.com
Hi Arnstein,
I have some experience with training, and have some scripts you can use that
make the job a good deal easier. If you like, I can send them to you (I haven't
quite figured out how to add them to the tesseract pages
Yes. This should be resolved in 2.04. Dawg generation will be further
improved in 3.00, with abolition of the fixed memory buffer, but the data
files and code will not be backwards compatible.
The 3.00 dawgs will be tied to a specific unicharset file.
Ray.
On Fri, Apr 17, 2009 at 12:31 PM, Debayan
I have added a clarification to the training wiki that might explain this
better.
Ray.
On Mon, Apr 20, 2009 at 12:46 PM, MilanKnizek knizek.co...@volny.cz wrote:
I have come recently to Tesseract, since it is used by OGMRIP for OCR
of DVD subtitles. First run for the subtitles in the Czech
You could preprocess the images with a morphological operation, such as
dilation, to make the fragments touch again, but to solve it in general is a
hard problem.
Ray.
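As an illustration of the dilation idea, here is a minimal pure-Python sketch on a binary image held as a list of rows (1 = ink). The 3x3 square structuring element is an assumption; a real pipeline would more likely use an imaging library, and too many iterations will merge neighbouring characters as well as fragments.

```python
def dilate(image, iterations=1):
    """Binary dilation with a 3x3 square structuring element.

    A pixel becomes 1 if it or any of its 8 neighbours is 1, so small
    gaps between fragments of a broken character get filled in.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for _ in range(iterations):
        out = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                out[y][x] = int(any(
                    image[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                ))
        image = out
    return image
```

One pass closes a one-pixel gap; a two-pixel gap also closes after one pass because both sides grow towards each other.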
2009/4/24 tt yury.tarasiev...@gmail.com
When making .boxes, re-using my own training results, and with rather
brightishly lit
It would be a start to train it with the data it has to deal with. You could
make use of the fact that it happily deals with multi-char strings to train
it on all 20 possible answers in several different (at least 8)
orientations.
Ray.
On Jun 5, 2009 4:04 PM, Brian bfor...@gmail.com wrote:
I have made this clearer and bigger on the home page (which everybody
merrily ignores anyway) and in the ReadMe wiki. Also updated the FAQ to point
to the wiki page. A lot of users have had trouble understanding this.
Hopefully it will be clearer now. It will be very important for 3.00, as
there
Yes! Read the FAQ and the ReadMe wikis to find out how to add support for
compressed tif. For other formats, you need 3.00, which is not ready yet.
Ray.
On Wed, Jun 10, 2009 at 11:16 PM, naresh naresh.kanduk...@gmail.com wrote:
One more question i have regarding tesseract ocr engine.
Did
mnjrupp,
OK, so I hadn't tested with libtiff, but I just did and it works, but that
was building with vc++ express 2008, and using the 2.04 tesseract.sln.
I followed my own instructions on the readme wiki, and it worked without
problem.
You can't use VC++ 2005 because MS changed the file format.
Nice to hear that works. The 2.04 sln builds this way by default. When 2.04
is fully released I will add a tarball containing an exe that is built this
way, so a lot of users will be able to just download it and run...
Ray.
On Tue, Jun 9, 2009 at 8:12 PM, Hasnat mhas...@gmail.com wrote:
Dear
Try the current svn code, and add the preprocessor definition
GRAPHICS_DISABLED. Another that you can try is EMBEDDED, but you might not
need it for windows mobile.
Ray.
On Mon, Jun 8, 2009 at 2:13 AM, Raj mail2sun@gmail.com wrote:
Hi,
I'm also trying to integrate Tesseract OCR on windows
Try the 2.04 pre-release on the svn site:
http://code.google.com/p/tesseract-ocr/source/checkout
You need VC++ express 2008.
Ray.
On Sat, Jun 6, 2009 at 10:27 AM, alva alvashe...@gmail.com wrote:
Oh and one more thing, the tesseract.txt
The problem is that some people complain that there is too little
documentation, while others don't read what little there is. I have removed
the windows libtiff section from the README wiki, as there was no help for
linux there, and put a pointer to the FAQ
Issues with tiff file reading were fixed for 2.04, now in svn for early
testers. For screenshots, read the wiki FAQ.
Ray.
On Tue, Jun 16, 2009 at 12:35 PM, Salahuddin Pasha
salahuddi...@gmail.com wrote:
I had same problem in MacOS 10.5.x
First, install the libtiff from
That is a hard problem. I don't think Tesseract will be of much use for
handwriting, especially Doctors' handwriting.
Ray.
On Mon, Jun 15, 2009 at 7:11 PM, umanga umanga@gmail.com wrote:
Greetings all,
In the project I am working with I have a scanned PDF document. This
document has
Running the same data through the training system multiple times does not
change accuracy in tesseract. It does not use a back-propagation training
process at this time.
Ray.
On Fri, Jun 12, 2009 at 5:39 AM, Yury Tarasievich
yury.tarasiev...@gmail.com wrote:
Is the quality of recognition
Looks like 2 different versions to me, but even if they weren't you do get
different results with different compilers/architectures, partly due to the
use of random numbers in one of the algorithms, and possibly due to
different floating point treatment and/or different qsort functions.
Ray.
On
Have you tried runautoconf first? It seems that if the installed version of
the automake tools is vastly different to the one I used to make configure,
then it doesn't work.
Ray.
On Wed, Jun 17, 2009 at 2:25 PM, timmckenna mckenna@gmail.com wrote:
Granted I probably have no business in the
In 3.00 you can use leptonica to read the image and then pass the Pix
directly to tesseract.
Ray.
On Mon, Jun 29, 2009 at 8:44 AM, Yury Tarasievich
yury.tarasiev...@gmail.com wrote:
A P wrote:
Yury,
Did you use the -depth 8 flag or some other option?
Well, I used what seemed to be the
You can use TessBaseAPI::TesseractExtractResult, but you will have to hack
the code a bit to do it, as it is a protected member. If we can correct the
way ocropus uses tesseract, we can make this a useful single public member
that anyone can use.
Ray.
On Sun, Jun 28, 2009 at 2:45 PM, hvthaibk
The problem is that tessdll uses its own api instead of the baseapi. You have
2 possibilities:
1. Rewrite your code to use the dll directly (see tessdll.h)
2. Mark TessBaseAPI as dll export by putting a TESSDLL_API in the class
definition and putting the appropriate magic incantations to make
character regardless of its neighbors?
Moreover, the confidence values are usually above 100. Is there anything
wrong here, as tesseract produces confidence values in the range 0-100
only?
Thai
On Jun 30, 11:27 am, Yury Tarasievich yury.tarasiev...@gmail.com
wrote:
Ray Smith wrote:
You
some hand image processing with gimp or something else.
Ray.
2009/7/2 robi robertmilow...@gmail.com
HI,
I have the same problem for both 2.04 versions (Linux and Windows)
Ray Smith wrote:
The windows 2.04 executable will be available soon, after I get through
the
comments
See the training wiki.
Ray.
On Tue, Jun 30, 2009 at 3:42 PM, taelmx tae...@gmail.com wrote:
Hey guys, could this possibly be used to identify icons on a rather large
resolution CAD drawing? (Rasterized)
It's a symbol that looks like a [T] with diagonal lines in the square. By
chance would I
Sorry about that misleading comment. I have improved the FAQ. The fix in
2.04 is that it works properly with libtiff, NOT that it reads more tiff
files without it.
Leptonica itself likes to have (doesn't absolutely need) additional imaging
libraries (tiff, jpg, png, gif) and then can read all
The 32 font limit (MAX_NUM_CONFIGS) was a hardware limit. (Long story) The
code that reads the inttemp file in 2.04 and below is specific to the value
of MAX_NUM_CONFIGS so you can increase it as long as you retrain yourself.
With 3.00, the data file reader is able to read files with a different
*This is a plea for help!*
Anyone interested in seeing 3.00 this side of August?
Here is the status:
Linux:
Preliminary alpha release compiles and runs. It is slower than 2.04, due to
the new page layout analysis, but the benefits are supposed to outweigh
that:
Page layout analysis.
*Lots* of
Done. All the wikis will need a major update for 3.00 when it comes anyway.
Ray.
On Mon, Jun 1, 2009 at 3:51 PM, Matt Chan talc...@gmail.com wrote:
I think I got around it. I wasn't copying over the word-dawg and freq-
dawg files from another language or generating them. I just touched
empty
-09 at 19:10 -0700, Ray Smith wrote:
This is a plea for help!
Anyone interested in seeing 3.00 this side of August?
Here is the status:
Linux:
Preliminary alpha release compiles and runs. It is slower than 2.04,
due to the new page layout analysis, but the benefits
There was no documentation for 2.04 because the api was to change for 3.00.
That change has now happened. There still isn't much documentation, but
api/baseapi.h is fairly well commented, and intended to be largely
self-documenting. Most people prefer examples to api documentation, and
they are
For windows. Some of the code was part of a windows app that was broken into
dlls.
Ray
On Jul 21, 2009 7:41 AM, Sandeep sandeep.a...@gmail.com wrote:
Why is DLLSYM defined in the platform.h and then used in front of
class and function declarations ?
If there are blank lines between paragraphs, the new page layout will do
this for you in 3.00. If not, it will probably do this in the future.
If you want to have a crack at it yourself, you would have to modify the
page layout analysis or add it as a postprocess based on the word boxes.
Ray.
On
The box file needs an extra field on each line giving the page number. Can't
remember whether 0 based or 1. I think 0 for the first page.
Mftraining and cntraining need no modification. The tr file is just a stream
of feature sets, so they don't care.
Ray.
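A small Python sketch of reading box lines with the extra page field. The field order (symbol, left, bottom, right, top, page) and the 0-based page number are assumptions here, the latter following the guess in the message; lines without the page field are treated as page 0.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one box-file line; a missing page field is treated as page 0."""
    fields = line.split()
    symbol = fields[0]
    numbers = [int(f) for f in fields[1:]]
    if len(numbers) == 4:  # older format without the page field
        numbers.append(0)
    if len(numbers) != 5:
        raise ValueError("unexpected box line: %r" % line)
    return Box(symbol, *numbers)
```

Grouping the parsed boxes by `page` then gives the per-page character lists for a multi-page training image.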
On Jul 19, 2009 7:45 AM, 74yrs old
Hi and thanks for volunteering. Although tesseract is a command-line
program, there are still a lot of users that trip over at the first hurdle
of having to unpack the tar.gz of the binary and add the language files in
the right place. With 3.00, things are simpler in that I have the vcproj file
They were unchanged. 2.04 was mostly a bug and portability release. 3.00 on
the other hand is completely different.
Ray.
On Sun, Aug 9, 2009 at 4:57 PM, Daryl c...@daryllafferty.com wrote:
I am using tessdll from version 2.03 in a C++ Windows program.
I see there is a 2.04 version of
Sounds like loss of the last word of the first line, or a soft-hyphen
problem to me.
Ray.
On Sun, Aug 9, 2009 at 4:49 PM, Daryl c...@daryllafferty.com wrote:
I am using tessdll in a C++ program.
Sometimes, seemingly randomly, Tesseract will join sequential lines
together without even a space
Part of word rejected due to word being too long. That is one of the reasons
why the training wiki says to make your training data look like real words.
Ray.
On Mon, Jul 27, 2009 at 10:51 PM, Hans Peter Bremer
hapebre...@googlemail.com wrote:
Hi,
i've got a problem with the creating of a
Using 3.00, use api.SetPageSegMode(PSM_SINGLE_CHAR); after api.Init().
See api/tesseractmain.cpp.
Ray.
On Mon, Jul 13, 2009 at 7:11 AM, hvthaibk hvtha...@gmail.com wrote:
Hello,
I am trying to use tesseract to recognize an image containing one
character only. How could I turn off the
.. is that right? Do the box files support that?
I think someone had posted this question also...
thanks ;)
On Wed, Aug 5, 2009 at 3:43 PM, Ray Smith theraysm...@gmail.com wrote:
There was no documentation for 2.04 because the api was to change for
3.00. That change has now happened
The problem is that the configure script was out of date. I have just
updated the configure script and it should now work, unless your system
doesn't have the correct version of autotools, in which case you still have
to run runautoconf.
tesseractmain.cpp moved to the new api directory.
Please let
See answer to issue 233.
Ray.
On Thu, Aug 13, 2009 at 1:32 AM, cmm mod...@fbk.eu wrote:
Hi!
I'm using tesseract (version 2.04 / Linux) to recognize text extracted
from images.
My problem is that apparently I have different results by applying
twice a tesseract function on the same bitmap. I
Thread-safety is not yet available, but will probably be available in a
future release.
The 3.00 api moves towards this by making it based on an instance of the api
instead of static methods. It is nearly possible to use two different apis
alternately in the same thread, but using them
As soon as I can check through the pile of questions and issues that have
appeared while I was away. It is already in svn if you want to give it a
try.
On the other hand, if you are the Q (from STTNG) can you not just wave
your hand and make it happen? ;-)
Ray.
On Mon, Aug 10, 2009 at 11:51 PM, Q
See http://www.isri.unlv.edu/ISRI/OCRtk
The ^ before a character indicates that it is suspicious in some sense to
tesseract, and ~ indicates a reject. The output is in latin 1 instead of
utf-8, and may not work at all for non-latin text.
Ray.
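A minimal Python sketch of reading those markers back out of the output. It assumes that ^ is followed by the suspicious character and that ~ stands in place of a rejected one; a literal ^ or ~ in the text cannot be told apart from a marker this way.

```python
def parse_unlv_output(raw):
    """Split Latin-1 output bytes into (character, status) pairs.

    status is "ok", "suspect" (character preceded by ^) or
    "reject" (the ~ placeholder).
    """
    text = raw.decode("latin-1")
    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "^" and i + 1 < len(text):
            result.append((text[i + 1], "suspect"))
            i += 2
        elif ch == "~":
            result.append((ch, "reject"))
            i += 1
        else:
            result.append((ch, "ok"))
            i += 1
    return result
```

Decoding as Latin-1 rather than UTF-8 matters for anything outside ASCII, such as the pound sign.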
On Mon, Aug 17, 2009 at 2:52 PM, jia
Works OK for me with 3.0 (apart from the problem that it puts all the units
in a separate column)
It might be an adaptation error. It is hard to say without being able to
reproduce the problem and run it with the viewer.
Ray.
On Wed, Aug 12, 2009 at 2:11 AM, Alcareru sipulima...@yahoo.co.uk wrote:
TesseractExtractResult was written by OCRopus, and they only care about
single lines, so it has no way of telling the end of line. The text string is
already utf-8. It needs no further conversion. If you want access to the end
of line flag, the easiest way is to subclass TessBaseAPI and write a
I keep banning spammers. The number banned is now up to 22.
Ray.
On Sep 30, 2009 5:43 PM, Chen TsoLin tsolin.c...@gmail.com wrote:
Dears:
what happens with this mail..@@!!!
Administrator, please remove this kind of mail from mail groups~~thanks
2009/10/1 Inga M.
From the checkout page:
# Non-members may check out a read-only working copy anonymously over HTTP.
svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only
On Fri, Oct 9, 2009 at 4:00 PM, John a164666...@gmail.com wrote:
J. Garcia wrote:
Hi folks,
I tried to