Post your files somewhere so we can test it on Linux...
Zdenko
On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar shreesh...@gmail.com wrote:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
says:
An alternative to multi-page tiffs is to create many single-page tiffs for
a
On Thu, Apr 18, 2013 at 5:35 AM, sdk shreesh...@gmail.com wrote:
Zdenko,
You wrote:
He can create another data and use it together with data provided by
google.
Does this mean that we can use the ability of tesseract to use multiple
languages for recognition to use multiple traineddata
On Wed, Apr 17, 2013 at 10:41 PM, Sven Pedersen sven.peder...@gmail.com wrote:
Rob,
You can add fonts to existing languages. Just follow the combine
instructions.
As far as I know, it is not possible. He can create another data and use it
together with data provided by google.
Sven
On
On Wed, Apr 17, 2013 at 10:36 PM, Robert Komar rko...@telus.net wrote:
On Wed, 17 Apr 2013, Sven Pedersen wrote:
This is covered in the FAQ:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_
On Fri, Apr 12, 2013 at 3:10 AM, u20...@gmail.com wrote:
Note that there still appears to be a problem with the bazaar example:
Even though the normal dictionary is supposed to be suppressed and the user
wordlist used instead, the whole text in eurotext.tif is still returned,
including words
On Fri, Apr 5, 2013 at 11:20 PM, Ruud van Houtum ruudvhou...@gmail.com wrote:
Hello,
I am using Tesseract to output text files from scanned documents.
All text images contain typed text and are fairly clear/clean. So far
Tesseract has a pretty good accuracy and I am quite content.
However
folder exists)
From my experience it fails without the /.
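A minimal sketch of that advice, assuming the path under discussion is the tessdata prefix mentioned in the quoted message below. The helper name `with_trailing_slash` and the directory `/usr/local/share` are illustrative only and do not come from the thread:

```shell
# Hypothetical helper: append a trailing slash only when one is missing.
with_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# Illustrative directory, not taken from the thread.
TESSDATA_PREFIX=$(with_trailing_slash "/usr/local/share")
export TESSDATA_PREFIX
echo "$TESSDATA_PREFIX"
```

Normalising the value once, before tesseract is started, avoids having to remember the slash at every call site.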
Patrick
On Fri, Apr 5, 2013 at 6:07 PM, zdenko podobny zde...@gmail.com wrote:
I did not test the latest code, but in the past I had these experiences:
- if the TESSDATA_PREFIX environment variable is specified, then init path
Thanks for the idea. I will try to have a look at it.
If anybody has a patch ready, I will welcome it warmly.
Zdenko
On Tue, Apr 2, 2013 at 7:16 AM, Janusz S. Bień jsb...@mimuw.edu.pl wrote:
The hOCR specification states that ocr_carea is content area which used
to be called ocr_column.
I've
I did not test the latest code, but in the past I had these experiences:
- if the TESSDATA_PREFIX environment variable is specified, then the init path
was ignored
- if TESSDATA_PREFIX is built in (the default for autotools compilation),
then the init path was ignored
The easy workaround for that problem:
On Thu, Apr 4, 2013 at 4:18 AM, Damiano Rodriguez damiano...@gmail.com wrote:
Hi all,
I have a very strange problem:
First of all: with Visual Studio 2010 and C# there was another problem
because I was building the project for framework 4.0.
I have changed it to 2.0.
Everything is OK,
On Wed, Apr 3, 2013 at 8:04 PM, Renato Forti rtfo...@gmail.com wrote:
Hi,
How should I link the tesseract .so in my app?
My app crashes with this:
release/parallel/ocr/doksafe_ocr_engine: symbol lookup error:
./behavior/tesseract/libocr_default_engine.so: undefined symbol :
On Wed, Apr 3, 2013 at 8:40 PM, Renato Forti rtfo...@gmail.com wrote:
Hi,
I am trying to use tesseract in my app. I linked my app with:
OS: Linux (UBUNTU) gcc 4.6
tesseract
from
ls /usr/local/lib/*tesseract*
/usr/local/lib/libtesseract.a
If you are on Linux you can create PIX with pixReadMem.
If you are on Mac or Windows you will face a problem (e.g. tif will work,
other formats like jpeg or png will not).
See leptonica issue 77[1] for more details. There is also a test case/example
file[2] where I used std::vector<char> to create PIX.
[1]
Hint: when you remove borders/boxes/tables, then tesseract does its job. So
you will need some tool for removing lines (maybe a good start could be
leptonica's line removal[1]).
Or if you are able to detect the region of each number, do OCR number by number
(with the tesseract API or uzn files).
[1]
On Tue, Apr 2, 2013 at 7:28 AM, Matt Ball matt.bal...@gmail.com wrote:
Hello -- I get the following error when running mftraining:
$ mftraining -F font_properties -U unicharset -O eng.unicharset
eng.digital_dream.exp0.tr
Read shape table shapetable of 11 shapes
Reading
On Sun, Mar 31, 2013 at 10:55 PM, mike_ro...@hotmail.com wrote:
Try to turn off dictionaries (parameters load_system_dawg,
load_freq_dawg, maybe also load_punc_dawg, load_number_dawg,
load_unambig_dawg, load_bigram_dawg). You can do this only during init of
language.
I disabled all
On Sun, Mar 31, 2013 at 12:09 PM, Александр Жданов
alekzande...@yandex.ru wrote:
Hello
I have problems with character recognition using tesseract.
So, I have a file with an image of an auto-detected car license plate. I
created it using OpenCV and it looks like this:
Try to turn off dictionaries (parameters load_system_dawg, load_freq_dawg,
maybe also load_punc_dawg, load_number_dawg, load_unambig_dawg,
load_bigram_dawg). You can do this only during init of language.
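One way to apply this from the command line is a config file, on the understanding that the tesseract executable reads config files named as trailing arguments while it initialises. A minimal sketch, assuming the `F` value syntax used in the man page's config examples. The file name `nodict` and the image name `plate.tif` are made up, it is also an assumption that a config file in the current directory is picked up, and the command is only printed, not run:

```shell
# Write a config file that switches off the dictionaries named above.
cat > nodict <<'EOF'
load_system_dawg F
load_freq_dawg F
load_punc_dawg F
load_number_dawg F
load_unambig_dawg F
load_bigram_dawg F
EOF

# Dry run: print the command. plate.tif and out are placeholder names.
echo tesseract plate.tif out nodict
```

Dropping the `echo` would run the command for real. Keeping the parameters in a file makes it easy to re-enable single dictionaries while testing.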
Zdenko
On Fri, Mar 29, 2013 at 6:41 PM, mike_ro...@hotmail.com wrote:
Hello all.
I'm
Why are you using 3.01 instead of 3.02 (with installer) version??? there is
installer and using VS2010 instead of VS2010 should not be big issue (if
you are familiar with VS 2010)...
Zdenko
On Fri, Mar 29, 2013 at 5:51 AM, Buddhika De Seram w.dese...@gmail.com wrote:
Hi,
I'm new to
On Thu, Mar 28, 2013 at 1:56 AM, Nate Bennett nate.bennet...@gmail.com wrote:
I am running the tesseractdotnet wrapper build 590 found on
https://code.google.com/p/tesseractdotnet/. With the english3.01.
It is quite old. Try to find a way to use the 3.02 version - vietocr[1] has
a .NET version maybe
Did you set the TESSDATA_PREFIX environment variable? If not, can you set it (to
C:\Program Files (x86)\Tesseract-OCR\)?
Zdenko
On Thu, Mar 21, 2013 at 2:08 AM, u20...@gmail.com wrote:
Thanks for the reply.
Yes, the file does exist, I can open it from my working directory using
On Wed, Mar 20, 2013 at 2:35 AM, u20...@gmail.com wrote:
I created the three files described in Section CONFIG FILES AND
AUGMENTING WITH USER DATA of
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
On Sun, Mar 17, 2013 at 4:00 AM, Epix Zhang exzh...@gmail.com wrote:
Hello, I followed the instruction on
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3. But when
it came to the step Putting it all together, errors occurred.
The command I run:
combine_tessdata.exe chi.
It
download
https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.02.tar.gz
Documentation was prepared before releasing tesseract 3.02... I will fix it.
Zdenko
On Tue, Mar 12, 2013 at 3:43 PM, Vicky Patil timepassv...@gmail.com wrote:
Hi,
I followed instructions
There is no attached data. Maybe try to use some online storage system
(google disk, skydrive, dropbox...) and send a link here.
You stated you are following the wiki instructions[1], but your log shows this is
not true - you did not run mftraining.
[1]
On Fri, Mar 1, 2013 at 7:33 AM, Hadid Mubarak
hadidmubarak.jt...@gmail.com wrote:
hi..
I have the same issue as you.. has anyone managed to solve it? Please
advise..
thank you so much..
On Monday, 11 February 2013 18:49:54 UTC+7, OCR explorer wrote:
hi...
I am using Serak tesseract
On Sun, Feb 24, 2013 at 12:20 AM, Nick White nick.wh...@durham.ac.uk wrote:
On Fri, Feb 22, 2013 at 03:20:49PM +, Nick White wrote:
On Sun, Jun 03, 2012 at 10:27:23PM +0100, zdenko podobny wrote:
it looks like it is ASCII only oriented (at least in report non-ASCII
are
malformed
On Thu, Feb 14, 2013 at 2:44 AM, iccol...@ncsu.edu wrote:
I'm having a similar problem. The .tif file can be found here(
https://www.dropbox.com/s/jivtydjj9gilkku/eng.ANSI_GDT.exp0.tif ) and I
hope anyone can help. I'm trying to train tesseract to understand
geometric tolerance symbols. I
As usual:
- try to reproduce the problem with the tesseract executable if you use something
else (wrapper, in some cases API)
- send the input image
Zdenko
On Thu, Feb 14, 2013 at 5:13 PM, Markus Austin markus2k...@gmail.com wrote:
Hi All,
I currently have Tesseract implemented within a PERL
Thanks - I will include it in next wiki update.
I sent this cc to the tesseract forum, so others can enjoy this new tool
too.
Zdenko
On Mon, Feb 4, 2013 at 6:27 PM, Scott Stringham
scott.string...@gmail.com wrote:
I have posted a new tesseract box file editor online at
You don't need to edit it. Just run the command as on the wiki. It is faster than
editing the tr file...
Zdenko
On Sun, Feb 3, 2013 at 12:21 AM, Carlos Antunes cf.antu...@gmail.com wrote:
Zdenko,
Shall I edit it and remove it before going further?
Thanks.
On Saturday, February 2, 2013 1:53:33 PM
Can you send an example of your tif file?
Zdenko
On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner
mliss...@michaeljaylissner.com wrote:
I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69.
I've installed these, and also installed libtiff4 using apt-get.
When I try to
, Feb 3, 2013 at 1:16 PM, zdenko podobny zde...@gmail.com wrote:
Can you send an example of your tif file?
Zdenko
On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner
mliss...@michaeljaylissner.com wrote:
I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69.
I've installed
://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format
Zdenko
On Sun, Feb 3, 2013 at 11:00 PM, zdenko podobny zde...@gmail.com wrote:
Are you able to generate just one page or a small example? Or can you
provide the steps you used to create it (so I can create it)?
Tiff could be tricky. E.g
Don't send gdb output - it is useless. Especially when you do not follow the
wiki:
you run:
tesseract eng.20centsmarker.exp0.tif eng.20centsmarker.exp0.box nobatch box.train
and you should run:
tesseract eng.20centsmarker.exp0.tif eng.20centsmarker.exp0 nobatch box.train
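The difference between the two commands is only the output base, which should be the image name without any extension. A small sketch of deriving it automatically; the helper name `output_base` is hypothetical, and the command is printed rather than run so it can be checked first:

```shell
# Hypothetical helper: drop the final extension to get the output base.
output_base() {
  printf '%s\n' "${1%.*}"
}

img=eng.20centsmarker.exp0.tif
base=$(output_base "$img")

# Dry run: print the training command instead of running it.
echo tesseract "$img" "$base" nobatch box.train
```

Deriving the base from the image name keeps the two arguments in step, so a stray `.box` cannot end up in the output name.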
Zdenko
On Fri,
But if you have a look at the tr file, you will see that the font name will
be 20centsmarker.exp0. And I guess this is not what you want.
Tesseract takes some information from filenames. If you go your own way
with naming you will face a problem (crash). I remember there is a crash at
some stage if last
Tesseract converts input image data to 2-color mode (black and white). So it
does not have information (at the output stage) about the color of the input
symbols...
Zdenko
On Sun, Jan 27, 2013 at 10:52 PM, ipec...@gmail.com wrote:
I'm new to the community but did some searching around before posting.
On Mon, Jan 28, 2013 at 12:01 PM, Nick White nick.wh...@durham.ac.uk wrote:
On Mon, Jan 28, 2013 at 11:57:41AM +0100, zdenko podobny wrote:
So try to read the wiki ;-) and his e-mail. Indeed, he already recognized his
problem:
Shall I also generate the shapetable as well. Well, I will try
Hi,
I remember there is (was) an issue that mentioned this problem (B vs 8). So
maybe this is a common problem (like O vs 0 or l vs. 1)... If you post an
example image I can try to make some tests (later)...
Zdenko
On Mon, Jan 28, 2013 at 9:30 AM, jacob.chi...@gmail.com wrote:
I followed the thread
Post the files you are trying to combine somewhere.
Zdenko
On Fri, Jan 25, 2013 at 5:48 PM, Alp Oktem alpok...@gmail.com wrote:
*What steps will reproduce the problem?*
1. Prepared all necessary files for training
2. combine_tessdata ./lang.
*What is the expected output? What do you see
tesseract tries to open uzn files for defined[1] page segmentation modes
(from 4 to 10 and 0[2] or other way: it does not use them for modes that
request for automatic page segmentation).
In attachment you can find example of image + uzn file from
isri-ocr-evaluation-tools[3]. You can test it
1. I can not find box.nochop mentioned on wiki (but maybe I am too
tired at the moment ;-)). Can you provide link and paragraph?
2. Why are you using 2.x version? It is too old...
Zdenko
On Thu, Jan 17, 2013 at 5:21 PM, Firas almannaa firas.alman...@gmail.com wrote:
On Thursday, 28
Regarding cube:
- there is no more public information about cube than the 92 hits at
the forum I mentioned already (+ source code ;-))
- there is no information on how to create cube data files (ok, some of
them are text files...)
So you can:
1. try to use/train tesseract without
Really ;-)? I got 93 results. E.g.:
https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
Please honor time of people on this
Then it means that nobody has created it. You are welcome to create it.
Zdenko
On Wed, Jan 16, 2013 at 5:49 PM, Sergey Kondratiev gorilovi...@gmail.com wrote:
Where can I download this component for Delphi? I can't find it((
--
You received this message because you are subscribed to the Google
On Wed, Jan 16, 2013 at 3:34 PM, Sven Pedersen sven.peder...@gmail.com wrote:
The reason why Arabic has those files and your language does not is that
Arabic is set up to use the cube feature to combine it with other
languages, so you can do -l ara+eng and OCR a document with both Arabic
and
Hi,
first of all: Do not send executables to the forum. Or do you think people here
have no access to the tesseract executable?
Next:
- What is the purpose of using e.g. segmentation Treat the image as a
single word in a circle. (9) for your images?
- Your images look postprocessed to me. Is it
12.04 as well as 12.10. If the answer is yes then I wanted to
install tesseract in ubuntu 12.10 for hands on experience - since I am
newbie to ubuntu.
With Regard,
-Sriranga
On Tue, Jan 15, 2013 at 1:23 PM, zdenko podobny zde...@gmail.com wrote:
Sriranga,
this has nothing to do with Ubuntu
search archive of tesseract forums for cube.
Zdenko
On Tue, Jan 15, 2013 at 2:16 PM, gold snake huangjin...@gmail.com wrote:
My language is somewhat special: it is like the Arabic script, but with some
differences, actually only in the shape of the font. And it is
written right to left
On Mon, Jan 14, 2013 at 6:24 AM, Michael Young
michaelyoung1...@gmail.com wrote:
Extracting contours:
http://ayoungprogrammer.blogspot.ca/2013/01/equation-ocr-part-1-using-contours-to.html
Training Tesseract:
Sriranga,
this has nothing to do with Ubuntu (or any other operating system). This is
related to user experience and understanding of error messages. This is not
the problem.
The problem is that he did not try to solve this common (not tesseract related)
problem by himself. When I put to google checking
Hi,
I think you will need to run training for this. I tried simple C++ code
that shows confidence values (see attachment) and for your digit 6 it
produced:
symbol 5, conf: 78.5236 5 conf: 78.523613
s conf: 77.376984
Try to use a page segmentation mode. E.g. Treat the image as a single word
(or text line or uniform block of text) will produce results.
As far as I remember from discussion on this forum, tesseract is not suitable for
handwritten text...
Zdenko
On Fri, Dec 28, 2012 at 11:55 PM, Nick Jalbert
On Tue, Dec 25, 2012 at 3:41 AM, Patrick Questembert
patrick.questemb...@gmail.com wrote:
The major languages such as English, French and Spanish come with a cube
version of the training data (e.g. eng.cube.*). So far we have used only
the regular training data (e.g. eng.traineddata). Can
On Thu, Dec 20, 2012 at 3:25 PM, Patrick Questembert
patrick.questemb...@gmail.com wrote:
Update: the Suzuki cook-book for building on iOS still works, see
https://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/
About performance: we have observed only a
In the case of tesseract - have a look at PageIterator and AnalyseLayout().
Example code can be found in this forum archive[1].
[1] https://groups.google.com/d/msg/tesseract-ocr/25GQVGvEE2g/HCKmB7LOplkJ
Zdenko
On Tue, Dec 18, 2012 at 11:15 AM, José Luis Rey jluis...@gmail.com wrote:
Hello Friends,
On Tue, Dec 18, 2012 at 4:34 AM, Iris hongyujiei...@gmail.com wrote:
Hi, Zdenko.
The same problem happened with me.
It's weird to see syntax errors because I haven't modified anything.
I'm wondering if you have solved the problem?
I do not have this problem so there is nothing I can
What OS do you use, what version of tesseract etc...?
I tried
tesseract original.jpg original -l chi_tra
and
tesseract preprocessed.tiff preprocessed -l chi_tra
and I did not get any error message (on openSUSE linux 64bit 12.2 with
tesseract 3.02.02)...
Why did you upscale image?
I do apologize, but I am not familiar with Chinese (or other Asian
languages ;-) ). So I tried
tesseract original.jpg original -l chi_sim
and the message was:
Too many unichars in ambiguity on line 0
Too many unichars in ambiguity on line 0
Tesseract Open Source OCR Engine v3.02.02 with
On Sun, Dec 16, 2012 at 9:01 AM, thomas nyan...@gmail.com wrote:
Dear All,
Is it possible to use tesseract for the new language?
If so , how can I start?
What about reading available docs in wiki?
Zdenko
On Fri, Dec 14, 2012 at 8:01 PM, Alexis ya...@antonakis.co.uk wrote:
I have a number of PDF files I am trying to OCR. I have a script
which extracts each page into individual .tif files which I then run
through tesseract, and everything works fine
However I am trying to output these pages
If you run 'tesseract --help' (or just 'tesseract') you could see that
tesseract expects some structure of input arguments (e.g. one image file
and one output file), so usage of wildcards will not work (if wildcards
match to more than 1 file) neither on linux/unix nor windows (there is
different
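Since the executable takes one image and one output base per call, several files are usually handled with a loop. A minimal sketch; the function name `ocr_each` is made up, and each command is printed instead of run so the pairing of image and output base can be checked first:

```shell
# Dry run: print one tesseract command per image instead of passing a wildcard.
ocr_each() {
  for img in "$@"; do
    # Output base = image name without its final extension.
    echo tesseract "$img" "${img%.*}"
  done
}

# Typical call; the shell expands *.tif before ocr_each sees it.
ocr_each *.tif
```

Removing the `echo` would run tesseract once per file, each with its own output base.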
On Wed, Dec 5, 2012 at 11:02 AM, mike oldfield czandrasze...@gmail.com wrote:
Tesseract-ocr still has a problem with decoding LED-like digits.
I ran something like this on my squeeze command line:
convert 1.jpg 1.tif
tesseract 1.tif 1.txt nobatch digits
...but the results are very poor and far
Search forum. I remember discussion about similar topic.
AFAIR: tesseract has a problem with letters (symbols) that consist of several
unconnected parts (e.g. dots, lines) - the solution should be to preprocess
the image (blur).
Generally: a black background is a problem. Quality of image is too low (JPEG,
On Mon, Dec 3, 2012 at 9:19 AM, Benito2313 benito2...@hotmail.com wrote:
On Monday, 3 December 2012 08:48:20 UTC+1, zdenop wrote:
On Sun, Dec 2, 2012 at 11:56 AM, Nick White nick@durham.ac.uk wrote:
On Sun, Dec 02, 2012 at 01:29:54AM -0800, Benito2313 wrote:
Thank you
On Fri, Nov 30, 2012 at 10:10 PM, eljainc elja...@sbcglobal.net wrote:
My mistake, It was the 2.0.4 version.
I am still not sure where these English files should be. I have tried to
put them into a temp location using
ocr.Init("o:\\ocrtemp\\","eng",false);
I have also tried to put them in the
On Sun, Dec 2, 2012 at 11:56 AM, Nick White nick.wh...@durham.ac.uk wrote:
On Sun, Dec 02, 2012 at 01:29:54AM -0800, Benito2313 wrote:
Thank you for your reply. I can't find the manual page of tesseract; could
you post a link?
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
I guess there is a problem finding deu.traineddata.
I would suggest to run your program in console, so you can see possible
error message (something like Error opening data file C:\Program
Files\Tesseract-OCR\tessdata/deu.traineddata).
Another option is to init tesseract and set variables in more
On Thu, Nov 29, 2012 at 10:04 PM, eljainc elja...@sbcglobal.net wrote:
Hello, I'm using the TessNet2 (.NET) library version 3.02 and I'm having
an issue in running my first ever program with Tesseract OCR.
Can you be please more specific, what do you mean with TessNet2 (.NET)
library version
It looks like you have a problem with linking the library. I modified the code of the
example tesseract-ocr-API-Example-vs2008.zip
(http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip)
and the code compiled OK for me (in VS 2008)...
--
Zdenko
On Wed, Nov 28, 2012
I put this code into tesseract-ocr-API-Example-vs2008.zip
(http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip):
Pix *image;
char *outText;
char *configs[]={"myconfig"};
int configs_size = 1;
TessBaseAPI *tess = new TessBaseAPI();
, zdenko podobny zde...@gmail.com wrote:
I agree. But only project owners can do it ;-)
And as far as I tested it (on one project where I am owner) - he has to
delete them one by one :-)
--
Zdenko
On Wed, Nov 28, 2012 at 1:06 PM, Nick White nick.wh...@durham.ac.uk wrote:
Hi Tesseractors,
I
I agree. But only project owners can do it ;-)
And as far as I tested it (on one project where I am owner) - he has to
delete them one by one :-)
--
Zdenko
On Wed, Nov 28, 2012 at 1:06 PM, Nick White nick.wh...@durham.ac.uk wrote:
Hi Tesseractors,
I just posted an issue to remove all the
I do not have experience with vs2012 but I would suggest:
1. ask for help on Microsoft Developer Network - they should provide you
(at least general) instruction about this topic (maybe there is
compatibility option)
2. Try to go step by step with vs2012 build:
1. if you have
And what about installing Microsoft Visual C++ 2008 SP1 Redistributable
Package (x86)? In release notes[1] it is suggested for executables...
[1] https://groups.google.com/d/topic/tesseract-ocr/EXyGqT9osrw/discussion
On Tue, Nov 27, 2012 at 5:15 AM, Minjie Zheng zmin...@gmail.com wrote:
On Sun, Nov 25, 2012 at 1:57 PM, a314 ah.mas...@gmail.com wrote:
With a simple input .tif file (attached) that contains the very readable
text Her old man will be jealous! in one line, my output text file shows:
Hidillfllflhfiilfl.
I spent quite a lot of time to build (windows 7, visual C++ 2010)
On Mon, Nov 19, 2012 at 5:57 PM, Linda Li codingpotatoli...@gmail.com wrote:
Thanks.
On Monday, November 19, 2012 10:30:07 AM UTC-6, zdenop wrote:
On Sun, Nov 18, 2012 at 11:49 PM, Linda Li codingpo...@gmail.com wrote:
Now building succeeds.
Compile has errors, complaining there are
If you are interested in all (648) tesseract-ocr 3.02 parameters (variables)
with their default values, have a look at
http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
--
Zdenko
On Tue, Nov 20, 2012 at 2:10 PM, ArtooDetoo artoodeto...@gmail.com wrote:
On Sun, Nov 18, 2012 at 11:49 PM, Linda Li codingpotatoli...@gmail.com wrote:
I build it to run tesseractmain.cpp.
I do not intend to compile the whole
Thanks to the instructions from Zdenko, I added the symbols as follows:
In Eclipse, Project Properties - C/C++ General - Paths and Symbols
Symbol,
On Mon, Nov 19, 2012 at 9:36 PM, Linda Li codingpotatoli...@gmail.com wrote:
Hah, I figured it out.
You are right, undefined does not mean undeclared. So I think I found
one wrong lib.
I checked the Makefile; although there are a lot of strange words there, there is
LIBS = -llept -lpthread
So I
You should give the definition to the compiler. Have a look at how it is done with
autotools (or the VC++ solution if you are familiar with it).
I am not an Eclipse user (although I tried once to open and compile the tesseract
project in it. It was smooth as far as I remember) but I would expect that
Eclipse is able to
Hi all,
you will not get OSD (Orientation and script detection) output information
with the tesseract executable. At the moment tesseract provides (saves) only the OCR
result. Somebody could consider the help (tesseract --help) misleading because
it enumerates all possible page segmentation modes. I think that
QT Box Editor 1.10 was released. It is a multi-platform visual editor for
tesseract-ocr (http://code.google.com/p/tesseract-ocr/) box files
(http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3), used
for OCR training, based on the QT4 library (http://qt.nokia.com/products/).
Several problems were
On Fri, Nov 16, 2012 at 2:33 PM, José Luis Rey jluis...@gmail.com wrote:
Thanks for the response, I've read the Apache license 2.0 and all looks ok.
The software I'm developing is for scan/read documents like invoices,
checks, and any document with fixed fields, linking zones
On Fri, Nov 16, 2012 at 3:05 AM, Linda Li codingpotatoli...@gmail.com wrote:
Version: tesseract 3.02.02
Ubuntu 12.04, Eclipse Juno
I am trying to use ViewerDebugging.
Following the instructions in
http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging
I installed javac
download
On Fri, Nov 16, 2012 at 3:03 AM, Linda Li codingpotatoli...@gmail.com wrote:
I want to build the tesseract 3.02.02 project so that I can modify some
code to tune it to some specific task.
Version: tesseract 3.02.02
Ubuntu 12.04, Eclipse Juno
I put the tesseract into the Eclipse project.
On Thu, Nov 15, 2012 at 10:15 AM, José Luis Rey jluis...@gmail.com wrote:
Thanks very much for your responses zdenop,
I'm not used to dev in open source projects like this, perhaps you may
help me to understand, for example if I implement a feature to add
character rectconfidence to the
On Fri, Nov 9, 2012 at 1:43 PM, Troy Frazier troypow...@gmail.com wrote:
Is it possible to search an image for a particular word using the Tessnet
wrapper? I see that it is possible to limit your scan to certain
characters, but what I would like to do is to input a word and have all
On Wed, Nov 14, 2012 at 3:38 PM, José Luis Rey jluis...@gmail.com wrote:
Oops, I did not see this response.
My working dirs are the dirs pointed to by the env var: %TESSERACT_DATA%
I see that I need to compile to change the default config/dictionary to
the correct Windows Vista/7 %AppData%
On Wed, Nov 14, 2012 at 4:12 PM, José Luis Rey jluis...@gmail.com wrote:
You do not need to compile tesseract. Just set your TESSERACT_DATA to your
tessdata directory (e.g. %AppData%) (before calling tesseract)...
That's all.
Imagine that you are running on Vista/7 as a regular user, you
Are you able to program (C++)?
--
Zdenko
On Tue, Nov 13, 2012 at 6:17 AM, Walid Khedr khe...@gmail.com wrote:
Hi,
I'm new to tesseract. I just want to use it for character segmentation.
The input is an image of a text string and the output will be an array of
images for each character.
You need to get box coordinates (BoundingBox) for each symbol[1]. Try to
follow hocr algorithm within tesseract[2]. hocr is focusing on word/line
but the logic would be the same for symbols (and it could be simplified).
Or maybe search for character confidence in issues and forum. There
should
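As a rough command-line alternative to the API/hOCR route described above, per-symbol coordinates can also be read from a tesseract box file. This is a different technique from the one suggested in the reply. A sketch, assuming the usual box line layout of symbol, left, bottom, right, top and page with the origin at the bottom-left corner; the helper name `box_to_geometry` and the sample line are made up. It converts each box into a WxH+X+Y crop geometry measured from the top-left, as image tools such as ImageMagick expect:

```shell
# Hypothetical helper: turn box-file lines (read from stdin) into
# WxH+X+Y crop geometries, given the image height in pixels.
box_to_geometry() {
  # Fields: $1 symbol, $2 left, $3 bottom, $4 right, $5 top, $6 page.
  awk -v h="$1" '{ printf "%dx%d+%d+%d\n", $4 - $2, $5 - $3, $2, h - $5 }'
}

# Made-up box line for a 100-pixel-high image.
echo "A 10 20 30 50 0" | box_to_geometry 100
```

The only non-obvious step is the vertical flip: the box's top edge is measured from the bottom of the image, so the crop offset is the image height minus that value.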
On Mon, Nov 12, 2012 at 3:23 PM, Mi Tran nuon...@gmail.com wrote:
What kind of file is your bmpFile? bmpFile must be *.tif
This is not true - it can be any image type supported by leptonica.
--
Zdenko
Check the error messages too - if you did not run shapeclustering then
mftraining should not produce any output (in the 3.02 version) ;-) Also it
looks like you forgot to rename the output files from the training tools!
You need to follow the training wiki[1]!
[1]
As far as I know this list is moderated e.g. your first post should be
approved by moderator... But I am not familiar with details (I am not
moderator ;-) )
--
Zdenko
On Mon, Nov 12, 2012 at 8:27 PM, Random Terrain
replayabil...@randomterrain.com wrote:
Does it take a while to show up or is
If you are serious about your training project, please invest your time to
read wiki (once again if necessary). It is there.
--
Zdenko
On Tue, Nov 13, 2012 at 1:20 AM, Mi Tran nuon...@gmail.com wrote:
Thanks zdenop, I have run shapeclustering and read the training wiki. But it
still has an error.
Leptonica is the library that handles images for tesseract.
--
Zdenko
On Tue, Nov 13, 2012 at 1:36 AM, MiT nuon...@gmail.com wrote:
Is leptonica a tool that supports training?
On Monday, 12 November 2012 22:38:58 UTC+7, zdenop wrote:
On Mon, Nov 12, 2012 at 3:23 PM, Mi Tran
On Wed, Nov 7, 2012 at 11:25 PM, Donaldo donaldo@gmail.com wrote:
Hi, Judith
I tried using the Esperanto option with the Tesseract package distributed
in Ubuntu 12.04 but it did not recognise any of the accented Esperanto
letters. I have done some training on Esperanto texts and
On Thu, Nov 8, 2012 at 2:00 PM, Tom Mc thomasmccot...@gmail.com wrote:
Hi All,
I have many documents that contain a mixture of two or more languages;
Chinese and English for example. Is there a way to merge together two
training files so the engine can interpret both character sets?
Any
try the 3.02 - I tested it with VS2008 and mingw32 on Windows XP and there
was no crash.
--
Zdenko
On Thu, Nov 8, 2012 at 10:54 PM, Mike Butterbrodt the.mik...@gmail.com wrote:
I have an image snippet that causes an assertion in unichar_id() from a
higher call to GetUTF8Text(RIL_SYMBOL)