Please read the wiki page on improving tesseract results.
Zdenko
On Fri, Jul 6, 2018 at 10:52, wrote:
> Hi,
>
> I have been using tesseract for a long time and it is working fine. We got some
> new images, but these images are not being parsed by tesseract
>
> I removed extra noise, changed to greyscale,
This is not a tesseract problem:
https://ask.libreoffice.org/en/question/97993/why-doesnt-lo-writer-open-and-save-text-documents-encoded-in-utf-8-without-bom-any-plans-to-fix-this-soon/
tesseract output is UTF-8 encoded.
Zdenko
On Fri, Jun 29, 2018 at 19:37, Martin Jenniges
wrote:
> Hello,
>
>
Can you post an image? It seems like a leptonica/tiff problem.
Zdenko
On Tue, Jun 26, 2018 at 7:21, James Worldprogram
wrote:
> *Problem*: - I am using Tesseract: *tesseract-ocr-setup-3.05.01.exe* as a
> command line argument in Windows OS program with the argument *-l eng*;
> it is working
Yes, it is OK, but you do not have to create a separate issue for a PR (a PR is an
issue too).
Zdenko
On Tue, Jun 5, 2018 at 16:52, Paul Kitchen
wrote:
> ZDenko,
>
> I'm new to this so hopefully I did everything correctly. Here is the issue
> I created:
>
>
You need to fork the official repository; then you have all the permissions you
need. When you have made your changes, you can send a pull request to the official
repository with your changes.
Zdenko
On Tue, Jun 5, 2018 at 15:06, Paul Kitchen
wrote:
> ZDenko,
>
> Unfortunately I don't seem to have write
Please make a PR for the master (4.0) branch and I will cherry-pick it for 3.05...
Zdenko
On Tue, Jun 5, 2018 at 4:38, Paul Kitchen
wrote:
> ZDenko,
>
> I checked out the latest tesseract code and updated to branch 3.05. I see
> that the int64_t area bug is already fixed (thanks!). I also see that the
>
Stefan,
Paul suggests also modifying LoadDataFromFile (ccutil/genericvector.h).
Is that modification not needed?
Zdenko
On Mon, Jun 4, 2018 at 17:32, 'Stefan Weil' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:
> As far as I see 4.0.0 is good. I have sent a pull request which
, _data[0], boxes, texts
> ,
> box_texts, pages);
> }
>
>
>
> On Saturday, June 2, 2018 at 2:22:16 AM UTC-6, zdenop wrote:
>>
>> Please check if this is ok now. If yes, I am willing to make 3.05.02
>> release ;-)
>>
>> Zden
Please check if this is ok now. If yes, I am willing to make 3.05.02
release ;-)
Zdenko
On Sat, Jun 2, 2018 at 10:16, Zdenko Podobny wrote:
> done in
> https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a
> Zdenko
>
>
> On Thu, May 31, 2018 at 22
done in
https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a
Zdenko
On Thu, May 31, 2018 at 22:39, shree wrote:
> This has been an issue for long. Thanks for finding the problem.
>
> Please submit a PR on github.
>
> On Friday, June 1, 2018 at 1:55:25 AM
Did you follow the installation instructions for that package?
Did you try an internet search before posting on the forum?
Did you try to search for help in the tesserocr project???
I just put it into Google and got:
https://pypi.org/project/tesserocr/
https://github.com/sirfz/tesserocr
Did you read the wiki before posting? E.g.
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#getcomponentimages-example
Zdenko
On Sun, May 20, 2018 at 8:00, nick wrote:
> hi
>
> is there a way to extract the location of each components (lines) in the
> image ?
>
> for
Hello all,
if you are interested in downloading only some languages of traineddata
from the repositories (or a different tagged version), have a look at
tessdata_downloader[1].
I just released version 1.0[2]. I created this script in Python, but
I was also able to create a Windows 64-bit "frozen" app
The error message is clear. Or?
Zdenko
On Fri, May 4, 2018 at 20:38, Dattatraya Tembare
wrote:
> Exception in thread "main" java.lang.Error: Invalid memory access
> at com.sun.jna.Native.invokePointer(Native Method)
> at
MS Word ;-)
1. Rename test.hocr to test.hocr.html.
2. Open test.hocr.html in a real text editor (e.g. Notepad++) and delete
lines 2 and 3; otherwise Word will produce an error message.
3. Open test.hocr.html in Word.
Zdenko
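The manual steps above (copy to a .html name, drop lines 2 and 3) could be scripted roughly as follows. This is only a sketch: the helper names are made up for illustration, the file names are the ones used in the steps, and it assumes the hOCR file is UTF-8 text.

```python
from pathlib import Path


def prepare_hocr_for_word(hocr_text: str) -> str:
    """Return the hOCR text with lines 2 and 3 (1-based) removed."""
    lines = hocr_text.splitlines(keepends=True)
    # Keep line 1, skip lines 2 and 3, keep everything from line 4 on.
    return "".join(lines[:1] + lines[3:])


def convert_file(src: str = "test.hocr", dst: str = "test.hocr.html") -> None:
    """Write a Word-friendly copy of src to dst (hypothetical helper)."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(prepare_hocr_for_word(text), encoding="utf-8")
```

The original file is left untouched, so the step can be repeated if Word still complains.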
On Thu, May 3, 2018 at 1:42, abdu wrote:
>
If you have experience, your help will be warmly welcomed.
OpenCL is not maintained, and it is well on its way to being removed if
a maintainer/contributor is not found.
Anyway, it is not used extensively, so there is room for improvement,
Zdenko
On Fri, Apr 27, 2018 at 10:21, Janpieter Sollie
Why are you building the project from source if you have no clue what you are doing?
Based on your other post: you decided to build leptonica without support for
common image formats.
On Thu, Apr 26, 2018, 7:01 Rolf Schumacher
wrote:
> I just installed from git repository
>
>
We are reorganizing tesseract.
Using the latest code is not recommended at all, especially if you do not
follow the developers' communications.
Zdenko
2018-04-25 19:59 GMT+02:00 Marius Amado-Alves :
> Trying to install on a Mac, cannot pass the autogen.sh step.
Well, you should contact the creator of the traineddata. We have no clue what they
did..
Zdenko
2018-04-25 14:55 GMT+02:00 :
> Hello there,
>
> i don't know what to do anymore...
> I want to use tesseract-ocr 3.05 for scanning documents, using the font
> "Perfect DOS VGA 437
Time for upgrade?
Zdenko
2018-04-21 22:14 GMT+02:00 'DR' via tesseract-ocr <
tesseract-ocr@googlegroups.com>:
> I'm using:
>
> tesseract 3.04.01
> leptonica-1.73
> libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 :
> libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2
You can start by using the latest version and providing details...
Zdenko
2018-04-18 7:56 GMT+02:00 Kai Feng :
> ./.libs/libtesseract.so: undefined reference to `omp_get_thread_num'
> ./.libs/libtesseract.so: undefined reference to `GOMP_sections_end_nowait'
>
You should download the source and build and install it with cppan + cmake.
See
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#develop-tesseract
Zdenko
2018-04-11 4:21 GMT+02:00 :
> i have been using tesseract 3.04 i could use it just by adding the include
If you followed someone's tutorial, you should complain to its author ;-).
I am not familiar with Mac, but on Linux you can do it (on the command line) this
way:
export TESSDATA_PREFIX=/usr/local/share/
Maybe it is similar on Mac. Try to google how to set an environment variable
on Mac.
Zdenko
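For a program that launches tesseract itself, the same TESSDATA_PREFIX variable can be set for the child process only instead of exporting it in the shell. A minimal sketch, assuming /usr/local/share/ is the right prefix on your machine; the helper name is hypothetical:

```python
import os


def env_with_tessdata_prefix(prefix: str) -> dict:
    """Copy the current environment and set TESSDATA_PREFIX in the copy."""
    env = dict(os.environ)
    env["TESSDATA_PREFIX"] = prefix
    return env


# Possible usage (needs tesseract installed; file names are placeholders):
# import subprocess
# subprocess.run(["tesseract", "image.png", "out"],
#                env=env_with_tessdata_prefix("/usr/local/share/"))
```

Because only a copy of the environment is modified, the calling process keeps its original settings.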
2018-04-10
First of all: your command is wrong. It should be constructed this way:
tesseract image output [options]
See tesseract --help for more details.
Next: the error message is clear:
Error opening data file ./tessdata/Fraktur.traineddata
You (or your installation) instructed it to look for traineddata
aim is to have a tool that is easily portable with minimal dependencies.
IMO it is standard on Linux/Unix-like systems to use the --help option for an
explanation of usage.
Zdenko
2018-04-02 14:38 GMT+02:00 JP T :
> Well, the problem is error handling.
> If tesseract would have
... and it was exactly the same in tesseract 3.0x as in 4.0
Zdenko
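The argument order discussed in this thread (input image first, then the output base name, then options) can be sketched as a small command builder. The function name is made up for illustration; "-l eng" is the language option quoted elsewhere on this list.

```python
def build_tesseract_command(image, output_base, options=()):
    """Build an argument list in the order: tesseract image output [options]."""
    # Options always go last; the image and output base come first.
    return ["tesseract", str(image), str(output_base), *options]


# Possible usage (needs tesseract installed):
# import subprocess
# subprocess.run(build_tesseract_command("img.jpg", "out", ["-l", "eng"]), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.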
2018-04-02 0:14 GMT+02:00 JP T :
> Solved:
> must be* tesseract infile outfile options* instead of standard unix *program
> options infile outfile*.
> On Sun 1 Apr, 2018, 7:25 PM JP T,
If you are really interested in help, then provide full information/the command
you use to run tesseract.
Zdenko
2018-03-31 20:19 GMT+02:00 JP T :
> Hi
>
> I just updated from version 3.04.01 but now tesseract fails with above
> message if I give the -psm option.
> input files
Tesseract can produce output in txt, pdf and hocr (html).
Tesseract's focus is to provide an OCR engine, not complex document output
like docx or ods.
Zdenko
2018-03-22 7:47 GMT+01:00 :
> Can I use tesseract in Ubuntu to get .docx or .doc output(word format).
>
>
Why do you specify the compiler (especially if it cannot be found)?
Zdenko
2018-03-17 19:07 GMT+01:00 Richard McAlexander :
> Thanks. Anyone know how I can fix that? I have gcc/Xcode installed, not
> sure why its not finding the command.
>
> On Saturday, March 17, 2018
You specified that the C++ compiler is: g++-6
and your system reports:
g++-6: command not found
Zdenko
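Before pointing a build at a specific compiler, it can help to check that the name actually resolves on PATH. A rough sketch; the candidate list is only an assumption, adjust it to the compilers you expect on your system:

```python
import shutil


def find_compiler(candidates=("g++-6", "g++", "c++", "clang++")):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)  # None when the command is not found
        if path:
            return name, path
    return None
```

If this returns None for the name you configured, the "command not found" error above is expected.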
2018-03-17 11:57 GMT+01:00 Richard McAlexander :
> I'm having trouble compiling Tesseract 4.0. I have all dependencies
> installed. The error occurs when after I
here are details:
https://github.com/tesseract-ocr/tesseract/commits/master
Zdenko
2018-03-16 12:37 GMT+01:00 이경준 :
> Hi ~
>
> What is different in tesseract 4.00 (alpha) from tesseract 4.00 (beta) in detail
> ?
>
> Thank You
>
> --
> You received this message because you are
it is official:
https://github.com/tesseract-ocr/tesseract/releases
Zdenko
2018-03-12 10:09 GMT+01:00 adarsh shukla :
> There is no official release of tesseract 4.0 Beta. There might be some
> unofficial release, not found anything as such in Google.
>
> On Monday,
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
Zdenko
2018-02-25 11:38 GMT+01:00 Dusayanta Prasad :
> I am try to convert the below image using Tesseract in linux using the
> following command:
>
> tesseract img.jpg out -l eng
>
>
>
https://github.com/tesseract-ocr/tesseract/blob/master/INSTALL.GIT.md
Zdenko
2018-02-23 21:53 GMT+01:00 :
> I don't see Makefile in the master branch of tesseract-ocr/tesseract, Is
> there a way for me to get it from other branches? I needed to install
>
A pdf (mm.pdf) is not an image file. It is a document file.
Tesseract accepts only image files as input (tiff, png, jpeg... depending on
your leptonica build).
Zdenko
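A quick way to catch this mistake before calling tesseract is to look at the first bytes of the input. The sketch below assumes the usual "%PDF-" marker at the very start of a PDF file; the helper names are hypothetical.

```python
def looks_like_pdf(header: bytes) -> bool:
    """True if the given leading bytes carry the usual PDF marker."""
    return header.startswith(b"%PDF-")


def file_is_pdf(path: str) -> bool:
    """Read only the first few bytes of the file and check the marker."""
    with open(path, "rb") as handle:
        return looks_like_pdf(handle.read(5))
```

A file flagged this way has to be converted to an image format first.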
On Thu, Mar 24, 2016 at 5:27 PM, Jacob Stoker wrote:
> Hello tesseract-ocr world!
>
> So I'm running tesseract. I have
I am not familiar with tess-two, but I see it has a function
getChoicesAndConfidence[1].
[1]
https://github.com/rmtheis/tess-two/blob/master/tess-two/src/com/googlecode/tesseract/android/ResultIterator.java#L80
Zdenko
On Mon, Mar 14, 2016 at 7:12 PM, Sergio Mendoza
wrote:
have a look at tesseract::ChoiceIterator.
See
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-of-iterator-over-the-classifier-choices-for-a-single-symbol
Zdenko
On Mon, Mar 14, 2016 at 6:17 AM, Sergio Mendoza
wrote:
> So I've been trying to use
You have a very old version of tesseract.
page_separator was implemented after the 3.02 release.
Zdenko
On Sat, Mar 12, 2016 at 10:22 PM, wrote:
> Thanks Zdenko. I'm still stuck. I OCR'd an 81 page tiff file and I've
> searched my output txt file for the form feed character (asc
The default page separator is the form feed control character.
You can modify it with the page_separator parameter.
Zdenko
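Given a text output that uses the form feed character as page separator, the pages can be recovered afterwards by splitting on it. A minimal sketch; the function name is made up, and whether the output ends with a trailing separator is not assumed either way:

```python
def split_pages(text: str, separator: str = "\f") -> list:
    """Split OCR text into pages on the page separator (form feed by default)."""
    pages = text.split(separator)
    # A separator at the very end would leave an empty last page; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages
```

If page_separator was changed, pass the same string as the separator argument.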
On Sat, Mar 12, 2016 at 7:21 PM, wrote:
> If I OCR a multipage tiff file using Tesseract it comes out as one single
> page .txt file. Is there a way to
Which page segmentation method[1] did you use?
[1]
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
Zdenko
On Wed, Mar 9, 2016 at 5:14 PM, 'John Taves' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:
> I am trying to recognize a flawless image. I
IMO it is - in hocr (xml) output or tsv (in master branch a.k.a 3.05)
Zdenko
On Tue, Mar 8, 2016 at 3:14 PM, Age Bosma wrote:
> Hi Teng,
>
> The options I mention aren't available in tesseract. I listed them as
> suggestions for extending tesseract. They haven't been
Can you please create an issue in the tessdata part of the project[1] and provide a
(simple) test image?
Thanks,
[1] https://github.com/tesseract-ocr/tessdata/issues
Zdenko
On Sun, Mar 6, 2016 at 1:00 PM, Bojan Djuric wrote:
> In language file spr_latn.tessdata (Serbian lating) there is
This is strange (not standard output). Can you provide more details?
Zdenko
On Wed, Feb 24, 2016 at 10:21 PM, Monika Arora
wrote:
> Hello,
>
> My error logs are getting filled up with the following error messages,
>
> Tesseract Open Source OCR Engine v3.03 with
try to use "C:\\tesseract-ocr-3.02\\" in init...
Zdenko
On Sat, Jan 23, 2016 at 6:54 PM, Rajil Yadav
wrote:
> Hi,
>
> I am trying to do winth win8, s 2012, no environment variable is set.
>
>
> api = new tesseract::TessBaseAPI();
> int i =
>
When you ask for support, please provide example files.
Did you try the latest version of tesseract?
Zdenko
On Sun, Dec 27, 2015 at 9:43 PM, bácsi Kazi wrote:
> Could you help? Have I missed something blatantly trivial?
> Any help would be highly appreciated!
>
> Kazi
>
>
First of all - there is no such policy as not providing Windows
installers. There is no installer because there is nobody who would
maintain it and provide a solution (e.g. NSIS destroys my PATH variable on
Windows ;-) ). Everybody is busy with programming :-) (something else).
Next: there is
https://github.com/tesseract-ocr/tesseract/releases
Zdenko
On Tue, Nov 17, 2015 at 8:02 PM, Rich Taylor wrote:
> I see that the code currently says "3.05".
>
> So... was there an official 3.04 release? Is there a snapshot of the code
> from that point? If not, what is
If you want to use tesseract, you need to have the tesseract library. And
tesseract needs leptonica[1].
The error message is clear - the compiler cannot find these libraries - you need
to adjust the paths in the project file according to the location of those libraries.
[1]
Have a look at qt-box-editor[1] (I have not tested it for a long
time, but there should be support for Qt5)
[1] https://github.com/zdenop/qt-box-editor
Zdenko
On Mon, Nov 2, 2015 at 5:44 AM, Liu Paulson wrote:
> I download the tesseract-ocr library from the
>
How did you try to build it? Which revision?
What kind of error did you get? What is your OS?
Zdenko
On Mon, Oct 26, 2015 at 5:35 AM, supriya Das wrote:
> Hello kaushal
>Thanks for getting response from you. I have tried to build to latest
> version of tesseract code from
Can you place it somewhere on the internet (e.g. for further improvement) so we
can link to it from the wiki?
Zdenko
On Sun, Oct 25, 2015 at 9:49 PM, Pierre-Luc Pineault <
pierrelucpinea...@gmail.com> wrote:
> Hello
>
> I just created the OCR A STD font for Tesseract. I thought this might be a
> good idea to
I added it to addons wiki[1].
[1]
https://github.com/tesseract-ocr/tesseract/wiki/AddOns#community-training-projects
Zdenko
On Mon, Oct 26, 2015 at 3:32 PM, Pierre-Luc Pineault <
pierrelucpinea...@gmail.com> wrote:
> Thank you!
>
> I've create the repository here :
>
Why must there be a default config file???
Default values are default because they are already set. Config files just
modify them.
Zdenko
On Tue, Oct 20, 2015 at 11:24 AM, Mayu Shukla wrote:
> hello Tom,
>
> I got your point.
> My question was more on general terms,i mean
-- Forwarded message --
From: Juan Pablo Aveggio
To: tesseract-ocr
Cc:
Date: Mon, 21 Sep 2015 16:17:45 -0700 (PDT)
Subject: Train tesseract 3.04 for recognition of six patterns no existents
in UTF-8
Hello
I'm trying to train
First of all - if you need help, provide the original image for investigation.
Next, I do not understand why you are trying to compile old code (3.01) from SVN. It
does not make sense - we switched to git and github.com, and there were a lot
of bugfixes related to different platforms, including Mac. If you want to
Or other way around: TessTextRendererCreate is tesseract 3.04 (C-API)
function, so you need to upgrade your tesseract library.
Zdenko
You'd need Tess4J 1.5, which is compatible with Tess 3.02.
On Friday, September 4, 2015 at 5:59:00 AM UTC-5, Hang Wen wrote:
>
> Hi,
>
> I got the following error
On Thu, Sep 3, 2015 at 9:41 AM, Jozef M. wrote:
> Dear all,
>
> you can use the following web app to inspect some of the internals of
> traineddata files:
> https://te-traineddata-ui.herokuapp.com
>
> Few notes:
> - this version does not parse cube specifics and some of the
tessedit_ocr_engine_mode is an init-only[1] parameter (INT_INIT_MEMBER [2]),
i.e. you can set it only during initialization of tesseract. Otherwise it
has no effect.
[1] https://github.com/tesseract-ocr/tesseract/wiki/ControlParams#init-only
[2]
On Wed, Sep 2, 2015 at 5:16 PM, wrote:
> Hi Thanks for the notes! The problem was that I was used to using
> tesseract on mac, which has a different install process (I believe) from
> ubuntu. I needed to compile it on ubuntu, which I think is possibly
> equivalent to
Did you install the tesseract dev package?
Zdenko
On Sun, Aug 30, 2015 at 8:07 PM, wrote:
>
> Tesseract:
> tesseract-ocr: Installed: 3.03.02-3
>
>
> Ubuntu:
>
> Ubuntu 14.04.3 LTS
>
> Also, just to make sure I'm not missing something, is there a distinction
> between
On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Ray was looking for comparative feedback regarding the new traineddata for
RTL languages, so this will be useful.
As far as I know, Google Docs does not use tesseract OCR engine for
recognizing the text.
Zdenko
On Thu, Aug 13, 2015 at 4:59 PM, Anshul Maheshwari anshul.ffm...@gmail.com
wrote:
Hello,
Is there any pre-built library of tesseract 3.04 for Windows?
There are only cygwin-based ones[1]. Visual Studio 2012 x86 and x64 builds can be found
at the C# tesseract project[2], but it looks like there are 3
Quick reply ;-): have a look at TessBaseAPIGetComponentImages. There is a
Python example[1] for the C-API, so you should be able to follow it if you are
familiar with the tesseract C-API.
Just change
tesseract.TessBaseAPISetPageSegMode(api, PSM_AUTO_OSD)
to
tesseract.TessBaseAPISetPageSegMode(api,
you wrote:
but still output is showing not all lines in input image
and
I have attached 3 files which are not detected properly
but you sent one png file with word We've... How many lines do you expect
in it ;-) ?
Zdenko
On Wed, Aug 12, 2015 at 8:56 AM, Anshul Maheshwari
@ http://bhajans.ramparivar.com
On Fri, Jul 24, 2015 at 12:41 AM, zdenko podobny zde...@gmail.com wrote:
1. Well, if someone compiles code from git, (s)he should know which
revision (s)he is using ;-) And of course git code (unreleased) should not be
distributed.
2. Current git code (should
my windows builds in debug mode then?
greetings,
simon
On 23.07.2015 at 21:11, zdenko podobny wrote:
1. Well, if someone compiles code from git, (s)he should know which
revision
(s)he is using ;-) And of course git code (unreleased) should not be
distributed.
2. Current git code
1. Well, if someone compiles code from git, (s)he should know which revision
(s)he is using ;-) And of course git code (unreleased) should not be distributed.
2. Current git code (should) show version number 3.05.00dev
or 3.04.01dev depending on the (main) branch. And once again compiled program
it can use at the moment.
Let's see if it works.
had no time currently to test but will do in the office tomorrow.
greetings,
simon
On 20.07.2015 at 20:38, zdenko podobny wrote:
should be fixed - pull updates from git repo...
Zdenko
On Mon, Jul 20, 2015 at 6:13 PM, Simon
from my phone. excuse the brevity
On 21 Jul 2015 11:14, zdenko podobny zde...@gmail.com wrote:
I do not understand - this is a standard fix/commit like any other. Why
should it be tagged???
Zdenko
On Tue, Jul 21, 2015 at 6:44 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Zdenko,
How
the brevity.
On 21 Jul 2015 00:09, zdenko podobny zde...@gmail.com wrote:
should be fixed - pull updates from git repo...
Zdenko
On Mon, Jul 20, 2015 at 6:13 PM, Simon Eigeldinger
simon.eigeldin...@vol.at wrote:
Hi all,
just tried to compile tesseract on windows using cygwin from
? Or
migrate to linux box
On Sat, Jul 18, 2015, 9:32 PM zdenko podobny zde...@gmail.com wrote:
The last official leptonica Windows release is 1.68.
So if you build leptonica yourself, you should modify the tesseract
solution based on your leptonica compilation...
Zdenko
On Sat, Jul 18
The last official leptonica Windows release is 1.68.
So if you build leptonica yourself, you should modify the tesseract
solution based on your leptonica compilation...
Zdenko
On Sat, Jul 18, 2015 at 8:43 AM, Kasi Selvam kasiselv...@gmail.com wrote:
On Monday, 29 June 2015 13:16:53
Can you please clarify what the SVN 3.02 version is???
Zdenko
On Tue, Jun 2, 2015 at 5:50 PM, Jeremias Schucker
jeremias.schuc...@gmail.com wrote:
Hello everyone,
I was trying to compile SVN 3.02 version with visual C++ 2008 on Win8.1
and got the following error:
warning C4251:
Ad 1. This file was generated by Google in their internal system. The tools
are now open-sourced (see 3Training.pdf[1], but I would suggest you read all
the presentations), or ported so that they use free libraries instead of
Google internal libraries. Regarding the fonts used, I guess that
file
OpenCL support is experimental and it fails for some images (see issues
marked with OpenCL). I asked the team that prepared this code to review and
fix it, but I cannot say when it will be finished.
Zdenko
On Fri, May 8, 2015 at 10:08 AM, Mohammad Umar mohammaduma...@gmail.com
wrote:
Hi,
Any
This is a leptonica issue ;-)
Leptonica uses a stub/dummy file for support that is not present[1]. So the message "Error
in pixReadStreamTiff: function not present" should be read as: you did not
compile leptonica with (lib)tiff support, yet you want to use it for reading
a tiff file...
[1]
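When such messages are buried in a long log, a small filter can list which image formats the leptonica build is reporting as missing. This is a sketch only: the regular expression is modeled on the single message quoted above, and the assumption that other formats produce messages of the same shape is mine.

```python
import re

# Modeled on: "Error in pixReadStreamTiff: function not present"
_STUB_MESSAGE = re.compile(r"Error in pixReadStream(\w+): function not present")


def missing_format_support(log_text: str) -> list:
    """Return the lower-cased format names mentioned in stub error messages."""
    return sorted({match.group(1).lower() for match in _STUB_MESSAGE.finditer(log_text)})
```

Each name returned points at a library (e.g. libtiff) that leptonica would need to be rebuilt with.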
IMO there are 2 easy solutions:
1. You can combine the input images with ImageMagick into a multipage tif (e.g.
convert image1.png image2.bmp image3.tif output.tif)
2. You can create a text file with one image filename per line, e.g. filelist.lst
with this content:
song.png
tessinput.tif
superscript.png
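The second option (a list file with one image filename per line) could be generated like this. A sketch under the assumption that the list file is then given to tesseract in place of the input image; the function name is hypothetical.

```python
from pathlib import Path


def write_file_list(images, list_path="filelist.lst"):
    """Write one image filename per line and return the list file's path."""
    target = Path(list_path)
    target.write_text("".join(f"{name}\n" for name in images), encoding="utf-8")
    return target


# Possible usage with the file names from the example above:
# write_file_list(["song.png", "tessinput.tif", "superscript.png"])
```

The images are processed in the order they are listed, so sort them first if page order matters.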
, zdenko podobny zde...@gmail.com wrote:
Can you create a repository for your training (in sourceforge or github)?
Maybe with detailed description how you created it (so potentially other
people can try to improve/extend it).
Zdenko
On Fri, Apr 3, 2015 at 5:04 AM, Derek Dohler doh
Have a look at Text Fairy (OCR)[1].
I have had good experience with it (I use it for extracting text from books
for quotations, e.g. just a few lines).
The code is available on GitHub[2] (I am not sure if it is up to date).
[1] https://play.google.com/store/apps/details?id=com.renard.ocr
[2]
What version of gcc is there?
Maybe have a look at this solution on stackoverflow[1]
[1]
http://stackoverflow.com/questions/8640689/gcc-4-1-2-error-integer-constant-is-too-large-for-long-type
Zdenko
On Tue, Feb 17, 2015 at 7:05 PM, Markijan Blaschtschak mblas...@gmail.com
wrote:
Hi all,
I
On Wed, Jan 28, 2015 at 5:26 PM, juan peralta peralta11...@gmail.com
wrote:
On Saturday, November 24, 2012, 19:30:25 (UTC-6), Minjie Zheng
wrote:
Okay, I have spent literally two days trying to get tesseract to work in
VS2012. No matter what I do, my programs would not execute. It
From my experience, the dictionary has only a limited effect on the OCR result: e.g.
adding a word to the dictionary does not mean that tesseract will recognize it.
On the other hand, a word missing from the dictionary does not mean that tesseract
will not recognize it correctly. So if you have just ascii text (without
On Fri, Oct 24, 2014 at 1:45 AM, Ryan Dev software.developer.r...@gmail.com
wrote:
Hi, I have what I think is a unique situation, and I was hoping I could
get some hints on how to proceed.
I have problem font files, for which I want to fix the unicode mappings
for. I also have PDF files
OCR a test image with your app and store the result in a text file. Then OCR the same
image with the tesseract executable (output should be in a text file by default)
and compare the results.
If the output from the tesseract executable is OK, but the output from your app is wrong (e.g.
there are only ascii letters) = you have a problem
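The comparison described above can be sketched with the standard library. The function name and the "lost non-ASCII" heuristic are illustrative assumptions: it only flags the case where the executable's output contains non-ASCII characters and the app's output contains none.

```python
import difflib


def compare_ocr_outputs(cli_text: str, app_text: str):
    """Return (diff_lines, lost_non_ascii) for the two OCR results."""
    diff = list(difflib.unified_diff(
        cli_text.splitlines(), app_text.splitlines(),
        "tesseract_executable", "app", lineterm=""))
    # Heuristic: the executable produced non-ASCII text, the app only ASCII.
    lost_non_ascii = (not cli_text.isascii()) and app_text.isascii()
    return diff, lost_non_ascii
```

An empty diff means both paths give the same text; a set flag suggests looking at how the app handles the text encoding.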
Can you post 18.jpg somewhere?
Zdenko
On Wed, Oct 15, 2014 at 3:46 AM, Chris Cameron ch...@upnix.com wrote:
This command:
$ tesseract.exe 18.jpg test
Gives me test.txt, which has all the text from 18.jpg, as expected.
This command:
$ tesseract.exe 18.jpg test pdf
Gives me test.pdf,
Just a hint: there is a fork that tries to output HOCR details in a TSV
format file https://code.google.com/r/email-hocr-tsv/[1].
I did not test it :-), so I have no clue whether it fits the original
request...
[1] https://code.google.com/r/email-hocr-tsv/source/list
Zdenko
On Tue, Oct 14, 2014
Post your input and output files somewhere.
Zdenko
On Thu, Oct 2, 2014 at 2:03 PM, simon.eigeldin...@vol.at wrote:
hi all,
i compiled tesseract from git yesterday and played with it a little bit.
pretty impressive what happened since around 2 years.
not only has tesseract a lower filesize
After the SetImage call, call the (tesseract api) function GetThresholdedImage and save
the result (leptonica PIX) to disk.
If the output is wrong, you did not set the image data in tesseract correctly
(e.g. you need to change the SetImage parameters).
Zdenko
On Wed, Oct 1, 2014 at 7:30 AM, Umresh umr...@myingage.com
On Fri, Sep 26, 2014 at 2:31 PM, Surajit das smartsurajit2...@gmail.com
wrote:
I want to learn how to integrate tesseract with opencv in windows. I am
really really confused about how to use tesseract and i have read a lot of
articles and link but i couldnt understand. So, please can anyone
On 26 Sep 2014 20:22, Nicolas Nickisch n.nickisc...@gmail.com wrote:
I compiled and installed tesseract 3.03 manually.
tesseract --listlangs gives me an error message:
error opening data file /usr/local/share/tessdata/eng.traineddata
INdeed, I don't have eng.traineddata in that directory,
This is a known issue - try the current code from the git repository. It should be
fixed.
Zdenko
On Fri, Sep 19, 2014 at 2:38 PM, Frank Siegert frank.sieg...@googlemail.com
wrote:
Dear all,
I have been testing tesseract to embed OCR in scanned PDF documents, and
it works phenomenally well in
Well, yes and no ;-)
Yes - there should be no change to the image, but no - you need to expect
that (re)compression of the input image by the pdf renderer could take place. See
the comments on issue 1285[1] for more details.
[1] https://code.google.com/p/tesseract-ocr/issues/detail?id=1285
Zdenko
On Fri, Sep
There is no tesseract 3.04, so you cannot install it.
Your question indicates that you do not understand the consequences of your
action, so I strongly suggest that you revert to the last stable release, which is
3.02.02.
Zdenko
On Fri, Sep 19, 2014 at 8:31 PM, Rick Leir rich...@c7a.ca wrote:
Ubuntu
At the moment the tesseract executable allows only one output (per run).
It is a trivial change to allow multiple outputs
Zdenko
On Wed, Sep 17, 2014 at 4:08 AM, Shree Devi Kumar shreesh...@gmail.com
wrote:
Quan,
Can it also be done in commandline version?
Shree
Shree Devi Kumar
What kind of solution do you expect for a wrong command?
Zdenko
On Wed, Sep 10, 2014 at 4:42 PM, Dovhani Foneworx dfone...@gmail.com
wrote:
Any solution for this?
fonew...@foneworxtest.foneworx.co.za:~/DM/DEVPLACE/BOXEDIT$ tesseract
email.tif num10.tif num11.tif num12.tif num13.tif num13.tif
Fraktur repository is here:
https://github.com/paalberti/tesseract-dan-fraktur.git
IMO you should post your files there.
Zdenko
On Thu, Aug 28, 2014 at 3:38 PM, matthew christy matt.chri...@gmail.com
wrote:
Ah, OK. Thanks Zdenko. Is there any organize alternative for posting
Tesseract
1. The last stable tesseract release is 3.02.
2. There is no official 3.03 release. There is only a 3.03 release candidate
that is intended for developers for testing.
Even though 3.03 is for developers, building it on Linux should be quite a smooth
process (maybe it depends on experience with building
Anybody who is packaging tesseract and publicly sharing 3.03 (excluding
-rc1) and 3.04 is lying. There are no such releases.
The repository is intended for developers and testers, not for packagers! And it
is absolutely normal that there are version changes within the repository.
There are for
to get tesseract 3.03 rc1 ?
On Wed, Aug 27, 2014 at 2:40 PM, zdenko podobny zde...@gmail.com wrote:
Anybody who is packaging tesseract and publicly sharing 3.03 (excluding
-rc1) and 3.04 is lying. There are no such releases.
The repository is intended for developers and testers, not for packagers
See:
http://google-opensource.blogspot.sk/2013/05/a-change-to-google-code-download-service.html
Zdenko
On Wed, Aug 27, 2014 at 6:22 PM, matthew christy matt.chri...@gmail.com
wrote:
Hi all,
I've created some new English-language Fraktur training by utilizing the
current and excellent