Re: [tesseract-ocr] Not Able to get Text

2018-07-06 Thread Zdenko Podobny
Please read the wiki about improving Tesseract results. Zdenko On Fri, 6 Jul 2018 at 10:52 wrote: > Hi, > > I have been using tesseract for a long time and it works fine, but we got some > new images that tesseract does not parse. > > I removed extra noise, changed to greyscale,

Re: [tesseract-ocr] Wrong Characters in LibreOffice Writer with German special Characters

2018-06-29 Thread Zdenko Podobny
This is not a tesseract problem: https://ask.libreoffice.org/en/question/97993/why-doesnt-lo-writer-open-and-save-text-documents-encoded-in-utf-8-without-bom-any-plans-to-fix-this-soon/ Tesseract output is UTF-8 encoded (without a BOM). Zdenko On Fri, 29 Jun 2018 at 19:37, Martin Jenniges wrote: > Hello, > >
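The LibreOffice question linked in the reply is about Writer not recognising UTF-8 text files that lack a byte-order mark (BOM). One possible workaround, assuming a BOM is enough for your Writer version to pick UTF-8, is to prepend one before opening the file. A minimal Python sketch; the function names and the `out.txt` filename are hypothetical:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def with_utf8_bom(data: bytes) -> bytes:
    """Return UTF-8 bytes that start with exactly one byte-order mark."""
    # Already marked: return unchanged so the BOM is never doubled.
    return data if data.startswith(BOM) else BOM + data

def add_bom_to_file(src: str, dst: str) -> None:
    """Copy a UTF-8 text file (e.g. tesseract's out.txt), adding a BOM."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(with_utf8_bom(data))
```

Usage: `add_bom_to_file("out.txt", "out-bom.txt")`, then open `out-bom.txt` in Writer. The text itself is untouched; only the three marker bytes are added.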

Re: [tesseract-ocr] Tesseract is generating error prompt ''Not enough data at scanline" while extracting a tiff file

2018-06-26 Thread Zdenko Podobny
Can you post the image? It looks like a Leptonica/libtiff problem. Zdenko On Tue, 26 Jun 2018 at 7:21, James Worldprogram wrote: > *Problem*: - I am using Tesseract: *tesseract-ocr-setup-3.05.01.exe* as a > command line argument in Windows OS program with the argument *-l eng*; > it is working

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
Yes, it is OK, but you do not have to create a separate issue for a PR (a PR is an issue too). Zdenko On Tue, 5 Jun 2018 at 16:52, Paul Kitchen wrote: > Zdenko, > > I'm new to this so hopefully I did everything correctly. Here is the issue > I created: > >

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
You need to fork the official repository; then you have all the permissions you need. When you have made your changes, you can send a pull request with them to the official repository. Zdenko On Tue, 5 Jun 2018 at 15:06, Paul Kitchen wrote: > Zdenko, > > Unfortunately I don't seem to have write

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
Please make the PR against the master (4.0) branch and I will cherry-pick it into 3.05... Zdenko On Tue, 5 Jun 2018 at 4:38, Paul Kitchen wrote: > Zdenko, > > I checked out the latest tesseract code and updated to branch 3.05. I see > that the int64_t area bug is already fixed (thanks!). I also see that the >

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread Zdenko Podobny
Stefan, Paul suggested also modifying LoadDataFromFile (ccutil/genericvector.h). Is that modification not needed? Zdenko On Mon, 4 Jun 2018 at 17:32, 'Stefan Weil' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote: > As far as I see 4.0.0 is good. I have sent a pull request which

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread Zdenko Podobny
, _data[0], boxes, texts > , > box_texts, pages); > } > > > > On Saturday, June 2, 2018 at 2:22:16 AM UTC-6, zdenop wrote: >> >> Please check if this is ok now. If yes, I am willing to make 3.05.02 >> release ;-) >> >> Zden

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Zdenko Podobny
Please check whether this is OK now. If so, I am willing to make a 3.05.02 release ;-) Zdenko On Sat, 2 Jun 2018 at 10:16, Zdenko Podobny wrote: > Done in > https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a > Zdenko > > > On Thu, 31 May 2018 at 22

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Zdenko Podobny
Done in https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a Zdenko On Thu, 31 May 2018 at 22:39, shree wrote: > This has been an issue for a long time. Thanks for finding the problem. > > Please submit a PR on GitHub. > > On Friday, June 1, 2018 at 1:55:25 AM

Re: [tesseract-ocr] Where to find tessdata folder?

2018-05-31 Thread Zdenko Podobny
Did you follow the installation instructions for that package? Did you try an internet search before posting to the forum? Did you look for help in the tesserocr project? I just put it into Google and got: https://pypi.org/project/tesserocr/ https://github.com/sirfz/tesserocr

Re: [tesseract-ocr] a way to extract the location of each components in image

2018-05-20 Thread Zdenko Podobny
Did you read the wiki before posting? E.g. https://github.com/tesseract-ocr/tesseract/wiki/APIExample#getcomponentimages-example Zdenko On Sun, 20 May 2018 at 8:00, nick wrote: > hi > > is there a way to extract the location of each component (line) in the > image? > > for

[tesseract-ocr] Announcement: Tesseract tessdata downloader from GitHub repositories 1.0

2018-05-11 Thread Zdenko Podobny
Hello all, if you are interested in downloading traineddata for only some languages from the repositories (or a different tagged version), have a look at tessdata_downloader[1]. I just released version 1.0 [2]. I wrote this script in Python, but I was also able to create a Windows 64-bit "frozen" app

Re: [tesseract-ocr] Tesseract couldn't load any languages!

2018-05-04 Thread Zdenko Podobny
The error message is clear, isn't it? Zdenko On Fri, 4 May 2018 at 20:38, Dattatraya Tembare wrote: > Exception in thread "main" java.lang.Error: Invalid memory access > at com.sun.jna.Native.invokePointer(Native Method) > at

Re: [tesseract-ocr] How to convert hocr to MS word .docx file

2018-05-03 Thread Zdenko Podobny
With MS Word ;-) 1. Rename test.hocr to test.hocr.html. 2. Open test.hocr.html in a real text editor (e.g. Notepad++) and delete lines 2 and 3, otherwise Word will produce an error message. 3. Open test.hocr.html in Word. Zdenko On Thu, 3 May 2018 at 1:42, abdu wrote: >
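Steps 1 and 2 of the recipe above can be scripted. This is a minimal Python sketch with hypothetical function names; it removes whatever is on lines 2 and 3, exactly as the recipe says, without checking what those lines contain, and it writes a new `.html` file instead of renaming the original:

```python
from pathlib import Path
from typing import AbstractSet

def drop_lines(text: str, line_numbers: AbstractSet[int]) -> str:
    """Return text without the given 1-based line numbers."""
    lines = text.splitlines(keepends=True)
    return "".join(line for n, line in enumerate(lines, start=1)
                   if n not in line_numbers)

def hocr_for_word(src: str) -> Path:
    """Write <src>.html (e.g. test.hocr -> test.hocr.html) without lines 2 and 3."""
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + ".html")
    text = src_path.read_text(encoding="utf-8")
    dst_path.write_text(drop_lines(text, {2, 3}), encoding="utf-8")
    return dst_path
```

Usage: `hocr_for_word("test.hocr")` creates `test.hocr.html`, which you then open in Word (step 3).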

Re: [tesseract-ocr] tesseract 4 beta: openCL useage

2018-04-27 Thread Zdenko Podobny
If you have experience with it, your help will be warmly welcomed. The OpenCL code is not maintained and is well on its way to being removed if no maintainer/contributor is found. Anyway, it is not used extensively, so there is room for improvement. Zdenko On Fri, 27 Apr 2018 at 10:21, Janpieter Sollie

Re: [tesseract-ocr] just installed, get error messages

2018-04-25 Thread Zdenko Podobny
Why are you building the project from source if you have no clue what you are doing? Based on your other post: you decided to build Leptonica without support for common image formats. On Thu, 26 Apr 2018, 7:01 Rolf Schumacher wrote: > I just installed from git repository > >

Re: [tesseract-ocr] error: required directory

2018-04-25 Thread Zdenko Podobny
We are reorganizing tesseract. Using the latest code is not recommended at all, especially if you do not follow the developers' communication. Zdenko 2018-04-25 19:59 GMT+02:00 Marius Amado-Alves : > Trying to install on a Mac, cannot pass the autogen.sh step.

Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Thread Zdenko Podobny
Well, you should contact the creator of the traineddata. We have no clue what they did. Zdenko 2018-04-25 14:55 GMT+02:00 : > Hello there, > > i don't know what to do anymore... > I want to use tesseract-ocr 3.05 for scanning documents, using the font > "Perfect DOS VGA 437

Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread Zdenko Podobny
Time for an upgrade? Zdenko 2018-04-21 22:14 GMT+02:00 'DR' via tesseract-ocr < tesseract-ocr@googlegroups.com>: > I'm using: > > tesseract 3.04.01 > leptonica-1.73 > libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : > libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2

Re: [tesseract-ocr] install tesseract-4.00.00alpha error

2018-04-18 Thread Zdenko Podobny
You can start by using the latest version and providing details... Zdenko 2018-04-18 7:56 GMT+02:00 Kai Feng : > ./.libs/libtesseract.so: undefined reference to `omp_get_thread_num' > ./.libs/libtesseract.so: undefined reference to `GOMP_sections_end_nowait' >

Re: [tesseract-ocr] How to include tesseract 4.00 to my visual studio c++ ??

2018-04-12 Thread Zdenko Podobny
You should download the source, then build and install it with cppan + cmake. See https://github.com/tesseract-ocr/tesseract/wiki/Compiling#develop-tesseract Zdenko 2018-04-11 4:21 GMT+02:00 : > i have been using tesseract 3.04 i could use it just by adding the include

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
If you followed someone's tutorial, you should complain to its author ;-). I am not familiar with Mac, but on Linux you can do it (in the shell) this way: export TESSDATA_PREFIX=/usr/local/share/ Maybe it is similar on Mac. Try googling how to set an environment variable on Mac. Zdenko 2018-04-10

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
First of all: your command is wrong. It should be constructed this way: tesseract image output [options]. See tesseract --help for more details. Next: the error message is clear: Error opening data file ./tessdata/Fraktur.traineddata. You (or your installation) instructed tesseract to look for traineddata
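When tesseract is called from a script, building the argument list in one place keeps the required order (image, output base, then options). A minimal Python sketch; the helper names are hypothetical, and the `--psm` spelling assumes tesseract 4.x:

```python
import subprocess
from typing import List, Optional

def tesseract_argv(image: str, output_base: str, lang: str = "eng",
                   psm: Optional[int] = None) -> List[str]:
    """Build: tesseract IMAGE OUTPUT_BASE [options] (options go last)."""
    argv = ["tesseract", image, output_base, "-l", lang]
    if psm is not None:
        # 4.x spelling; older 3.0x releases used a single dash (-psm).
        argv += ["--psm", str(psm)]
    return argv

def run_tesseract(image: str, output_base: str, **options) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(tesseract_argv(image, output_base, **options), check=True)
```

`output_base` is a base name: tesseract adds the extension (e.g. `.txt`) itself, so `run_tesseract("page.png", "out")` should produce `out.txt`.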

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread Zdenko Podobny
aim is to have a tool that is easily portable with minimal dependencies. IMO it is standard on Linux/Unix-like systems to use the --help option for an explanation of usage. Zdenko 2018-04-02 14:38 GMT+02:00 JP T : > Well, the problem is error handling. > If tesseract would have

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread Zdenko Podobny
... and it was exactly the same in tesseract 3.0x as in 4.0. Zdenko 2018-04-02 0:14 GMT+02:00 JP T : > Solved: > must be *tesseract infile outfile options* instead of standard unix *program options infile outfile*. > On Sun 1 Apr, 2018, 7:25 PM JP T,

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-01 Thread Zdenko Podobny
If you are really interested in help, then provide full information, including the command you use to run tesseract. Zdenko 2018-03-31 20:19 GMT+02:00 JP T : > Hi > > I just updated from version 3.04.01 but now tesseract fails with above > message if I give the -psm option. > input files

Re: [tesseract-ocr] Tesseract output format: doc or docx

2018-03-22 Thread Zdenko Podobny
Tesseract can produce output in txt, pdf and hocr (html). Tesseract's focus is to provide an OCR engine, not complex document output like docx or ods. Zdenko 2018-03-22 7:47 GMT+01:00 : > Can I use tesseract in Ubuntu to get .docx or .doc output(word format). > >

Re: [tesseract-ocr] Compilation Error Tesseract 4.0 - macOS High Sierra

2018-03-17 Thread Zdenko Podobny
Why do you specify the compiler (especially if it cannot be found)? Zdenko 2018-03-17 19:07 GMT+01:00 Richard McAlexander : > Thanks. Anyone know how I can fix that? I have gcc/Xcode installed, not > sure why its not finding the command. > > On Saturday, March 17, 2018

Re: [tesseract-ocr] Compilation Error Tesseract 4.0 - macOS High Sierra

2018-03-17 Thread Zdenko Podobny
You specified that the C++ compiler is g++-6, and your system reports: g++-6: command not found. Zdenko 2018-03-17 11:57 GMT+01:00 Richard McAlexander : > I'm having trouble compiling Tesseract 4.0. I have all dependencies > installed. The error occurs when after I

Re: [tesseract-ocr] What is different in tesseract 4.00 (alpha) from tesseract 4.00 (Beta), in detail?

2018-03-16 Thread Zdenko Podobny
Here are the details: https://github.com/tesseract-ocr/tesseract/commits/master Zdenko 2018-03-16 12:37 GMT+01:00 이경준 : > Hi ~ > > What is different in tesseract 4.00 (alpha) from tesseract 4.00 (Beta), in detail? > > Thank You > > -- > You received this message because you are

Re: [tesseract-ocr] Re: tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 beta

2018-03-12 Thread Zdenko Podobny
It is official: https://github.com/tesseract-ocr/tesseract/releases Zdenko 2018-03-12 10:09 GMT+01:00 adarsh shukla : > There is no official release of tesseract 4.0 Beta. There might be some > unofficial release, not found anything as such in Google. > > On Monday,

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-02-25 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality Zdenko 2018-02-25 11:38 GMT+01:00 Dusayanta Prasad : > I am trying to convert the below image using Tesseract in linux using the > following command: > > tesseract img.jpg out -l eng > > >

Re: [tesseract-ocr] Makefile in master branch of tesseract-ocr/tesseract

2018-02-23 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tesseract/blob/master/INSTALL.GIT.md Zdenko 2018-02-23 21:53 GMT+01:00 : > I don't see Makefile in the master branch of tesseract-ocr/tesseract, Is > there a way for me to get it from other branches? I needed to install >

Re: [tesseract-ocr] New to command line and tesseract. Errors for PDF

2016-03-24 Thread zdenko podobny
A PDF (mm.pdf) is not an image file; it is a document file. Tesseract accepts only image files as input (tiff, png, jpeg... depending on your Leptonica build). Zdenko On Thu, Mar 24, 2016 at 5:27 PM, Jacob Stoker wrote: > Hello tesseract-ocr world! > > So I'm running tesseract. I have

Re: [tesseract-ocr] How does the Tesseract variable “save_blob_choices” works (in tess-two)?

2016-03-14 Thread zdenko podobny
I am not familiar with tess-two, but I see it has a function getChoicesAndConfidence[1]. [1] https://github.com/rmtheis/tess-two/blob/master/tess-two/src/com/googlecode/tesseract/android/ResultIterator.java#L80 Zdenko On Mon, Mar 14, 2016 at 7:12 PM, Sergio Mendoza wrote:

Re: [tesseract-ocr] How does the Tesseract variable “save_blob_choices” works (in tess-two)?

2016-03-14 Thread zdenko podobny
Have a look at tesseract::ChoiceIterator. See https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-of-iterator-over-the-classifier-choices-for-a-single-symbol Zdenko On Mon, Mar 14, 2016 at 6:17 AM, Sergio Mendoza wrote: > So I've been trying to use

Re: [tesseract-ocr] Re: Page Breaks

2016-03-12 Thread zdenko podobny
You have a very old version of tesseract. page_separator was implemented after the 3.02 release. Zdenko On Sat, Mar 12, 2016 at 10:22 PM, wrote: > Thanks Zdenko. I'm still stuck. I OCR'd an 81 page tiff file and I've > searched my output txt file for the form feed character (asc

Re: [tesseract-ocr] Page Breaks

2016-03-12 Thread zdenko podobny
The default page separator is the form feed control character. You can change it with the parameter page_separator. Zdenko On Sat, Mar 12, 2016 at 7:21 PM, wrote: > If I OCR a multipage tiff file using Tesseract it comes out as one single > page .txt file. Is there a way to
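With the default separator, the single .txt file from a multipage TIFF can be split back into pages afterwards. A minimal Python sketch, assuming page_separator was left at the default form feed (`\f`); the function name is hypothetical, and since it is not certain whether a separator also follows the last page, a trailing whitespace-only chunk is dropped:

```python
from typing import List

FORM_FEED = "\f"  # default page_separator, per the reply above

def split_pages(text: str, separator: str = FORM_FEED) -> List[str]:
    """Split multipage tesseract text output into one string per page."""
    pages = text.split(separator)
    # Drop a trailing whitespace-only chunk, e.g. after a final separator.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages
```

Note that a genuinely blank last page would also be dropped; pass a different `separator` if you changed page_separator.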

Re: [tesseract-ocr] Dropped characters from perfect image

2016-03-09 Thread zdenko podobny
Which page segmentation method[1] did you use? [1] https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method Zdenko On Wed, Mar 9, 2016 at 5:14 PM, 'John Taves' via tesseract-ocr < tesseract-ocr@googlegroups.com> wrote: > I am trying to recognize a flawless image. I

Re: [tesseract-ocr] Re: Page layout analysis module

2016-03-08 Thread zdenko podobny
IMO it is: in the hOCR (XML) output, or in TSV (in the master branch, a.k.a. 3.05). Zdenko On Tue, Mar 8, 2016 at 3:14 PM, Age Bosma wrote: > Hi Teng, > > The options I mention aren't available in tesseract. I listed them as > suggestions for extending tesseract. They haven't been
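The TSV output is easy to read with a standard CSV parser. This minimal Python sketch assumes the header columns used by recent tesseract versions (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), with level 5 meaning a word and level 4 a text line; the names `Box` and `boxes_from_tsv` are hypothetical:

```python
import csv
import io
from typing import List, NamedTuple

class Box(NamedTuple):
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float

def boxes_from_tsv(tsv_text: str, level: int = 5) -> List[Box]:
    """Return the boxes of all rows at the given level (5 = word, 4 = line)."""
    # QUOTE_NONE: recognized text may contain quote characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    boxes = []
    for row in reader:
        if row["level"] != str(level):
            continue
        boxes.append(Box((row["text"] or "").strip(),
                         int(row["left"]), int(row["top"]),
                         int(row["width"]), int(row["height"]),
                         float(row["conf"])))
    return boxes
```

Non-word rows carry an empty text column and a conf of -1, so line boxes come back with `text == ""`.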

Re: [tesseract-ocr] limiting tesseract to one language

2016-03-06 Thread zdenko podobny
Can you please open an issue in the tessdata part of the project[1] and provide a (simple) test image? Thanks, [1] https://github.com/tesseract-ocr/tessdata/issues Zdenko On Sun, Mar 6, 2016 at 1:00 PM, Bojan Djuric wrote: > In language file spr_latn.tessdata (Serbian Latin) there is

Re: [tesseract-ocr] Supress Tesseract Open Source OCR Engine v3.03 with Leptonica

2016-02-24 Thread zdenko podobny
This is strange (not standard output). Can you provide more details? Zdenko On Wed, Feb 24, 2016 at 10:21 PM, Monika Arora wrote: > Hello, > > My error logs are getting filled up with the following error messages, > > Tesseract Open Source OCR Engine v3.03 with

Re: [tesseract-ocr] Re: tesseract api and user-words

2016-01-23 Thread zdenko podobny
Try using "C:\\tesseract-ocr-3.02\\" in init... Zdenko On Sat, Jan 23, 2016 at 6:54 PM, Rajil Yadav wrote: > Hi, > > I am trying to do with win8, s 2012, no environment variable is set. > > > api = new tesseract::TessBaseAPI(); > int i = >

Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-28 Thread zdenko podobny
When you ask for support please provide example files. Did you try the latest version of tesseract? Zdenko On Sun, Dec 27, 2015 at 9:43 PM, bácsi Kazi wrote: > Could you help? Have I missed something blatantly trivial? > Any help would be highly appreciated! > > Kazi > >

Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-28 Thread zdenko podobny
First of all, there is no policy of not providing Windows installers. There is no installer because there is nobody to maintain it and provide solutions (e.g. NSIS destroys my PATH variable on Windows ;-) ). Everybody is busy programming :-) (something else). Next: there is

Re: [tesseract-ocr] v3.04 Release???

2015-11-17 Thread zdenko podobny
https://github.com/tesseract-ocr/tesseract/releases Zdenko On Tue, Nov 17, 2015 at 8:02 PM, Rich Taylor wrote: > I see that the code currently says "3.05". > > So... was there an official 3.04 release? Is there a snapshot of the code > from that point? If not, what is

Re: [tesseract-ocr] undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()

2015-11-02 Thread zdenko podobny
If you want to use tesseract, you need the tesseract library, and tesseract needs Leptonica[1]. The error message is clear: the linker cannot find these libraries, so you need to adjust the paths in your project file according to the location of those libraries. [1]

Re: [tesseract-ocr] undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()

2015-11-01 Thread zdenko podobny
Have a look at qt-box-editor[1] (I have not tested it for a long time, but there should be support for Qt5). [1] https://github.com/zdenop/qt-box-editor Zdenko On Mon, Nov 2, 2015 at 5:44 AM, Liu Paulson wrote: > I download the tesseract-ocr library from the >

Re: [tesseract-ocr] Re: Tesseract slow performance

2015-10-26 Thread zdenko podobny
How did you try to build it? Which revision? What kind of error did you get? What is your OS? Zdenko On Mon, Oct 26, 2015 at 5:35 AM, supriya Das wrote: > Hello kaushal >Thanks for getting response from you. I have tried to build to latest > version of tesseract code from

Re: [tesseract-ocr] Tesseract FONT for OCRA Standard

2015-10-26 Thread zdenko podobny
Can you put it somewhere on the internet (e.g. for further improvement) so we can link to it from the wiki? Zdenko On Sun, Oct 25, 2015 at 9:49 PM, Pierre-Luc Pineault < pierrelucpinea...@gmail.com> wrote: > Hello > > I just created the OCR A STD font for Tesseract. I thought this might be a > good idea to

Re: [tesseract-ocr] Tesseract FONT for OCRA Standard

2015-10-26 Thread zdenko podobny
I added it to the AddOns wiki page[1]. [1] https://github.com/tesseract-ocr/tesseract/wiki/AddOns#community-training-projects Zdenko On Mon, Oct 26, 2015 at 3:32 PM, Pierre-Luc Pineault < pierrelucpinea...@gmail.com> wrote: > Thank you! > > I've created the repository here : >

Re: [tesseract-ocr] Re: obtaining pre-processed image

2015-10-20 Thread zdenko podobny
Why must there be a default config file? Default values are default because they are already set; config files just modify them. Zdenko On Tue, Oct 20, 2015 at 11:24 AM, Mayu Shukla wrote: > hello Tom, > > I got your point. > My question was more on general terms,i mean

[tesseract-ocr] Fwd: tesseract-ocr - Google Groups: Message Pending [{INLZp6-bu9eaHioCaWcwAW2_BwmznK2y0}]

2015-09-22 Thread zdenko podobny
-- Forwarded message -- From: Juan Pablo Aveggio To: tesseract-ocr Cc: Date: Mon, 21 Sep 2015 16:17:45 -0700 (PDT) Subject: Train tesseract 3.04 for recognition of six patterns no existents in UTF-8 Hello I'm trying to train

Re: [tesseract-ocr] Re: Tesseract 3.04 error.

2015-09-17 Thread zdenko podobny
First of all, if you need help, provide the original image for investigation. Next, I do not understand why you try to compile old code (3.01) from SVN. It does not make sense: we switched to git and github.com, and there were a lot of bugfixes for different platforms, including Mac. If you want to

Re: [tesseract-ocr] Re: Error looking up function 'TessTextRendererCreate' on Mac

2015-09-04 Thread zdenko podobny
Or the other way around: TessTextRendererCreate is a tesseract 3.04 (C API) function, so you need to upgrade your tesseract library. Zdenko You'd need Tess4J 1.5, which is compatible with Tess 3.02. On Friday, September 4, 2015 at 5:59:00 AM UTC-5, Hang Wen wrote: > > Hi, > > I got the following error

Re: [tesseract-ocr] Traineddata inspector

2015-09-04 Thread zdenko podobny
On Thu, Sep 3, 2015 at 9:41 AM, Jozef M. wrote: > Dear all, > > you can use the following web app to inspect some of the internals of > traineddata files: > https://te-traineddata-ui.herokuapp.com > > Few notes: > - this version does not parse cube specifics and some of the

Re: [tesseract-ocr] Re: Tesseract gives the same results in cube mode, is this normal/common?

2015-09-04 Thread zdenko podobny
tessedit_ocr_engine_mode is an init-only[1] parameter (INT_INIT_MEMBER [2]), i.e. you can set it only during initialization of tesseract; otherwise it has no effect. [1] https://github.com/tesseract-ocr/tesseract/wiki/ControlParams#init-only [2]

Re: [tesseract-ocr] Successfully installed and run Tesseract on Ubuntu, but can't find baseapi.h file to include ...

2015-09-03 Thread zdenko podobny
On Wed, Sep 2, 2015 at 5:16 PM, wrote: > Hi Thanks for the notes! The problem was that I was used to using > tesseract on mac, which has a different install process (I believe) from > ubuntu. I needed to compile it on ubuntu, which I think is possibly > equivalent to

Re: [tesseract-ocr] Successfully installed and run Tesseract on Ubuntu, but can't find baseapi.h file to include ...

2015-08-31 Thread zdenko podobny
Did you install the tesseract dev package? Zdenko On Sun, Aug 30, 2015 at 8:07 PM, wrote: > > Tesseract: > tesseract-ocr: Installed: 3.03.02-3 > > > Ubuntu: > > Ubuntu 14.04.3 LTS > > Also, just to make sure I'm not missing something, is there a distinction > between

Re: [tesseract-ocr] Re: persian in tesseract-ocr

2015-08-17 Thread zdenko podobny
On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful. As far as I know, Google Docs does not use tesseract OCR engine for recognizing the text.

Re: [tesseract-ocr] Where to find Tesseract 3.04 library for windows

2015-08-13 Thread zdenko podobny
Zdenko On Thu, Aug 13, 2015 at 4:59 PM, Anshul Maheshwari anshul.ffm...@gmail.com wrote: Hello, Are there any pre-built libraries of tesseract 3.04 for Windows? There are only Cygwin-based ones[1]. Visual Studio 2012 x86 and x64 builds can be found in the C# tesseract project[2], but it looks like there are 3

Re: [tesseract-ocr] Re: Bitmap subtitles are not detected properly

2015-08-12 Thread zdenko podobny
Quick reply ;-): have a look at TessBaseAPIGetComponentImages. There is a Python example[1] for the C API, so you should be able to follow it if you are familiar with the tesseract C API. Just change tesseract.TessBaseAPISetPageSegMode(api, PSM_AUTO_OSD) to tesseract.TessBaseAPISetPageSegMode(api,

Re: [tesseract-ocr] Re: Using CAPI to get char* from ocr

2015-08-12 Thread zdenko podobny
You wrote: "but still output is showing not all lines in input image and I have attached 3 files which are not detected properly", but you sent one PNG file with the word We've... How many lines do you expect in it ;-)? Zdenko On Wed, Aug 12, 2015 at 8:56 AM, Anshul Maheshwari

Re: [tesseract-ocr] displayed version number of tesseract when compiled from git

2015-07-23 Thread zdenko podobny
On Fri, Jul 24, 2015 at 12:41 AM, zdenko podobny zde...@gmail.com wrote: 1. Well, if someone compiles code from git, (s)he should know which revision (s)he is using ;-) And of course git code (unreleased) should not be distributed. 2. Current git code (should

Re: [tesseract-ocr] displayed version number of tesseract when compiled from git

2015-07-23 Thread zdenko podobny
my windows builds in debug mode then? greetings, simon On 23.07.2015 at 21:11, zdenko podobny wrote: 1. Well, if someone compiles code from git, (s)he should know which revision (s)he is using ;-) And of course git code (unreleased) should not be distributed. 2. Current git code

Re: [tesseract-ocr] displayed version number of tesseract when compiled from git

2015-07-23 Thread zdenko podobny
1. Well, if someone compiles code from git, (s)he should know which revision (s)he is using ;-) And of course git code (unreleased) should not be distributed. 2. Current git code (should) show version number 3.05.00dev or 3.04.01dev, depending on the (main) branch. And once again, compiled program

Re: [tesseract-ocr] building tesseract on windows using cygwin

2015-07-21 Thread zdenko podobny
it can use at the moment. Let's see if it works. had no time currently to test but will do in the office tomorrow. greetings, simon On 20.07.2015 at 20:38, zdenko podobny wrote: Should be fixed; pull the updates from the git repo... Zdenko On Mon, Jul 20, 2015 at 6:13 PM, Simon

Re: [tesseract-ocr] building tesseract on windows using cygwin

2015-07-21 Thread zdenko podobny
On 21 Jul 2015 11:14, zdenko podobny zde...@gmail.com wrote: I do not understand: this is a standard fix/commit like any other. Why should it be tagged? Zdenko On Tue, Jul 21, 2015 at 6:44 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Zdenko, How

Re: [tesseract-ocr] building tesseract on windows using cygwin

2015-07-20 Thread zdenko podobny
On 21 Jul 2015 00:09, zdenko podobny zde...@gmail.com wrote: Should be fixed; pull the updates from the git repo... Zdenko On Mon, Jul 20, 2015 at 6:13 PM, Simon Eigeldinger simon.eigeldin...@vol.at wrote: Hi all, just tried to compile tesseract on windows using cygwin from

Re: [tesseract-ocr] Re: Tesseract 3.04 Build Error

2015-07-19 Thread zdenko podobny
? Or migrate to a Linux box. On Sat, Jul 18, 2015, 9:32 PM zdenko podobny zde...@gmail.com wrote: The last official Leptonica Windows release is 1.68. So if you build Leptonica yourself, you should modify the tesseract solution according to your Leptonica build... Zdenko On Sat, Jul 18

Re: [tesseract-ocr] Re: Tesseract 3.04 Build Error

2015-07-18 Thread zdenko podobny
The last official Leptonica Windows release is 1.68. So if you build Leptonica yourself, you should modify the tesseract solution according to your Leptonica build... Zdenko On Sat, Jul 18, 2015 at 8:43 AM, Kasi Selvam kasiselv...@gmail.com wrote: On Monday, 29 June 2015 13:16:53

Re: [tesseract-ocr] compile error win8 / vc++ 2008

2015-06-02 Thread zdenko podobny
Can you please clarify what "SVN 3.02 version" is? Zdenko On Tue, Jun 2, 2015 at 5:50 PM, Jeremias Schucker jeremias.schuc...@gmail.com wrote: Hello everyone, I was trying to compile SVN 3.02 version with visual C++ 2008 on Win8.1 and got the following error: warning C4251:

Re: [tesseract-ocr] Re: Some questions about tesseract 3.0x.

2015-05-13 Thread zdenko podobny
Ad 1. This file was generated by Google on their internal system. The tools are now open-sourced (see 3Training.pdf[1], but I would suggest reading all the presentations), or rather ported, so they use free libraries instead of Google-internal libraries. Regarding the fonts used, I guess that file

Re: [tesseract-ocr] Tesseract With Opencl

2015-05-08 Thread zdenko podobny
OpenCL support is experimental; it fails for some images (see the issues labelled OpenCL). I asked the team that prepared this code to review and fix it, but I cannot say when that will be finished. Zdenko On Fri, May 8, 2015 at 10:08 AM, Mohammad Umar mohammaduma...@gmail.com wrote: Hi, Any

Re: [tesseract-ocr] run-time error about missing functions, etc.

2015-04-22 Thread zdenko podobny
This is a Leptonica issue ;-) Leptonica uses stub/dummy files for support that is not compiled in[1]. So you should read the message "Error in pixReadStreamTiff: function not present" as: you did not compile Leptonica with (lib)tiff support, but you want to use it for reading a TIFF file... [1]

Re: [tesseract-ocr] Re: Multiple tifs to one file

2015-04-22 Thread zdenko podobny
IMO there are 2 easy solutions: 1. You can combine the input images into a multipage TIFF with ImageMagick (e.g. convert image1.png image2.bmp image3.tif output.tif). 2. You can create a text file with one image filename per line, e.g. filelist.lst with this content: song.png tessinput.tif superscript.png
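For solution 2, the list file can be generated instead of typed by hand. A minimal Python sketch; the function names and the set of image suffixes are assumptions (which formats actually load depends on your Leptonica build), and sorting by name is just one way to keep a stable page order:

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of image suffixes to pick up.
IMAGE_SUFFIXES = {".png", ".tif", ".tiff", ".jpg", ".jpeg", ".bmp"}

def filelist_content(names: Iterable[str]) -> str:
    """One image filename per line, as in the filelist.lst example."""
    return "".join(f"{name}\n" for name in names)

def images_in(directory: str) -> List[str]:
    """Image paths in directory, sorted by name for a stable page order."""
    return sorted(str(p) for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)

def write_filelist(directory: str, listfile: str = "filelist.lst") -> None:
    """Write the list file for all images found in directory."""
    Path(listfile).write_text(filelist_content(images_in(directory)),
                              encoding="utf-8")
```

The list file then takes the place of a single image on the command line, e.g. `tesseract filelist.lst out`; check that your tesseract version supports this file-list input.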

Re: [tesseract-ocr] New Georgian (kartuli ena) traineddata for Tesseract

2015-04-04 Thread zdenko podobny
, zdenko podobny zde...@gmail.com wrote: Can you create a repository for your training (on SourceForge or GitHub)? Maybe with a detailed description of how you created it (so other people can potentially try to improve/extend it). Zdenko On Fri, Apr 3, 2015 at 5:04 AM, Derek Dohler doh

Re: [tesseract-ocr] Re: Android OCR application looking for quality improvments

2015-03-09 Thread zdenko podobny
Have a look at Text Fairy (OCR)[1]. I have good experience with it (I use it for extracting text from books for quotations, e.g. just a few lines). The code is available on GitHub[2] (I am not sure whether it is up to date). [1] https://play.google.com/store/apps/details?id=com.renard.ocr [2]

Re: [tesseract-ocr] Problems with make script with of head version on a Synology system.

2015-02-17 Thread zdenko podobny
Which version of gcc is there? Maybe have a look at this solution on Stack Overflow[1] [1] http://stackoverflow.com/questions/8640689/gcc-4-1-2-error-integer-constant-is-too-large-for-long-type Zdenko On Tue, Feb 17, 2015 at 7:05 PM, Markijan Blaschtschak mblas...@gmail.com wrote: Hi all, I

Re: [tesseract-ocr] Re: Using tesseract with VS2012. HELP PLEASE

2015-01-28 Thread zdenko podobny
On Wed, Jan 28, 2015 at 5:26 PM, juan peralta peralta11...@gmail.com wrote: On Saturday, 24 November 2012, 19:30:25 (UTC-6), Minjie Zheng wrote: Okay, I have spent literally two days trying to get tesseract to work in VS2012. No matter what I do, my programs would not execute. It

Re: [tesseract-ocr] Language

2014-10-30 Thread zdenko podobny
In my experience, the dictionary has only a limited effect on the OCR result: adding a word to the dictionary does not mean that tesseract will recognize it. On the other hand, a word missing from the dictionary does not mean that tesseract will not recognize it correctly. So if you have just ASCII text (without

Re: [tesseract-ocr] Passing glyph vector data directly to tesseract

2014-10-30 Thread zdenko podobny
On Fri, Oct 24, 2014 at 1:45 AM, Ryan Dev software.developer.r...@gmail.com wrote: Hi, I have what I think is a unique situation, and I was hoping I could get some hints on how to proceed. I have problem font files, for which I want to fix the unicode mappings for. I also have PDF files

Re: [tesseract-ocr] Many 'question mark' chars in recognized text

2014-10-17 Thread zdenko podobny
OCR a test image with your app and store the result in a text file. Then OCR the same image with the tesseract executable (its output goes to a text file by default) and compare the results. If the output from the tesseract executable is OK, but the output from your app is wrong (e.g. there are only ASCII letters), then you have a problem
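The comparison described above can be partly automated. A minimal Python sketch with hypothetical function names: a plain diff of the two outputs, plus a rough heuristic (an assumption, not a definitive test) for the "only ASCII letters" symptom, where characters that were non-ASCII in the tesseract CLI output show up as `?` in the app output:

```python
import difflib
from typing import List

def diff_outputs(cli_text: str, app_text: str) -> List[str]:
    """Unified diff of tesseract CLI output (reference) vs. the app output."""
    return list(difflib.unified_diff(
        cli_text.splitlines(), app_text.splitlines(),
        fromfile="tesseract-cli.txt", tofile="app.txt", lineterm=""))

def looks_like_lost_unicode(cli_text: str, app_text: str) -> bool:
    """Heuristic: the CLI output has non-ASCII characters, while the app
    output is pure ASCII and contains '?' (a typical sign that UTF-8 text
    went through a wrong encoding conversion)."""
    cli_has_non_ascii = any(ord(ch) > 127 for ch in cli_text)
    app_is_ascii = all(ord(ch) < 128 for ch in app_text)
    return cli_has_non_ascii and app_is_ascii and "?" in app_text
```

Read both text files with `encoding="utf-8"` before passing them in; an empty diff means the app reproduces the CLI result.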

Re: [tesseract-ocr] PDF output not searchable within SumatraPDF

2014-10-15 Thread zdenko podobny
Can you post 18.jpg somewhere? Zdenko On Wed, Oct 15, 2014 at 3:46 AM, Chris Cameron ch...@upnix.com wrote: This command: $ tesseract.exe 18.jpg test Gives me test.txt, which has all the text from 18.jpg, as expected. This command: $ tesseract.exe 18.jpg test pdf Gives me test.pdf,

Re: [tesseract-ocr] produce delimited output using hOCR or by preserving original document spacing

2014-10-14 Thread zdenko podobny
Just a hint: there is a fork that tries to output hOCR details to a TSV file, https://code.google.com/r/email-hocr-tsv/ [1]. I did not test it :-), so I have no idea whether it fits the original request... [1] https://code.google.com/r/email-hocr-tsv/source/list Zdenko On Tue, Oct 14, 2014
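As an alternative to the fork, a rough hOCR-to-TSV conversion can be sketched with Python's standard html.parser. This is not the fork's code; it assumes word elements carry class ocrx_word and a title such as "bbox 10 20 50 40; x_wconf 91", as in tesseract 3.x hOCR output, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect bbox, confidence and text for each element with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # dict for the word element being read, if any
        self._depth = 0       # tags nested inside that word element

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            self._depth += 1  # e.g. <strong> or <em> inside a word
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._current = self._parse_title(attrs.get("title") or "")
            self._current["text"] = ""
            self._depth = 0

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if self._depth:
            self._depth -= 1
            return
        self.words.append(self._current)  # closing tag of the word itself
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    @staticmethod
    def _parse_title(title):
        # title looks like "bbox 10 20 50 40; x_wconf 91"
        info = {"bbox": None, "conf": ""}
        for part in title.split(";"):
            fields = part.split()
            if len(fields) >= 5 and fields[0] == "bbox":
                info["bbox"] = tuple(int(v) for v in fields[1:5])
            elif len(fields) >= 2 and fields[0] == "x_wconf":
                info["conf"] = fields[1]
        return info


def hocr_to_tsv(hocr: str) -> str:
    """Convert hOCR markup to TSV rows: x0, y0, x1, y1, conf, text."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    rows = ["x0\ty0\tx1\ty1\tconf\ttext"]
    for word in parser.words:
        x0, y0, x1, y1 = word["bbox"] or ("", "", "", "")
        rows.append(f"{x0}\t{y0}\t{x1}\t{y1}\t{word['conf']}\t{word['text'].strip()}")
    return "\n".join(rows)
```

The nesting counter assumes every tag inside a word is closed; an unclosed void tag such as a bare <br> inside a word would need extra handling.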

Re: [tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread zdenko podobny
Post your input and output files somewhere. Zdenko On Thu, Oct 2, 2014 at 2:03 PM, simon.eigeldin...@vol.at wrote: Hi all, I compiled tesseract from git yesterday and played with it a little bit. It's pretty impressive what has happened in the last two years or so. Not only does tesseract have a lower file size

Re: [tesseract-ocr] Using setImage(byte[],width,height,bpp,bpl) instead of setImage(bitmap) doesn't recognise text.

2014-10-01 Thread zdenko podobny
After calling SetImage, call the tesseract API function GetThresholdedImage and save the result (a Leptonica PIX) to disk. If that output is wrong, you did not pass the image data to tesseract correctly (i.e. you need to change the SetImage parameters). Zdenko On Wed, Oct 1, 2014 at 7:30 AM, Umresh umr...@myingage.com
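The GetThresholdedImage check needs the real library, but a cheap sanity check of the SetImage arguments can catch the most common mistakes first. A Python sketch, assuming the TessBaseAPI::SetImage conventions as I understand them (bytes_per_pixel is 1 for greyscale, 3 or 4 for colour, 0 for byte-packed 1-bit images; bytes_per_line is the row stride); the helper names are hypothetical.

```python
# Meaning of bytes_per_pixel as assumed here (per the TessBaseAPI::SetImage docs)
VALID_BYTES_PER_PIXEL = {
    0: "binary, 1 bit per pixel, byte packed",
    1: "greyscale",
    3: "RGB",
    4: "RGBA",
}


def min_bytes_per_line(width: int, bytes_per_pixel: int) -> int:
    """Smallest row stride that can hold one row of pixels."""
    if bytes_per_pixel == 0:
        return (width + 7) // 8  # 8 binary pixels per byte
    return width * bytes_per_pixel


def check_set_image_args(data_len: int, width: int, height: int,
                         bytes_per_pixel: int, bytes_per_line: int) -> list[str]:
    """Return human-readable problems with SetImage-style arguments (empty if none)."""
    problems = []
    if bytes_per_pixel not in VALID_BYTES_PER_PIXEL:
        hint = ""
        if bytes_per_pixel in (8, 24, 32):
            hint = f" (looks like bits per pixel; try {bytes_per_pixel // 8})"
        problems.append(f"unsupported bytes_per_pixel={bytes_per_pixel}{hint}")
        return problems
    needed = min_bytes_per_line(width, bytes_per_pixel)
    if bytes_per_line < needed:
        problems.append(
            f"bytes_per_line={bytes_per_line} < {needed} needed for width={width}")
    if data_len < height * bytes_per_line:
        problems.append(
            f"buffer has {data_len} bytes, expected at least {height * bytes_per_line}")
    return problems
```

Passing bits per pixel (e.g. 32) where bytes per pixel (4) is expected is a frequent cause of garbage input, so that case gets an explicit hint.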

Re: [tesseract-ocr] How can i integrate tesseract OCR with opencv in windows8?

2014-09-26 Thread zdenko podobny
On Fri, Sep 26, 2014 at 2:31 PM, Surajit das smartsurajit2...@gmail.com wrote: I want to learn how to integrate tesseract with OpenCV on Windows. I am really, really confused about how to use tesseract, and I have read a lot of articles and links but I couldn't understand them. So, please can anyone

Re: [tesseract-ocr] --listlangs shows no installed languages

2014-09-26 Thread zdenko podobny
On 26 Sep 2014 20:22, Nicolas Nickisch n.nickisc...@gmail.com wrote: I compiled and installed tesseract 3.03 manually. tesseract --listlangs gives me an error message: error opening data file /usr/local/share/tessdata/eng.traineddata Indeed, I don't have eng.traineddata in that directory,
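The path in that error message shows where this build looks for language data. A rough Python equivalent of what --listlangs reports, assuming the installed languages are simply the *.traineddata files in that directory (the function name is hypothetical):

```python
from pathlib import Path


def list_installed_languages(tessdata_dir) -> list[str]:
    """Return the language codes of all *.traineddata files in tessdata_dir.

    An empty list means either the directory does not exist or it holds no
    language data, which matches the "error opening data file" symptom.
    """
    path = Path(tessdata_dir)
    if not path.is_dir():
        return []
    # "eng.traineddata" -> "eng"
    return sorted(p.stem for p in path.glob("*.traineddata"))
```

Running it against /usr/local/share/tessdata on the affected machine would confirm whether eng.traineddata needs to be copied there.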

Re: [tesseract-ocr] Modification of background image allowed in PDF output?

2014-09-19 Thread zdenko podobny
This is a known issue; try the current code from the git repository. It should be fixed there. Zdenko On Fri, Sep 19, 2014 at 2:38 PM, Frank Siegert frank.sieg...@googlemail.com wrote: Dear all, I have been testing tesseract to embed OCR in scanned PDF documents, and it works phenomenally well in

Re: [tesseract-ocr] Modification of background image allowed in PDF output?

2014-09-19 Thread zdenko podobny
Well, yes and no ;-) Yes: there should be no change to the image. No: you should expect that the PDF renderer may (re)compress the input image. See the comments on issue 1285 [1] for more details. [1] https://code.google.com/p/tesseract-ocr/issues/detail?id=1285 Zdenko On Fri, Sep

Re: [tesseract-ocr] version 3.04

2014-09-19 Thread zdenko podobny
There is no tesseract 3.04, so you cannot install it. Your question indicates that you do not understand the consequences of your actions, so I strongly suggest you revert to the last stable release, which is 3.02.02. Zdenko On Fri, Sep 19, 2014 at 8:31 PM, Rick Leir rich...@c7a.ca wrote: Ubuntu

Re: [tesseract-ocr] Re: [Clarification request] Is it possible to let Tesseract generate three output files i) text ii) hOCR iii) PDF in a *single* run ?

2014-09-17 Thread zdenko podobny
At the moment the tesseract executable allows only one output per run. It would be a trivial change to allow multiple outputs. Zdenko On Wed, Sep 17, 2014 at 4:08 AM, Shree Devi Kumar shreesh...@gmail.com wrote: Quan, can it also be done in the command-line version? Shree Shree Devi Kumar
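Until then, the workaround is one run per output format. A Python sketch that builds those command lines, assuming the hocr and pdf config files that come with tesseract 3.03 and later (the pdf case matches the "tesseract.exe 18.jpg test pdf" command quoted earlier in this archive); the helper names are hypothetical.

```python
import subprocess

# Extra arguments per output format. "hocr" and "pdf" are config files that
# ship with tesseract (pdf since 3.03); plain text needs no config.
OUTPUT_CONFIGS = {"txt": [], "hocr": ["hocr"], "pdf": ["pdf"]}


def build_commands(image: str, outputbase: str,
                   formats=("txt", "hocr", "pdf"), lang=None) -> list[list[str]]:
    """One tesseract command line per requested output format."""
    commands = []
    for fmt in formats:
        cmd = ["tesseract", image, outputbase]
        if lang:
            cmd += ["-l", lang]
        cmd += OUTPUT_CONFIGS[fmt]
        commands.append(cmd)
    return commands


def run_all(image: str, outputbase: str, **kwargs) -> None:
    """Run tesseract once per format; raises CalledProcessError on failure."""
    for cmd in build_commands(image, outputbase, **kwargs):
        subprocess.run(cmd, check=True)
```

The cost of this design is that recognition is repeated for every format, so three outputs take roughly three times as long as one.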

Re: [tesseract-ocr] Re: read_params_file

2014-09-10 Thread zdenko podobny
What kind of solution do you expect for a wrong command? Zdenko On Wed, Sep 10, 2014 at 4:42 PM, Dovhani Foneworx dfone...@gmail.com wrote: Any solution for this? fonew...@foneworxtest.foneworx.co.za:~/DM/DEVPLACE/BOXEDIT$ tesseract email.tif num10.tif num11.tif num12.tif num13.tif num13.tif
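For context: tesseract 3.x expects tesseract imagename outputbase [-l lang] [-psm N] [configfile...], so in the quoted command email.tif is the image, num10.tif becomes the output base, and every later .tif is treated as a config file, which is presumably what triggers the read_params_file error. A simplified Python sketch of that interpretation (not tesseract's actual parser; the function name is hypothetical):

```python
USAGE = "usage: tesseract imagename outputbase [-l lang] [-psm N] [configfile...]"


def interpret_args(args: list[str]) -> dict:
    """Split a tesseract 3.x style command line into its parts."""
    if len(args) < 2:
        raise ValueError(USAGE)
    image, outputbase, rest = args[0], args[1], list(args[2:])
    options, configs = {}, []
    i = 0
    while i < len(rest):
        if rest[i] in ("-l", "-psm") and i + 1 < len(rest):
            options[rest[i]] = rest[i + 1]  # option with its value
            i += 2
        else:
            configs.append(rest[i])  # anything else is read as a config file
            i += 1
    return {"image": image, "outputbase": outputbase,
            "options": options, "configs": configs}
```

Applied to the quoted command, the result shows four "config files" (num11.tif, num12.tif and num13.tif twice) that tesseract would try to read as parameter files.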

Re: [tesseract-ocr] Adding New Language Training to Downloads Page

2014-09-07 Thread zdenko podobny
The Fraktur repository is here: https://github.com/paalberti/tesseract-dan-fraktur.git IMO you should post your files there. Zdenko On Thu, Aug 28, 2014 at 3:38 PM, matthew christy matt.chri...@gmail.com wrote: Ah, OK. Thanks Zdenko. Is there any organized alternative for posting Tesseract

Re: [tesseract-ocr] Does tesseract 3.03 return 3.02 with -version ?

2014-09-07 Thread zdenko podobny
1. The latest stable tesseract release is 3.02.
2. There is no official 3.03 release. There is only a 3.03 release candidate, which is intended for developers for testing. Even though 3.03 is for developers, building it on Linux should be quite a smooth process (maybe it depends on experience with building

[tesseract-ocr] Re: [tesseract-dev] Re: tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)

2014-08-27 Thread zdenko podobny
Anybody who is packaging tesseract and publicly sharing 3.03 (excluding -rc1) or 3.04 is lying. There are no such releases. The repository is intended for developers and testers, not for packagers! And it is absolutely normal for the version to change within the repository. There are for

[tesseract-ocr] Re: [tesseract-dev] Re: tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)

2014-08-27 Thread zdenko podobny
to get tesseract 3.03 rc1? On Wed, Aug 27, 2014 at 2:40 PM, zdenko podobny zde...@gmail.com wrote: Anybody who is packaging tesseract and publicly sharing 3.03 (excluding -rc1) or 3.04 is lying. There are no such releases. The repository is intended for developers and testers, not for packagers

Re: [tesseract-ocr] Adding New Language Training to Downloads Page

2014-08-27 Thread zdenko podobny
See: http://google-opensource.blogspot.sk/2013/05/a-change-to-google-code-download-service.html Zdenko On Wed, Aug 27, 2014 at 6:22 PM, matthew christy matt.chri...@gmail.com wrote: Hi all, I've created some new English-language Fraktur training by utilizing the current and excellent
