Re: concatenating tr files

2013-04-18 Thread zdenko podobny
Post your files somewhere so we can test them on Linux... Zdenko On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar shreesh...@gmail.com wrote: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 says: An alternative to multi-page tiffs is to create many single-page tiffs for a

Re: Hindi training data - unicharset_extractor error

2013-04-18 Thread zdenko podobny
On Thu, Apr 18, 2013 at 5:35 AM, sdk shreesh...@gmail.com wrote: Zdenko, You wrote: He can create another data and use it together with data provided by google. Does this mean that we can use the ability of tesseract to use multiple languages for recognition to use multiple traineddata

Re: Hindi training data - unicharset_extractor error

2013-04-17 Thread zdenko podobny
On Wed, Apr 17, 2013 at 10:41 PM, Sven Pedersen sven.peder...@gmail.com wrote: Rob, You can add fonts to existing languages. Just follow the combine instructions. As far as I know, it is not possible. He can create separate data and use it together with the data provided by Google. Sven On

Re: Hindi training data - unicharset_extractor error

2013-04-17 Thread zdenko podobny
On Wed, Apr 17, 2013 at 10:36 PM, Robert Komar rko...@telus.net wrote: On Wed, 17 Apr 2013, Sven Pedersen wrote: This is covered in the FAQ: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_

Re: Problem re-creating user-words example in tesseract1 doc

2013-04-12 Thread zdenko podobny
On Fri, Apr 12, 2013 at 3:10 AM, u20...@gmail.com wrote: Note that there still appears to be a problem with the bazaar example: Even though the normal dictionary is supposed to be suppressed and the user wordlist used instead, the whole text in eurotext.tif is still returned, including words

Re: New line recogniztion

2013-04-06 Thread zdenko podobny
On Fri, Apr 5, 2013 at 11:20 PM, Ruud van Houtum ruudvhou...@gmail.com wrote: Hello, I am using Tesseract to output text files from scanned documents. All text images contain typed text and are fairly clear/clean. So far Tesseract has a pretty good accuracy and I am quite content. However

Re: Can I pass a langue file direct to init? Have any way?

2013-04-06 Thread zdenko podobny
folder exists) From my experience it fails without the /. Patrick On Fri, Apr 5, 2013 at 6:07 PM, zdenko podobny zde...@gmail.com wrote: I did not test the latest code, but in the past I had these experiences: - if the TESSDATA_PREFIX environment variable is specified, then the init path

Re: hOCR output and ocr_carea

2013-04-06 Thread zdenko podobny
Thanks for the idea. I will try to have a look at it. If anybody has a patch ready, I will welcome it warmly. Zdenko On Tue, Apr 2, 2013 at 7:16 AM, Janusz S. Bień jsb...@mimuw.edu.pl wrote: The hOCR specification states that ocr_carea is content area which used to be called ocr_column. I've

Re: Can I pass a langue file direct to init? Have any way?

2013-04-05 Thread zdenko podobny
I did not test the latest code, but in the past I had these experiences: - if the TESSDATA_PREFIX environment variable is specified, then the init path was ignored - if TESSDATA_PREFIX is built in (the default for autotools compilation), then the init path was ignored The easy workaround for that problem:

Re: Terrible error with the Init() and with the sample .exe too

2013-04-04 Thread zdenko podobny
On Thu, Apr 4, 2013 at 4:18 AM, Damiano Rodriguez damiano...@gmail.com wrote: Hi all, I have a very strange problem. First of all: with Visual Studio 2010 and C# there was another problem because I was building the project for framework 4.0; I have changed it to 2.0. Everything is OK,

Re: link to tesseract on UBUNTU (Shared Lib .so) how?

2013-04-04 Thread zdenko podobny
On Wed, Apr 3, 2013 at 8:04 PM, Renato Forti rtfo...@gmail.com wrote: Hi, How should I link the tesseract .so in my app? My app crashes with this: release/parallel/ocr/doksafe_ocr_engine: symbol lookup error: ./behavior/tesseract/libocr_default_engine.so: undefined symbol :

Re: Application link problem (link with shared library /usr/local/lib/libtesseract.so.3)

2013-04-04 Thread zdenko podobny
On Wed, Apr 3, 2013 at 8:40 PM, Renato Forti rtfo...@gmail.com wrote: Hi, I am trying to use tesseract in my app. I linked my app with: OS: Linux (UBUNTU) gcc 4.6 tesseract from ls /usr/local/lib/*tesseract* /usr/local/lib/libtesseract.a

Re: Load std::vector<char> on Pix?

2013-04-04 Thread zdenko podobny
If you are on Linux you can create a PIX with pixReadMem. If you are on Mac or Windows you will face a problem (e.g. tif will work, other formats like jpeg or png will not). See leptonica issue 77[1] for more details. There is also a test case/example file[2] where I used std::vector<char> to create a PIX. [1]

Re: How can Tesseract Recognize text in box?

2013-04-04 Thread zdenko podobny
Hint: when you remove the borders/boxes/table, then tesseract does its job. So you will need some tool for removing lines (maybe a good start could be line-removal in leptonica[1]). Or, if you are able to detect the region of each number, do OCR number by number (with the tesseract API or uzn files). [1]

Re: mftraining: symbol lookup error

2013-04-02 Thread zdenko podobny
On Tue, Apr 2, 2013 at 7:28 AM, Matt Ball matt.bal...@gmail.com wrote: Hello -- I get the following error when running mftraining: $ mftraining -F font_properties -U unicharset -O eng.unicharset eng.digital_dream.exp0.tr Read shape table shapetable of 11 shapes Reading

Re: Recognition of words with digits

2013-04-01 Thread zdenko podobny
On Sun, Mar 31, 2013 at 10:55 PM, mike_ro...@hotmail.com wrote: Try to turn off dictionaries (parameters load_system_dawg, load_freq_dawg, maybe also load_punc_dawg, load_number_dawg, load_unambig_dawg, load_bigram_dawg). You can do this only during init of language. I disabled all

Re: Car lisence plate recognition fail

2013-03-31 Thread zdenko podobny
On Sun, Mar 31, 2013 at 12:09 PM, Александр Жданов alekzande...@yandex.ru wrote: Hello, I have problems with character recognition using tesseract. I have a file with an image of an auto-detected car license plate. I created it using OpenCV and it looks like this:

Re: Recognition of words with digits

2013-03-30 Thread zdenko podobny
Try to turn off dictionaries (parameters load_system_dawg, load_freq_dawg, maybe also load_punc_dawg, load_number_dawg, load_unambig_dawg, load_bigram_dawg). You can do this only during init of language. Zdenko On Fri, Mar 29, 2013 at 6:41 PM, mike_ro...@hotmail.com wrote: Hello all. I'm
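
A rough sketch of the advice above, assuming the parameters are placed in a config file that is passed on the command line so they are read while the language is initialized. The file name nodict and the image name page.tif are made up for illustration; only the parameter names come from the message.

```shell
#!/bin/sh
# Sketch: write a config file that switches off the dictionary dawgs.
# "nodict" and "page.tif" are hypothetical names, not from the thread.
workdir=$(mktemp -d)
cat > "$workdir/nodict" <<'EOF'
load_system_dawg F
load_freq_dawg F
load_punc_dawg F
load_number_dawg F
load_unambig_dawg F
load_bigram_dawg F
EOF

# Config files are listed after the output base on the command line.
# Only call tesseract if it is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f page.tif ]; then
    tesseract page.tif page -l eng "$workdir/nodict"
fi
```

Whether all six switches are needed depends on the material; starting with load_system_dawg and load_freq_dawg and comparing results is probably enough for a first test.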

Re: Setting up tesseract in visual studio 2010

2013-03-29 Thread zdenko podobny
Why are you using 3.01 instead of the 3.02 version (with installer)? There is an installer, and using VS2010 instead of VS2008 should not be a big issue (if you are familiar with VS 2010)... Zdenko On Fri, Mar 29, 2013 at 5:51 AM, Buddhika De Seram w.dese...@gmail.com wrote: Hi, I'm new to

Re: A little assistance with an image

2013-03-28 Thread zdenko podobny
On Thu, Mar 28, 2013 at 1:56 AM, Nate Bennett nate.bennet...@gmail.com wrote: I am running the tesseractdotnet wrapper build 590 found on https://code.google.com/p/tesseractdotnet/. With the english3.01. It is quite old; try to find a way to use the 3.02 version. vietocr[1] has a .NET version, maybe

Re: Problem re-creating user-words example in tesseract1 doc

2013-03-21 Thread zdenko podobny
Did you use the environment setting TESSDATA_PREFIX? If not, can you set it (to C:\Program Files (x86)\Tesseract-OCR\)? Zdenko On Thu, Mar 21, 2013 at 2:08 AM, u20...@gmail.com wrote: Thanks for the reply. Yes, the file does exist, I can open it from my working directory using
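
For readers on Linux or in a Unix-like shell, setting the variable could look roughly like this. The path /usr/local/share is only an assumed install location and with_trailing_slash is a helper invented for this sketch; the value should be the directory that contains the tessdata folder, as in the Windows path suggested above.

```shell
#!/bin/sh
# Append a trailing slash if it is missing (hypothetical helper).
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Assumed location: the parent directory of the "tessdata" folder.
TESSDATA_PREFIX=$(with_trailing_slash "/usr/local/share")
export TESSDATA_PREFIX
echo "TESSDATA_PREFIX=$TESSDATA_PREFIX"
```

In a Windows command prompt the counterpart is the set command with the path from the message. The trailing separator is kept here because the suggested Windows path also ends with one.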

Re: Problem re-creating user-words example in tesseract1 doc

2013-03-20 Thread zdenko podobny
On Wed, Mar 20, 2013 at 2:35 AM, u20...@gmail.com wrote: I created the three files described in Section CONFIG FILES AND AUGMENTING WITH USER DATA of http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html

Re: Error opening unicharset file

2013-03-17 Thread zdenko podobny
On Sun, Mar 17, 2013 at 4:00 AM, Epix Zhang exzh...@gmail.com wrote: Hello, I followed the instructions on https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3. But when it came to the step Putting it all together, errors occurred. The command I ran: combine_tessdata.exe chi. It

Re: use tesseract api in visual c++ 2010

2013-03-12 Thread zdenko podobny
download https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.02.tar.gz Documentation was prepared before releasing tesseract 3.02... I will fix it. Zdenko On Tue, Mar 12, 2013 at 3:43 PM, Vicky Patil timepassv...@gmail.com wrote: Hi, I followed instructions

Re: Error when using trained language file: tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 555 - Tesseract 3.02

2013-03-05 Thread zdenko podobny
There is no attached data. Maybe try to use some online storage system (google disk, skydrive, dropbox...) and send a link here. You stated you are following the wiki instructions[1], but your log shows this is not true - you did not run mftraining. [1]

Re: Error in training OCR

2013-03-01 Thread zdenko podobny
On Fri, Mar 1, 2013 at 7:33 AM, Hadid Mubarak hadidmubarak.jt...@gmail.com wrote: hi.. I have the same issue as you.. has anyone managed to solve it? please advise.. thank you so much.. On Monday, 11 February 2013 18:49:54 UTC+7, OCR explorer wrote: hi... I am using Serak tesseract

Re: tesseract testing suite

2013-02-24 Thread zdenko podobny
On Sun, Feb 24, 2013 at 12:20 AM, Nick White nick.wh...@durham.ac.uk wrote: On Fri, Feb 22, 2013 at 03:20:49PM +, Nick White wrote: On Sun, Jun 03, 2012 at 10:27:23PM +0100, zdenko podobny wrote: it looks like it is ASCII only oriented (at least in report non-ASCII are malformed

Re: What is a box file ?

2013-02-14 Thread zdenko podobny
On Thu, Feb 14, 2013 at 2:44 AM, iccol...@ncsu.edu wrote: I'm having a similar problem. The .tif file can be found here( https://www.dropbox.com/s/jivtydjj9gilkku/eng.ANSI_GDT.exp0.tif ) and I hope anyone can help. I'm trying to train tesseract to understand geometric tolerance symbols. I

Re: Tesseract Currency Extraction From Image Issue

2013-02-14 Thread zdenko podobny
As usual: - try to reproduce the problem with the tesseract executable if you use something else (a wrapper, in some cases the API) - send the input image Zdenko On Thu, Feb 14, 2013 at 5:13 PM, Markus Austin markus2k...@gmail.com wrote: Hi All, I currently have Tesseract implemented within a PERL

Re: Tesseract Box Editor

2013-02-04 Thread zdenko podobny
Thanks - I will include it in the next wiki update. I sent this cc to the tesseract forum, so others can enjoy this new tool too. Zdenko On Mon, Feb 4, 2013 at 6:27 PM, Scott Stringham scott.string...@gmail.com wrote: I have posted a new tesseract box file editor online at

Re: Issues with shapeclustering

2013-02-03 Thread zdenko podobny
You don't need to edit it. Just run the command as on the wiki. It is faster than editing the tr file... Zdenko On Sun, Feb 3, 2013 at 12:21 AM, Carlos Antunes cf.antu...@gmail.com wrote: Zdenko, Shall I edit it and remove it before going further? Thanks. On Saturday, February 2, 2013 1:53:33 PM

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread zdenko podobny
Can you send an example of your tif file? Zdenko On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner mliss...@michaeljaylissner.com wrote: I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69. I've installed these, and also installed libtiff4 using apt-get. When I try to

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread zdenko podobny
, Feb 3, 2013 at 1:16 PM, zdenko podobny zde...@gmail.com wrote: Can you send an example of your tif file? Zdenko On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner mliss...@michaeljaylissner.com wrote: I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69. I've installed

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread zdenko podobny
://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format Zdenko On Sun, Feb 3, 2013 at 11:00 PM, zdenko podobny zde...@gmail.com wrote: Are you able to generate just one page or a small example? Or can you provide the steps you used to create it (so I can create it)? Tiff could be tricky. E.g

Re: Issues with shapeclustering

2013-02-02 Thread zdenko podobny
Don't send gdb output - it is useless. Especially when you do not follow the wiki: you ran: tesseract eng.20centsmarker.exp0.tif eng.20centsmarker.exp0.box nobatch box.train and you should run: tesseract eng.20centsmarker.exp0.tif eng.20centsmarker.exp0 nobatch box.train Zdenko On Fri,
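
The difference between the two commands is only the output base: it must be the image name without any extension, since tesseract appends the extension of the training output itself. A small sketch of how that could be scripted for several training images; the helper name train_base and the *.exp*.tif pattern are assumptions of this example, not something from the thread.

```shell
#!/bin/sh
# Derive the output base by stripping ".tif" from the image name.
# The base must not end in ".box" or ".tr".
train_base() {
    printf '%s\n' "${1%.tif}"
}

for img in *.exp*.tif; do
    [ -f "$img" ] || continue                      # glob matched nothing
    command -v tesseract >/dev/null 2>&1 || break  # tesseract not installed
    tesseract "$img" "$(train_base "$img")" nobatch box.train
done
```

For eng.20centsmarker.exp0.tif this produces exactly the second command quoted above.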

Re: Issues with shapeclustering

2013-02-02 Thread zdenko podobny
But if you have a look at the tr file, you will see that the font name will be 20centsmarker.exp0. And I guess this is not what you want. Tesseract takes some information from filenames. If you go your own way with naming you will face a problem (crash). I remember there is a crash at some stage if last

Re: OCR Text in Specific Color

2013-01-28 Thread zdenko podobny
Tesseract converts the input image data to a 2-color mode (black and white). So it does not have information (at the output stage) about the color of the input symbols... Zdenko On Sun, Jan 27, 2013 at 10:52 PM, ipec...@gmail.com wrote: I'm new to the community but did some searching around before posting.

Re: Issues creating clustering data with mftraining

2013-01-28 Thread zdenko podobny
On Mon, Jan 28, 2013 at 12:01 PM, Nick White nick.wh...@durham.ac.uk wrote: On Mon, Jan 28, 2013 at 11:57:41AM +0100, zdenko podobny wrote: So try to read wiki ;-) and his e-mail. Indeed he already recognized his problem: Shall I also generate the shapetable as well. Well, I will try

Re: Recognising Known Characters

2013-01-28 Thread zdenko podobny
Hi, I remember there is (was) an issue that mentioned this problem (B vs 8). So maybe this is a common problem (like O vs 0 or l vs 1)... If you post an example image I can try to run some tests (later)... Zdenko On Mon, Jan 28, 2013 at 9:30 AM, jacob.chi...@gmail.com wrote: I followed the thread

Re: Can't combine tessdata files for training due to new line in unicharambigs

2013-01-25 Thread zdenko podobny
Post the files you are trying to combine somewhere. Zdenko On Fri, Jan 25, 2013 at 5:48 PM, Alp Oktem alpok...@gmail.com wrote: *What steps will reproduce the problem?* 1. Prepared all necessary files for training 2. combine_tessdata ./lang. *What is the expected output? What do you see

Re: Using tesseract for read ZONES

2013-01-21 Thread zdenko podobny
tesseract tries to open uzn files for defined[1] page segmentation modes (from 4 to 10 and 0[2]; or, put the other way, it does not use them for modes that request automatic page segmentation). In the attachment you can find an example of an image + uzn file from isri-ocr-evaluation-tools[3]. You can test it
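
A minimal sketch of what such a zone file could look like. It assumes the ISRI-style line format "left top width height label" in pixels with the origin at the top-left corner, and that the uzn file sits next to the image with the same base name; form.tif and the coordinates are invented for illustration and are not the attachment from the message.

```shell
#!/bin/sh
# Sketch: describe two zones for a hypothetical image "form.tif".
# Assumed line format: left top width height label.
img=form.tif
uzn="${img%.*}.uzn"      # same base name as the image -> form.uzn
cat > "$uzn" <<'EOF'
100 50 400 60 Text
100 150 400 60 Text
EOF

# -psm 4 is one of the modes that, per the message, consults uzn files.
# Only call tesseract if it is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
    tesseract "$img" "${img%.*}" -psm 4
fi
```

With a mode that requests automatic page segmentation the zone file would simply be ignored, which is an easy way to compare results with and without zones.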

Re: Cannot make box file

2013-01-17 Thread zdenko podobny
1. I cannot find box.nochop mentioned on the wiki (but maybe I am too tired at the moment ;-)). Can you provide a link and paragraph? 2. Why are you using the 2.x version? It is too old... Zdenko On Thu, Jan 17, 2013 at 5:21 PM, Firas almannaa firas.alman...@gmail.com wrote: On Thursday, 28

Re: How training language like arab?

2013-01-17 Thread zdenko podobny
Regarding cube: - there is no more public information about cube than the 92 hits at the forum I mentioned already (+ source code ;-)) - there is no information on how to create cube data files (ok, some of them are text files...) So you can: 1. try to use/train tesseract without

Re: How training language like arab?

2013-01-16 Thread zdenko podobny
Really ;-)? I got 93 results. E.g.: https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J Please respect the time of people on this

Re: How to use for Delphi 7?

2013-01-16 Thread zdenko podobny
Then it means that nobody has created it. You are welcome to create it. Zdenko On Wed, Jan 16, 2013 at 5:49 PM, Sergey Kondratiev gorilovi...@gmail.com wrote: Where can I download this component for Delphi? I can't find it((

Re: How training language like arab?

2013-01-16 Thread zdenko podobny
On Wed, Jan 16, 2013 at 3:34 PM, Sven Pedersen sven.peder...@gmail.com wrote: The reason why Arabic has those files and your language does not is that Arabic is set up to use the cube feature to combine it with other languages, so you can do -l ara+eng and OCR a document with both Arabic and

Re: Problems recognizing digits with Tesseract

2013-01-15 Thread zdenko podobny
Hi, first of all: Do not send executables to the forum. Or do you think people here have no access to the tesseract executable? Next: - What is the purpose of using e.g. segmentation Treat the image as a single word in a circle. (9) for your images? - Your images look postprocessed to me. Is it

Re: Installation Issues with ./configure on ubuntu

2013-01-15 Thread zdenko podobny
12.04 as well as 12.10. If the answer is yes then I wanted to install tesseract in ubuntu 12.10 for hands on experience - since I am newbie to ubuntu. With Regard, -Sriranga On Tue, Jan 15, 2013 at 1:23 PM, zdenko podobny zde...@gmail.com wrote: Sriranga, this has nothing to do with Ubuntu

Re: How training language like arab?

2013-01-15 Thread zdenko podobny
Search the archive of the tesseract forums for cube. Zdenko On Tue, Jan 15, 2013 at 2:16 PM, gold snake huangjin...@gmail.com wrote: My language is somewhat special, just like the Arabic font, but there are some differences from the Arabic font, actually only in the shape of the font, and it is written right to left

Re: Tutorial : Equation OCR using OpenCV to train and extract contours from image for OCR

2013-01-14 Thread zdenko podobny
On Mon, Jan 14, 2013 at 6:24 AM, Michael Young michaelyoung1...@gmail.com wrote: Extracting contours: http://ayoungprogrammer.blogspot.ca/2013/01/equation-ocr-part-1-using-contours-to.html Training Tesseract:

Re: Installation Issues with ./configure on ubuntu

2013-01-14 Thread zdenko podobny
Sriranga, this has nothing to do with Ubuntu (or any other operating system). This is related to user experience and understanding of error messages. This is not the problem. The problem is that he did not try to solve this common (not tesseract related) problem by himself. When I put to google checking

Re: Training Tesseract for single digit

2013-01-13 Thread zdenko podobny
Hi, I think you will need to run training for this. I tried simple C++ code that shows confidence values (see attachment) and for your digit 6 it produced: symbol 5, conf: 78.5236 5 conf: 78.523613 s conf: 77.376984

Re: Tesseract for text detection

2012-12-29 Thread zdenko podobny
try to use page segmentation mode. E.g. Treat the image as a single word (or text line or uniform block of text) will produce results. As far as I remember discussion on this forum tesseract is not suitable for handwritten text... Zdenko On Fri, Dec 28, 2012 at 11:55 PM, Nick Jalbert

Re: Cube training data

2012-12-25 Thread zdenko podobny
On Tue, Dec 25, 2012 at 3:41 AM, Patrick Questembert patrick.questemb...@gmail.com wrote: The major languages such as English, French and Spanish come with a cube version of the training data (e.g. eng.cube.*). So far we have used only the regular training data (e.g. eng.traineddata). Can

Re: Tesseract 3.02 vs 3.01 performance

2012-12-23 Thread zdenko podobny
On Thu, Dec 20, 2012 at 3:25 PM, Patrick Questembert patrick.questemb...@gmail.com wrote: Update: the Suzuki cook-book for building on iOS still works, see https://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/ About performance: we have observed only a

Re: Use tesseract to retrieve image skew orientation

2012-12-18 Thread zdenko podobny
In the case of tesseract - have a look at PageIterator and AnalyseLayout(). You can find example code in this forum archive[1]. [1] https://groups.google.com/d/msg/tesseract-ocr/25GQVGvEE2g/HCKmB7LOplkJ Zdenko On Tue, Dec 18, 2012 at 11:15 AM, José Luis Rey jluis...@gmail.com wrote: Hello Friends,

Re: errors while building libtesseract.lib, How to solve the problem?

2012-12-18 Thread zdenko podobny
On Tue, Dec 18, 2012 at 4:34 AM, Iris hongyujiei...@gmail.com wrote: Hi, Zdenko. The same problem happened with me. It's weird to see syntax errors because I haven't modified anything. I'm wondering if you have solved the problem? I do not have this problem so there is nothing I can

Re: Chinese Simplified on this image not working

2012-12-18 Thread zdenko podobny
Which OS do you use, which version of tesseract, etc.? I tried tesseract original.jpg original -l chi_tra and tesseract preprocessed.tiff preprocessed -l chi_tra and I did not get any error message (on openSUSE linux 64bit 12.2 with tesseract 3.02.02)... Why did you upscale the image?

Re: Chinese Simplified on this image not working

2012-12-18 Thread zdenko podobny
I do apologize, but I am not familiar with Chinese (or other Asian languages ;-) ). So I tried tesseract original.jpg original -l chi_sim and the message was: Too many unichars in ambiguity on line 0 Too many unichars in ambiguity on line 0 Tesseract Open Source OCR Engine v3.02.02 with

Re: tesseract for new language

2012-12-16 Thread zdenko podobny
On Sun, Dec 16, 2012 at 9:01 AM, thomas nyan...@gmail.com wrote: Dear All, Is it possible to use tesseract for a new language? If so, how can I start? What about reading the available docs in the wiki? Zdenko

Re: Appending Output to single file

2012-12-15 Thread zdenko podobny
On Fri, Dec 14, 2012 at 8:01 PM, Alexis ya...@antonakis.co.uk wrote: I have a number of PDF files I am trying to OCR. I have a script which extracts each page into individual .tif files which I then run through tesseract, and everything works fine. However I am trying to output these pages

Re: Multiple Input of Tif

2012-12-14 Thread zdenko podobny
If you run 'tesseract --help' (or just 'tesseract') you could see that tesseract expects some structure of input arguments (e.g. one image file and one output file), so usage of wildcards will not work (if wildcards match to more than 1 file) neither on linux/unix nor windows (there is different
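
Since each call takes exactly one image and one output base, the usual way around this on linux/unix is to let the shell loop over the files and call tesseract once per file. A small sketch; the helper name out_base and the *.tif pattern are assumptions made for this example.

```shell
#!/bin/sh
# Output base: the image name without its last extension.
out_base() {
    printf '%s\n' "${1%.*}"
}

# One tesseract call per image instead of one call with a wildcard.
for img in *.tif; do
    [ -f "$img" ] || continue                      # glob matched nothing
    command -v tesseract >/dev/null 2>&1 || break  # tesseract not installed
    tesseract "$img" "$(out_base "$img")"
done
```

Each image then gets its own text file next to it, named after the image.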

Re: problem with LED-fonts recognition ;(

2012-12-06 Thread zdenko podobny
On Wed, Dec 5, 2012 at 11:02 AM, mike oldfield czandrasze...@gmail.com wrote: Tesseract-ocr still has a problem with decoding LED-like digits. I did something like this on my squeeze command line: convert 1.jpg 1.tif tesseract 1.tif 1.txt nobatch digits ...but the results are very poor and far

Re: problem with LED-fonts recognition ;(

2012-12-04 Thread zdenko podobny
Search forum. I remember discussion about similar topic. AFAIR: tesseract has problem with letter(symbol) that consists of several not connected parts (e.g. dots, lines) - solution should be to preprocess image (blur). Generally: black background is problem. Quality of image is too low (JPEG,

Re: Tesseract ocr to XML with text positions (X and Y)

2012-12-03 Thread zdenko podobny
On Mon, Dec 3, 2012 at 9:19 AM, Benito2313 benito2...@hotmail.com wrote: On Monday, 3 December 2012 08:48:20 UTC+1, zdenop wrote the following: On Sun, Dec 2, 2012 at 11:56 AM, Nick White nick@durham.ac.uk wrote: On Sun, Dec 02, 2012 at 01:29:54AM -0800, Benito2313 wrote: Thank you

Re: Tesseract OCR 3.02 .NET (TessNET2) library crashes in sample programs in ocr.Init(null, eng, false);

2012-12-02 Thread zdenko podobny
On Fri, Nov 30, 2012 at 10:10 PM, eljainc elja...@sbcglobal.net wrote: My mistake, It was the 2.0.4 version. I am still not sure where these English files should be. I have tried to put them into a temp location using ocr.Init(o:\\ocrtemp\\,eng,false); I have also tried to put them in the

Re: Tesseract ocr to XML with text positions (X and Y)

2012-12-02 Thread zdenko podobny
On Sun, Dec 2, 2012 at 11:56 AM, Nick White nick.wh...@durham.ac.uk wrote: On Sun, Dec 02, 2012 at 01:29:54AM -0800, Benito2313 wrote: Thank you for your reply, I can't find the manual page of tesseract; could you post a link? http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html

Re: tesseract api and user-words

2012-11-30 Thread zdenko podobny
I guess there is problem to find deu.traineddata. I would suggest to run your program in console, so you can see possible error message (something like Error opening data file C:\Program Files\Tesseract-OCR\tessdata/deu.traineddata). Another option is to init tesseract and set variables in more

Re: Tesseract OCR 3.02 .NET (TessNET2) library crashes in sample programs in ocr.Init(null, eng, false);

2012-11-30 Thread zdenko podobny
On Thu, Nov 29, 2012 at 10:04 PM, eljainc elja...@sbcglobal.net wrote: Hello, I'm using the TessNet2 (.NET) library version 3.02 and I'm having an issue in running my first ever program with Tesseract OCR. Can you please be more specific? What do you mean by TessNet2 (.NET) library version

Re: Having problem to use Generic Vector

2012-11-30 Thread zdenko podobny
It looks like you have a problem with linking the library. I modified the code of the example (tesseract-ocr-API-Example-vs2008.zip, http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip) and the code compiled ok for me (in VS 2008)... -- Zdenko On Wed, Nov 28, 2012

Re: tesseract api and user-words

2012-11-30 Thread zdenko podobny
I put this code into tesseract-ocr-API-Example-vs2008.zip (http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip): Pix *image; char *outText; char *configs[]={"myconfig"}; int configs_size = 1; TessBaseAPI *tess = new TessBaseAPI();

Re: Deleting 'comments' from wiki pages

2012-11-29 Thread zdenko podobny
, zdenko podobny zde...@gmail.com wrote: I agree. But only project owners can do it ;-) And as far as I tested it (on one project where I am owner) - he has to delete them one by one :-) -- Zdenko On Wed, Nov 28, 2012 at 1:06 PM, Nick White nick.wh...@durham.ac.uk wrote: Hi Tesseractors, I

Re: Deleting 'comments' from wiki pages

2012-11-28 Thread zdenko podobny
I agree. But only project owners can do it ;-) And as far as I tested it (on one project where I am owner) - he has to delete them one by one :-) -- Zdenko On Wed, Nov 28, 2012 at 1:06 PM, Nick White nick.wh...@durham.ac.uk wrote: Hi Tesseractors, I just posted an issue to remove all the

Re: Using tesseract with VS2012. HELP PLEASE

2012-11-28 Thread zdenko podobny
I do not have experience with vs2012 but I would suggest: 1. ask for help on Microsoft Developer Network - they should provide you (at least general) instruction about this topic (maybe there is compatibility option) 2. Try to go step by step with vs2012 build: 1. if you have

Re: Using tesseract with VS2012. HELP PLEASE

2012-11-27 Thread zdenko podobny
And what about installing Microsoft Visual C++ 2008 SP1 Redistributable Package (x86)? In the release notes[1] it is suggested for executables... [1] https://groups.google.com/d/topic/tesseract-ocr/EXyGqT9osrw/discussion On Tue, Nov 27, 2012 at 5:15 AM, Minjie Zheng zmin...@gmail.com wrote:

Re: weird : Hidillfllflhfiilfl

2012-11-25 Thread zdenko podobny
On Sun, Nov 25, 2012 at 1:57 PM, a314 ah.mas...@gmail.com wrote: With a simple input .tif file (attached) that contains a very-readable text Her old man will be jealous! in one line, my output text file shows: Hidillfllflhfiilfl. I spent quite a lot of time to build (windows 7, visual C++ 2010)

Re: How to build the tesseract 3.02.02 project in Eclipse at Ubuntu?

2012-11-20 Thread zdenko podobny
On Mon, Nov 19, 2012 at 5:57 PM, Linda Li codingpotatoli...@gmail.com wrote: Thanks. On Monday, November 19, 2012 10:30:07 AM UTC-6, zdenop wrote: On Sun, Nov 18, 2012 at 11:49 PM, Linda Li codingpo...@gmail.com wrote: Now building succeeds. Compile has errors, complaining there are

Re: List of all variables settable by TessBaseAPI::SetVariable()

2012-11-20 Thread zdenko podobny
If you are interested in all (648) tesseract-ocr 3.02 parameters(variables) with default values have a look at http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version -- Zdenko On Tue, Nov 20, 2012 at 2:10 PM, ArtooDetoo artoodeto...@gmail.com wrote: :

Re: How to build the tesseract 3.02.02 project in Eclipse at Ubuntu?

2012-11-19 Thread zdenko podobny
On Sun, Nov 18, 2012 at 11:49 PM, Linda Li codingpotatoli...@gmail.com wrote: I build it to run tesseractmain.cpp. I do not intend to compile the whole Thanks to instruction from Zdenko, I add the symbols as follows: In Eclipse, Project Properties-C/C++ General-Paths and Symbols Symbol,

Re: How to build the tesseract 3.02.02 project in Eclipse at Ubuntu?

2012-11-19 Thread zdenko podobny
On Mon, Nov 19, 2012 at 9:36 PM, Linda Li codingpotatoli...@gmail.com wrote: Hah, I figured it out. You are right, undefined does not mean undeclared. So I think I found a one wrong lib. I checked the Makefile, although a lot of strange words there, there are LIBS = -llept -lpthread So I

Re: How to build the tesseract 3.02.02 project in Eclipse at Ubuntu?

2012-11-18 Thread zdenko podobny
You should give the definition to the compiler. Have a look at how it is done with autotools (or the VC++ solution if you are familiar with it). I am not an Eclipse user (though I tried once to open and compile the tesseract project in it; it was smooth as far as I remember), but I would expect that eclipse is able to

Re: Running OSD support of Tesseract OCR

2012-11-18 Thread zdenko podobny
Hi all, you will not get OSD (Orientation and script detection) output information with the tesseract executable. At the moment tesseract provides (saves) only the OCR result. Somebody could consider the help (tesseract --help) misleading because it enumerates all possible page segmentation modes. I think that

[Announcement] QT Box Editor 1.10

2012-11-16 Thread zdenko podobny
QT Box Editor 1.10 was released. It is a multi-platform visual editor for tesseract-ocr http://code.google.com/p/tesseract-ocr/ box files http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 (used for OCR training) based on QT4 library http://qt.nokia.com/products/. Several problems were

Re: Tesseract Forms Recognition,

2012-11-16 Thread zdenko podobny
On Fri, Nov 16, 2012 at 2:33 PM, José Luis Rey jluis...@gmail.com wrote: Thanks for the response, I've read the Apache license 2.0 and all looks ok. The software I'm developing is for scanread documents like invoices, checks, and any document with fixed fields, linking zones

Re: Problem with ViewerDebugging with tesseract 3.02.02

2012-11-16 Thread zdenko podobny
On Fri, Nov 16, 2012 at 3:05 AM, Linda Li codingpotatoli...@gmail.com wrote: Version: tesseract 3.02.02 Ubuntu 12.04, Eclipse Juno I am trying to use ViewerDebugging. Following the instructions in http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging I installed javac download

Re: How to build the tesseract 3.02.02 project in Eclipse at Ubuntu?

2012-11-16 Thread zdenko podobny
On Fri, Nov 16, 2012 at 3:03 AM, Linda Li codingpotatoli...@gmail.com wrote: I want to build the tesseract 3.02.02 project so that I can modify some code to tune it to some specific task. Version: tesseract 3.02.02 Ubuntu 12.04, Eclipse Juno I put the tesseract into the Eclipse project.

Re: Confidence in HOCR file

2012-11-15 Thread zdenko podobny
On Thu, Nov 15, 2012 at 10:15 AM, José Luis Rey jluis...@gmail.com wrote: Thanks very much for your responses zdenop, I'm not used to dev in open source projects like this, perhaps you may help me to understand, for example if I implement a feature to add character rectconfidence to the

Re: Word Search Using Tessnet

2012-11-15 Thread zdenko podobny
On Fri, Nov 9, 2012 at 1:43 PM, Troy Frazier troypow...@gmail.com wrote: Is it possible to search an image for a particular word using the Tessnet wrapper? I see that it is possible to limit your scan to certain characters, but what I would like to do is to input a word and have all

Re: Working directories

2012-11-14 Thread zdenko podobny
On Wed, Nov 14, 2012 at 3:38 PM, José Luis Rey jluis...@gmail.com wrote: Oops, I did not see this response. For me, the working dirs are the dirs pointed to by the env var: %TESSERACT_DATA%. I see that I need to compile to change the default config/dictionary to the correct Windows Vista/7 %AppData%

Re: Working directories

2012-11-14 Thread zdenko podobny
On Wed, Nov 14, 2012 at 4:12 PM, José Luis Rey jluis...@gmail.com wrote: You do not need to compile tesseract. Just set your TESSERACT_DATA to your tessdata directory (e.g. %AppData%) (before calling tesseract)... That's all. Imagine that you are running on Vista/7 as a regular user, you

Re: Character Segmentation

2012-11-13 Thread zdenko podobny
Are you able to program (C++)? -- Zdenko On Tue, Nov 13, 2012 at 6:17 AM, Walid Khedr khe...@gmail.com wrote: Hi, I'm new to tesseract. I just want to use it for Character Segmentation. The input is an image of a text string and the output will be an array of *images* for each character.

Re: Character Segmentation

2012-11-13 Thread zdenko podobny
You need to get box coordinates (BoundingBox) for each symbol[1]. Try to follow the hocr algorithm within tesseract[2]. hocr focuses on words/lines, but the logic would be the same for symbols (and it could be simplified). Or maybe search for character confidence in issues and forum. There should
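For anyone who would rather not touch the C++ API, a rough sketch of the same idea: take per-symbol boxes in the tesseract box-file layout (`symbol left bottom right top page`, with the origin at the bottom-left corner of the image) and turn them into top-left-origin crop rectangles that most image libraries expect. The names `SymbolBox` and `parse_box_lines` are made up for this illustration and are not part of tesseract; the actual cropping is left to whatever image library you use.

```python
from typing import List, NamedTuple


class SymbolBox(NamedTuple):
    """Crop rectangle for one symbol, with y measured from the top edge."""
    symbol: str
    left: int
    top: int
    right: int
    bottom: int


def parse_box_lines(lines: List[str], image_height: int) -> List[SymbolBox]:
    """Convert box-file lines into top-left-origin crop rectangles.

    Each input line is assumed to look like: "<symbol> <left> <bottom> <right> <top> <page>".
    Box files measure y from the bottom of the image, so y is flipped here.
    """
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            # Skip blank or malformed lines rather than guessing.
            continue
        symbol = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(
            SymbolBox(symbol, left, image_height - top, right, image_height - bottom)
        )
    return boxes
```

Each resulting `(left, top, right, bottom)` tuple can then be passed to the crop function of an image library to cut out one character image.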

Re: empty Page

2012-11-12 Thread zdenko podobny
On Mon, Nov 12, 2012 at 3:23 PM, Mi Tran nuon...@gmail.com wrote: What kind of file is your bmpFile? bmpFile must be *.tif This is not true - it can be any image type supported by leptonica. -- Zdenko

Re: Newbie: Training tesseract

2012-11-12 Thread zdenko podobny
Check also the error messages - if you did not run shapeclustering then mftraining should not produce any output (in 3.02 version) ;-) Also it looks like you forgot to rename the output files from the training tools! You need to follow the training wiki[1]! [1]

Re: My Post Didn't Show Up

2012-11-12 Thread zdenko podobny
As far as I know this list is moderated, i.e. your first post should be approved by a moderator... But I am not familiar with the details (I am not a moderator ;-) ) -- Zdenko On Mon, Nov 12, 2012 at 8:27 PM, Random Terrain replayabil...@randomterrain.com wrote: Does it take a while to show up or is

Re: Newbie: Training tesseract

2012-11-12 Thread zdenko podobny
If you are serious about your training project, please invest the time to read the wiki (once again if necessary). It is there. -- Zdenko On Tue, Nov 13, 2012 at 1:20 AM, Mi Tran nuon...@gmail.com wrote: Thanks zdenop, I have run shapeclustering and read the training wiki. But there is still an error.

Re: empty Page

2012-11-12 Thread zdenko podobny
Leptonica is the library that handles images for tesseract. -- Zdenko On Tue, Nov 13, 2012 at 1:36 AM, MiT nuon...@gmail.com wrote: Is leptonica a tool that supports training? On Monday, November 12, 2012 at 22:38:58 UTC+7, zdenop wrote: On Mon, Nov 12, 2012 at 3:23 PM, Mi Tran

Re: Training new languages

2012-11-08 Thread zdenko podobny
On Wed, Nov 7, 2012 at 11:25 PM, Donaldo donaldo@gmail.com wrote: Hi, Judith I tried using the Esperanto option with the Tesseract package distributed in Ubuntu 12.04 but it did not recognise any of the accented Esperanto letters. I have done some training on Esperanto texts and

Re: Specification of Multiple Languages

2012-11-08 Thread zdenko podobny
On Thu, Nov 8, 2012 at 2:00 PM, Tom Mc thomasmccot...@gmail.com wrote: Hi All, I have many documents that contain a mixture of two or more languages; Chinese and English for example. Is there a way to merge together two training files so the engine can interpret both character sets? Any

Re: Assert in GetUTF8Text(RIL_SYMBOL) tesseract 3.01

2012-11-08 Thread zdenko podobny
Try 3.02 - I tested it with VS2008 and mingw32 on Windows XP and there was no crash. -- Zdenko On Thu, Nov 8, 2012 at 10:54 PM, Mike Butterbrodt the.mik...@gmail.com wrote: I have an image snippet that causes an assertion in unichar_id() from a higher call to GetUTF8Text(RIL_SYMBOL)
