[tesseract-ocr] Re: jTessBoxEditor - Tesseract box editor trainer

2014-11-10 Thread Quan Nguyen
into TIFF. I also tried converting the PNG to an 8bpp grayscale image, but in vain. I am still struggling to see the image file in jTessBoxEditor. Any help is appreciated. On Wednesday, September 25, 2013 10:02:13 PM UTC-4, Quan Nguyen wrote: jTessBoxEditor is a Java box editor for Tesseract OCR

Re: [tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-10 Thread Quan Nguyen
with existing box or Train from Scratch under the *Trainer* tab, I am getting the attached message. Question: How can I generate the Arabic.font_properties, Arabic.frequent_word_list, and Arabic.words_list files using jTessBoxEditor? On Friday, 7 November 2014 19:42:37 UTC+5, Quan Nguyen wrote
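The reply to this question is not part of the excerpt. For reference, in Tesseract 3.0x training the font_properties file is plain text with one line per font: the font name (as it appears in the training file names) followed by five 0/1 flags for italic, bold, fixed, serif, and fraktur. The class below is a hypothetical helper that only writes that format; it does not generate the word-list files, and the font name used in the example is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** One entry of a Tesseract 3.0x font_properties file (hypothetical helper). */
final class FontProperties {
    final String fontName;
    final boolean italic, bold, fixed, serif, fraktur;

    FontProperties(String fontName, boolean italic, boolean bold,
                   boolean fixed, boolean serif, boolean fraktur) {
        this.fontName = fontName;
        this.italic = italic;
        this.bold = bold;
        this.fixed = fixed;
        this.serif = serif;
        this.fraktur = fraktur;
    }

    private static int flag(boolean b) {
        return b ? 1 : 0;
    }

    /** Formats the entry as "fontname italic bold fixed serif fraktur". */
    String toLine() {
        return String.format("%s %d %d %d %d %d", fontName,
                flag(italic), flag(bold), flag(fixed), flag(serif), flag(fraktur));
    }

    /** Writes one line per font, UTF-8 encoded, with '\n' line endings. */
    static void write(Path file, List<FontProperties> fonts) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (FontProperties f : fonts) {
            sb.append(f.toLine()).append('\n');
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, `new FontProperties("timesitalic", true, false, false, true, false).toLine()` yields `timesitalic 1 0 0 1 0`. The resulting file is what the `-F font_properties` option of shapeclustering and mftraining refers to.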

[tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-05 Thread Quan Nguyen
Yes. The latest version supports RTL languages. On Wednesday, November 5, 2014 4:38:13 AM UTC-6, iram akbar wrote: Thank you, Quan. Does jTessBoxEditor support Arabic? On Friday, 31 October 2014 04:18:00 UTC+5, Quan Nguyen wrote: You only need JRE http://www.oracle.com/technetwork

Re: [tesseract-ocr] How to run make training for Repo installed Tesseract 3.03

2014-11-05 Thread Quan Nguyen
I read that Ubuntu 14.10 includes the Tesseract training executables. On Wednesday, November 5, 2014 7:41:12 AM UTC-6, shree wrote: Did you install the latest version from http://packages.ubuntu.com/utopic/tesseract-ocr If so, it should have the training tools. Try `which text2image` to see if it

Re: [tesseract-ocr] Strange recognition

2014-11-01 Thread Quan Nguyen
The image is really small -- it needs 300 DPI. Nevertheless, VietOCR 4.0 beta, which uses Tesseract 3.03 RC, can pick it up without any problem. If you use the .NET version, be sure to scale the image first. On Saturday, November 1, 2014 3:56:16 AM UTC-5, Gennady Goncharov wrote: Thanks a

Re: [tesseract-ocr] Re: any chance to get this .tiff converted to text?

2014-10-31 Thread Quan Nguyen
The command syntax is: java -jar vietocr.jar -? or vietocr -? The command-line mode does not support rescaling of images, though. Use ImageMagick's *convert* command to rescale or resize. http://www.imagemagick.org/Usage/resize/ On Friday, October 31, 2014 2:42:43 AM UTC-5, boris wrote: OMG

[tesseract-ocr] Re: any chance to get this .tiff converted to text?

2014-10-30 Thread Quan Nguyen
Hi Boris, Be sure to select Screenshot Mode. The image's resolution is too low. Quan On Wednesday, October 29, 2014 1:10:49 PM UTC-5, boris wrote: Hi Shree, I have changed the language to German, but it won't really improve. Anyhow, I am thinking of programming my own OCR for my project as I

[tesseract-ocr] Re: image processing to improve tesseract OCR accuracy

2014-10-30 Thread Quan Nguyen
The number was recognized after grayscale, binarize, and invert color steps. On Thursday, October 30, 2014 9:51:59 AM UTC-5, Rick Leir wrote: The simpler method: convert to greyscale then binarize with the appropriate threshold. However if the colors convert to similar grey values then you
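The three steps named above can be sketched in plain Java using only `java.awt.image`. This is a minimal illustration, not the code path of any particular tool: the BT.601 luma weights and the caller-supplied fixed threshold are assumptions, and, as Rick Leir notes, colors that map to similar grey values may need a different threshold or a per-channel approach.

```java
import java.awt.image.BufferedImage;

/** Minimal preprocessing sketch: grayscale, fixed-threshold binarization, optional inversion. */
final class Preprocess {
    /** Luma approximation (ITU-R BT.601 weights) of a packed RGB pixel, 0..255. */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /**
     * Returns a 1-bit image: pixels whose luma is below {@code threshold} become black,
     * the rest white. With {@code invert} set, the result is flipped, which turns
     * light text on a dark background into black text on white.
     */
    static BufferedImage binarize(BufferedImage src, int threshold, boolean invert) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean dark = luma(src.getRGB(x, y)) < threshold;
                boolean black = dark ^ invert;
                out.setRGB(x, y, black ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

For white digits on a colored background, a call such as `Preprocess.binarize(img, 128, true)` would produce dark text on a white page, which is the form Tesseract handles best.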

[tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-10-30 Thread Quan Nguyen
' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/mamata/tesseract-3.01' make: *** [all] Error 2 when i upgrade ubuntu to 13.04 On Monday, October 3, 2011 9:20:00 AM UTC+5:30, Quan Nguyen wrote: A box editor for Tesseract OCR data. This release includes the following

[tesseract-ocr] Re: Many 'question mark' chars in recognized text

2014-10-30 Thread Quan Nguyen
I suspect you saved the Unicode text output with the wrong character encoding. Try UTF-8 encoding when you save the file. Tesseract may misrecognize characters, but it rarely puts question marks in their places. On Thursday, October 16, 2014 3:18:58 AM UTC-5, Salvo Piazza wrote: Hi all,
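A short Java illustration of the effect described above: encoding text with a charset that cannot represent a character (US-ASCII here) silently substitutes '?', while UTF-8 round-trips it intact. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Shows why saving OCR text with a non-Unicode charset can turn characters into '?'. */
final class SaveText {
    /** Encodes and decodes with {@code cs}; unmappable characters are replaced ('?' for US-ASCII). */
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    /** Saves recognized text explicitly as UTF-8 rather than the platform default charset. */
    static void saveUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the Vietnamese word `"Vi\u1EC7t"`, `roundTrip(..., StandardCharsets.US_ASCII)` gives `"Vi?t"`, whereas `StandardCharsets.UTF_8` returns the original string.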

[tesseract-ocr] Re: Tesseract on simple image

2014-10-08 Thread Quan Nguyen
Rescaling to 300 DPI would help. On Tuesday, October 7, 2014 10:19:30 PM UTC-5, Test wrote: Any pointers here? Maybe something to get me going on this simple image and then I can work on things a bit more complicated. On Tuesday, September 16, 2014 9:00:48 PM UTC-4, Test wrote: Hello,
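Rescaling is usually done in an image editor or with ImageMagick, but the arithmetic is simple: multiply both dimensions by targetDpi / sourceDpi. A minimal Java sketch, assuming the source resolution is known (screenshots are often around 72 to 96 DPI, an assumption to verify for each image); the class name is hypothetical, and it does not update any DPI metadata stored in the file.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Sketch: upscale an image so its effective resolution reaches a target DPI. */
final class Rescale {
    /** Scale factor from {@code sourceDpi} to {@code targetDpi}; never shrinks the image. */
    static double factor(int sourceDpi, int targetDpi) {
        return Math.max(1.0, (double) targetDpi / sourceDpi);
    }

    /** Returns a new RGB image whose width and height are multiplied by {@code factor}. */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Bicubic interpolation keeps glyph edges smoother than nearest-neighbour.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

For a 72 DPI screenshot, `Rescale.scale(img, Rescale.factor(72, 300))` enlarges it about 4.17 times before OCR.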

[tesseract-ocr] Re: Is there a control param for tesseract to disable line breaks within a paragraph?

2014-09-18 Thread Quan Nguyen
, Bruce wrote: It works wonderfully.. can you explain more on the Regex statement? I can't understand what the first regex statement is matching against. Thanks again for sharing your wonderful solution!! On Saturday, August 9, 2014 9:32:00 AM UTC+8, Quan Nguyen wrote: It employs a proper

[tesseract-ocr] Re: Is there a control param for tesseract to disable line breaks within a paragraph?

2014-09-18 Thread Quan Nguyen
UTC+8, Quan Nguyen wrote: It employs a proper Regex statement. Following is the function in Java that it uses: /** * Removes line breaks. * @param text * @return */ public static String removeLineBreaks(String text) { return text.replaceAll((?=\n|^)[\t

[tesseract-ocr] Re: [Clarification request] Is it possible to let Tesseract generate three output files i) text ii) hOCR iii) PDF in a *single* run ?

2014-09-16 Thread Quan Nguyen
You can use the new ResultRenderer API in v3.03 to generate different output formats simultaneously. On Tuesday, September 16, 2014 3:03:22 AM UTC-5, Tom wrote: I wish to generate the three meaningful output formats, at least hOCR and PDF, in one run (call). Questions: 1. Is it

Re: [tesseract-ocr] Re: [Clarification request] Is it possible to let Tesseract generate three output files i) text ii) hOCR iii) PDF in a *single* run ?

2014-09-16 Thread Quan Nguyen
On Wed, Sep 17, 2014 at 7:03 AM, Quan Nguyen nguy...@gmail.com wrote: You can use the new ResultRenderer API in v3.03 to generate different output formats simultaneously. On Tuesday, September 16, 2014 3:03:22 AM UTC-5, Tom wrote: I wish to generate the three meaningful output

Re: [tesseract-ocr] Re: read_params_file

2014-09-12 Thread Quan Nguyen
You would have to use a separate command for each image, or you can combine them into a multi-page TIFF image. On Friday, September 12, 2014 8:54:41 AM UTC-5, Dovhani Foneworx wrote: Hi Zdenko, I have more than 15 images that I want to train; however, I have used ImageMagick to join the images
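If the images are processed separately, the per-image commands can be generated rather than typed. The sketch below assumes files named lang.font.expN.tif and uses the `box.train` command form shown in the "Training doesn't work 3.02" thread in this list; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: one "tesseract image base box.train" command per training image. */
final class BoxTrainCommands {
    /** Strips the extension, e.g. "deu.handwriting.exp0.tif" -> "deu.handwriting.exp0". */
    static String baseName(String imageFile) {
        int dot = imageFile.lastIndexOf('.');
        return dot > 0 ? imageFile.substring(0, dot) : imageFile;
    }

    /** Builds the argument list for each image; each list can be handed to a ProcessBuilder. */
    static List<List<String>> build(List<String> imageFiles) {
        List<List<String>> commands = new ArrayList<>();
        for (String image : imageFiles) {
            commands.add(Arrays.asList("tesseract", image, baseName(image), "box.train"));
        }
        return commands;
    }
}
```

Each command could then be run in turn with `new ProcessBuilder(cmd).inheritIO().start().waitFor()`, which keeps the images as separate inputs instead of joining them.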

[tesseract-ocr] Re: Automatic Alignment of Scanned Pages

2014-09-12 Thread Quan Nguyen
Search this group's archives for "leptonica deskew". On Wednesday, September 10, 2014 11:41:44 AM UTC-5, Philipp Dörfler wrote: Hi, Does Tesseract have a module to automatically align and straighten out a scanned sheet of paper? If it does, can someone please point me to the code that is

[tesseract-ocr] Re: problem in processing Hindi pdf

2014-09-04 Thread Quan Nguyen
If you have an issue with VietOCR, post at its Discussion Forum http://sourceforge.net/p/vietocr/discussion/. On Thursday, September 4, 2014 1:41:39 AM UTC-5, V S Rawat wrote: Same problem is coming in 4.0 beta in processing English pdfs (source not known). Loading and displaying correctly,

Re: [tesseract-ocr] Training tesseract 3.03 in a custom C and C++ code using C-API

2014-09-03 Thread Quan Nguyen
I don't think the training functions are exposed through the API. Other developers can confirm this. On Wednesday, September 3, 2014 4:24:11 AM UTC-5, Dovhani Foneworx wrote: Thanks, his source is written in Java, and I am writing a C program and I need to do training as in the

[tesseract-ocr] Re: suppress the debug output info of Tesseract c++ API

2014-09-03 Thread Quan Nguyen
it doesn't work the way it is supposed to here. On Wednesday, September 3, 2014 11:49:07 AM UTC+8, Quan Nguyen wrote: Try api.SetVariable("debug_file", "tesseract.log"); On Tuesday, September 2, 2014 10:01:38 AM UTC-5, Derry Pei wrote: Hi all, I am currently building my own application

[tesseract-ocr] Re: suppress the debug output info of Tesseract c++ API

2014-09-02 Thread Quan Nguyen
Try api.SetVariable("debug_file", "tesseract.log"); On Tuesday, September 2, 2014 10:01:38 AM UTC-5, Derry Pei wrote: Hi all, I am currently building my own application with the Tesseract C++ API in Windows 7, and when I called the recognition function: *char * text = api.GetUTF8Text();* I got the

[tesseract-ocr] Re: jTessBoxEditor - Tesseract box editor trainer

2014-08-19 Thread Quan Nguyen
Version 1.1 beta is released with the following enhancements: - Add training support for Right-to-Left (RTL) text - Add horizontal box split using modifier keys Any comments/feedback are welcome. Thanks. On Wednesday, September 25, 2013 9:02:13 PM UTC-5, Quan Nguyen wrote: jTessBoxEditor

[tesseract-ocr] Re: Language file for MICR font

2014-08-14 Thread Quan Nguyen
I simply copied the `mcr.traineddata` file into `tessdata` and specified `-l mcr` at the command line. On Thursday, August 14, 2014 4:36:00 AM UTC-5, Juned Khan wrote: Hi Anurag, What else did you do to make this work? I have copied mcr.traineddata shared by Quan into the appropriate directory but
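A quick sanity check before running with `-l mcr`: in Tesseract 3.0x the data path (the TESSDATA_PREFIX environment variable, or the datapath argument to Init) names the parent directory of `tessdata`, so the file is expected at prefix/tessdata/lang.traineddata. The helper below is a hypothetical sketch of that lookup; the prefix value is whatever your installation uses.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: locate prefix/tessdata/lang.traineddata before invoking Tesseract with "-l lang". */
final class TessdataCheck {
    /** Tesseract 3.0x treats the prefix as the directory that contains "tessdata". */
    static Path trainedDataFile(String prefix, String lang) {
        return Paths.get(prefix, "tessdata", lang + ".traineddata");
    }

    /** True if the language file exists as a regular file under the given prefix. */
    static boolean isInstalled(String prefix, String lang) {
        return Files.isRegularFile(trainedDataFile(prefix, lang));
    }
}
```

If `isInstalled(prefix, "mcr")` returns false, the file is either in the wrong folder or the prefix points at `tessdata` itself rather than its parent.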

[tesseract-ocr] Re: How to get tables ocr-ed

2014-08-10 Thread Quan Nguyen
Tables are a known limitation of the Tesseract OCR engine. If you know how to eliminate the table borders, you will get better results from Tesseract. On Sunday, August 10, 2014 9:17:07 AM UTC-5, V S Rawat wrote: We often get text in which images or pdf have tables. Text is in several columns,
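One crude way to strip table borders before OCR, sketched in Java under the assumption that the page is already binarized (pure black text on white): any horizontal or vertical run of black pixels longer than a chosen length is treated as a ruling line and whitened. The class name and run-length threshold are assumptions to tune per document. Glyph strokes that touch a border lose the overlapping pixels, and Leptonica's morphological operations are a more robust alternative.

```java
import java.awt.image.BufferedImage;

/** Sketch: whiten long horizontal and vertical black runs (likely table rules) in a binary image. */
final class TableLineRemover {
    static boolean isBlack(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) & 0xFFFFFF) == 0;
    }

    /** Runs longer than {@code minRun} pixels are assumed to be ruling lines, not glyph strokes. */
    static BufferedImage removeLines(BufferedImage src, int minRun) {
        int w = src.getWidth();
        int h = src.getHeight();
        boolean[][] erase = new boolean[h][w];
        // Mark long horizontal runs.
        for (int y = 0; y < h; y++) {
            int x = 0;
            while (x < w) {
                if (!isBlack(src, x, y)) { x++; continue; }
                int start = x;
                while (x < w && isBlack(src, x, y)) x++;
                if (x - start > minRun) for (int i = start; i < x; i++) erase[y][i] = true;
            }
        }
        // Mark long vertical runs.
        for (int x = 0; x < w; x++) {
            int y = 0;
            while (y < h) {
                if (!isBlack(src, x, y)) { y++; continue; }
                int start = y;
                while (y < h && isBlack(src, x, y)) y++;
                if (y - start > minRun) for (int i = start; i < y; i++) erase[i][x] = true;
            }
        }
        // Copy the page, turning marked pixels white.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean keepBlack = isBlack(src, x, y) && !erase[y][x];
                out.setRGB(x, y, keepBlack ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

A threshold somewhat larger than the widest character (for example a few times the text height) keeps ordinary strokes while catching borders.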

[tesseract-ocr] Re: Beta download zip file doesn't have how to install instructions

2014-08-10 Thread Quan Nguyen
The Java version does not require installation. Just unzip and run ocr.bat. Version 4.0 beta is for the new Tesseract 3.03 beta. The .NET version includes a setup.msi installer. readme.txt is a standard filename that the sf.net site uses on a Download page to describe the files distributed there.

[tesseract-ocr] Re: Is there a control param for tesseract to disable line breaks within a paragraph?

2014-08-08 Thread Quan Nguyen
break. How did VietOCR solve this issue? On Thursday, August 7, 2014 7:23:27 AM UTC+8, Quan Nguyen wrote: I'm afraid not. You can use any programming editor that supports Regex find/replace to do it for you, or use a tool such as VietOCR http://vietocr.sf.net to remove line breaks from

Re: [tesseract-ocr] Re: Screenshots of applications

2014-08-08 Thread Quan Nguyen
:27 AM, Quan Nguyen nguy...@gmail.com wrote: If the coordinates of the rectangles are known, you can crop, and rescale to 300DPI if necessary, then send to Tesseract for recognition. On Wednesday, August 6, 2014 10:11:32 AM UTC-5, Natan Katz wrote: Paul Thanks for your answer

Re: [tesseract-ocr] Cannot open input file

2014-08-06 Thread Quan Nguyen
The command was incorrect -- that hyphen threw it off. If you cannot get the command squared away, you can use VietOCR http://vietocr.sf.net, which supports Bulk OCR. On Wednesday, August 6, 2014 1:14:58 PM UTC-5, Morgan Boyd wrote: I am also receiving this error, despite adding the

[tesseract-ocr] Re: Is there a control param for tesseract to disable line breaks within a paragraph?

2014-08-06 Thread Quan Nguyen
I'm afraid not. You can use any programming editor that supports Regex find/replace to do it for you, or use a tool such as VietOCR http://vietocr.sf.net to remove line breaks from the output text. On Wednesday, August 6, 2014 10:51:34 AM UTC-5, Bruce wrote: For example with the image
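A regex of the kind described above might look like this. It is a hypothetical sketch, not the expression VietOCR uses (that one is only partially quoted in this digest): it assumes paragraphs are separated by blank lines and turns every other single newline into a space, leaving a final trailing newline alone.

```java
/** Sketch: join soft line breaks inside paragraphs, keep blank-line paragraph breaks. */
final class LineBreaks {
    static String removeLineBreaks(String text) {
        // Normalize Windows line endings and empty out whitespace-only lines,
        // so that blank lines are reliably "\n\n".
        String normalized = text.replace("\r\n", "\n")
                .replaceAll("(?m)^[ \\t]+$", "");
        // A newline with no newline directly before or after it (and not at the very end)
        // is a soft break; replace it, plus surrounding spaces/tabs, with one space.
        return normalized.replaceAll("[ \\t]*(?<!\\n)\\n(?!\\n|\\z)[ \\t]*", " ");
    }
}
```

For example, `removeLineBreaks("This is a\nparagraph.\n\nNext one.")` returns `"This is a paragraph.\n\nNext one."`. Words hyphenated across a line break are not rejoined by this sketch.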

[tesseract-ocr] Re: Screenshots of applications

2014-08-06 Thread Quan Nguyen
If the coordinates of the rectangles are known, you can crop, and rescale to 300DPI if necessary, then send to Tesseract for recognition. On Wednesday, August 6, 2014 10:11:32 AM UTC-5, Natan Katz wrote: Paul Thanks for your answer. Most of the pictures are of this form namely, when there
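Cropping known rectangles in Java can be done with `BufferedImage.getSubimage`. The sketch below (hypothetical class name) clips each rectangle to the image bounds and skips any that fall completely outside it; each crop could then be rescaled to 300 DPI, as the post suggests, before it is sent to Tesseract.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: cut known rectangles out of a screenshot so each can be OCR'd separately. */
final class RegionCropper {
    static List<BufferedImage> crop(BufferedImage page, List<Rectangle> regions) {
        Rectangle bounds = new Rectangle(0, 0, page.getWidth(), page.getHeight());
        List<BufferedImage> crops = new ArrayList<>();
        for (Rectangle r : regions) {
            Rectangle clipped = r.intersection(bounds); // drop parts outside the image
            if (clipped.isEmpty()) {
                continue; // region lies entirely outside the page
            }
            // getSubimage shares pixel data with the page; copy it if the page will be modified.
            crops.add(page.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height));
        }
        return crops;
    }
}
```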

[tesseract-ocr] Re: Unable to load library 'libtesseract302': The specified module could not be found. error

2014-07-24 Thread Quan Nguyen
Check out the Tutorial: Development with Tess4J http://tess4j.sourceforge.net/tutorial/. You may need to set jna.library.path system property or change the Path environment variable. On Wednesday, July 23, 2014 8:06:18 AM UTC-5, Geetanjali P wrote: Hi All, I am having a similar issue.

[tesseract-ocr] Re: Unable to load library 'libtesseract302': The specified module could not be found. error

2014-07-23 Thread Quan Nguyen
or jna.library.path On Wednesday, July 23, 2014 5:23:54 PM UTC-5, Quan Nguyen wrote: Check out the Tutorial: Development with Tess4J http://tess4j.sourceforge.net/tutorial/. You may also need to set java.library.path or java.library.path variable. On Friday, January 18, 2013 8:11:56 AM

[tesseract-ocr] Re: Unable to load library 'libtesseract302': The specified module could not be found. error

2014-07-23 Thread Quan Nguyen
Check out the Tutorial: Development with Tess4J http://tess4j.sourceforge.net/tutorial/. You may need to set java.library.path system property or change the Path environment variable. On Friday, January 18, 2013 8:11:56 AM UTC-6, Deniz Atak wrote: Hi, I am trying to run Tess4J in 64 JVM

[tesseract-ocr] Re: tesseract 3.02 not able to retrieve text

2014-07-21 Thread Quan Nguyen
You'll need to pre-process the image. Grayscale: MRN x255 one w» AGE w,» Rescale to 300DPI: MRN: 1259 DOB: none AGE: none On Monday, July 21, 2014 7:34:29 AM UTC-5, Mustak M wrote: I am trying to retrieve the text from attached image, but tesseract tool is not retuning anything. I tried the

[tesseract-ocr] Re: Tess4J returns wrong font type

2014-07-18 Thread Quan Nguyen
They look alike. http://www.fonts2u.com/dejavu-sans-extralight.font http://www.fontsaddict.com/font/dejavu-sans-extralight.html On Friday, July 18, 2014 6:34:15 AM UTC-5, Mustak M wrote: I am using Java wrapper Tess4J. Using following code to retrieve the font type and font size from an

Re: [tesseract-ocr] JTessbox Modifying the boxes

2014-07-18 Thread Quan Nguyen
If you create training images using jTessBoxEditor, you can use the Letter Tracking function to control the spacing between characters. On Thursday, July 17, 2014 2:00:19 PM UTC-5, Jing JC wrote: yep yep. it happened during the bounding boxes I generated myself. not happened to the .box
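When generating or checking boxes yourself, it helps to parse the box file directly. In Tesseract 3.0x box files each line reads `glyph left bottom right top page`, with the origin at the bottom-left of the image, whereas Java2D uses a top-left origin. A hypothetical parser that converts a line to a `java.awt.Rectangle`; older files without the page column are also accepted, with page 0 assumed.

```java
import java.awt.Rectangle;

/** Sketch: one line of a Tesseract 3.0x box file ("glyph left bottom right top page"). */
final class BoxLine {
    final String glyph;
    final int left, bottom, right, top, page;

    BoxLine(String glyph, int left, int bottom, int right, int top, int page) {
        this.glyph = glyph;
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
        this.page = page;
    }

    static BoxLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 5 && f.length != 6) {
            throw new IllegalArgumentException("Expected 5 or 6 fields: " + line);
        }
        int page = f.length == 6 ? Integer.parseInt(f[5]) : 0;
        return new BoxLine(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), page);
    }

    /** Box coordinates use a bottom-left origin; Java2D uses top-left, so flip the y axis. */
    Rectangle toAwt(int imageHeight) {
        return new Rectangle(left, imageHeight - top, right - left, top - bottom);
    }
}
```

The horizontal gap between consecutive boxes on a line is then `next.left - prev.right`, which is one way to check whether self-generated boxes have the intended spacing.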

[tesseract-ocr] Re: Error whilst using viet OCR (tesseract OCR interface) on mac

2014-06-29 Thread Quan Nguyen
Hi Jack, Have you installed Tesseract or compiled it on your OSX box? VietOCR requires an existing installation of Tesseract to run. Once installed, you can specify its location in VietOCR's setting. Quan On Saturday, June 28, 2014 1:57:12 PM UTC-5, Jack Kershaw wrote: Hi, I've got to

[tesseract-ocr] Re: How to draw the text objects identified by tesseract?

2014-06-10 Thread Quan Nguyen
No. Tesseract can supply the coordinates; your program would have to draw the boxes for them. On Tuesday, June 10, 2014 1:00:47 PM UTC-5, José Ricardo wrote: Hi, I'd like to know if it's possible to tell tesseract to draw the text objects on the page. It would be nice if I could generate
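A minimal Java sketch of the drawing side, assuming the coordinates have already been obtained from Tesseract as top-left-origin rectangles (for example from the `bbox x0 y0 x1 y1` attributes in hOCR output, converted to `new Rectangle(x0, y0, x1 - x0, y1 - y0)`). The class name is hypothetical; it draws on a copy so the original image stays untouched.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

/** Sketch: outline text rectangles (coordinates supplied by Tesseract) on a copy of the page. */
final class BoxPainter {
    static BufferedImage draw(BufferedImage page, List<Rectangle> boxes, Color color) {
        BufferedImage out = new BufferedImage(page.getWidth(), page.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(page, 0, 0, null); // copy the original pixels
            g.setColor(color);
            for (Rectangle r : boxes) {
                g.drawRect(r.x, r.y, r.width, r.height); // outline only
            }
        } finally {
            g.dispose();
        }
        return out;
    }
}
```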

[tesseract-ocr] Re: Questions related to tesseract 3.03 PDF output feature

2014-06-02 Thread Quan Nguyen
Testing with Tess4J, I had no problem generating searchable PDFs for both images. I converted a sample TIFF to JPG and was able to create PDF from it as well. On Sunday, June 1, 2014 11:08:59 AM UTC-5, Bruce wrote: Hello, I'm testing tesseract's 3.03 PDF output feature on Android. My

[tesseract-ocr] Re: Runing Tess4J

2014-05-22 Thread Quan Nguyen
Looks like a problem with locale, which has recently been fixed. https://code.google.com/p/tesseract-ocr/issues/detail?id=910 https://code.google.com/p/tesseract-ocr/wiki/FAQ#Error:_Illegal_min_or_max_specification

[tesseract-ocr] Re: Tesseract 3.02 Orientation Script Detection

2014-05-14 Thread Quan Nguyen
was speaking regarding Tess4J. You can get the information of interest through the API. @zdenop So Tesseract v. 3.02 doesn't support this feature... I'll try the 3.03 version! Many thanks! On Sunday, May 11, 2014 13:53:45 UTC+2, Quan Nguyen wrote: With psm 0, Tesseract does

[tesseract-ocr] Re: Tesseract 3.02 Orientation Script Detection

2014-05-11 Thread Quan Nguyen
With psm 0, Tesseract does not perform normal OCR function but analyzes layout; it produces such characteristics as Orientation, Writing Direction, and Textline Order. Check Tess4J unit tests for usage of OSD. On Sunday, May 11, 2014 5:48:39 AM UTC-5, Joe Aspara wrote: I'm struggling with the

[tesseract-ocr] Re: Tesseract1 in multithreading

2014-05-08 Thread Quan Nguyen
Could they be bad images? Can you reprocess them in subsequent runs? On Wednesday, May 7, 2014 6:07:32 AM UTC-5, renjitha nair wrote: Hi, Try to execute a long list of files using tess4j with Tesseract1 in multithread environment [using eclipse jobs api ] , Then some of the files failed

[tesseract-ocr] Re: could you guide me to train tesseract in windows please?

2014-04-23 Thread Quan Nguyen
UTC-5, umcode wrote: :) Thank you, Quan Nguyen! I have had trouble with this program at almost every step. Please help me go on; what about this error? C:\Program Files\Tesseract-OCR>mftraining -F unicharset -O eng.unicharset eng.arial.exp0.tr eng.arial.exp1.tr Warning: No shape table

[tesseract-ocr] Re: could you guide me to train tesseract in windows please?

2014-04-22 Thread Quan Nguyen
UTC+3, Quan Nguyen wrote: Because the command is incorrect. It should be: shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.tr On Monday, April 21, 2014 10:23:37 AM UTC-5, umcode wrote: I have read TrainingTesseract3 https://code.google.com/p/tesseract-ocr

[tesseract-ocr] Re: Training doesn´t work 3.02 (Commandcorrection??)

2014-04-22 Thread Quan Nguyen
The commands should be: tesseract deu.handwriting.exp0.tif deu.handwriting.exp0 box.train or tesseract deu.handwriting.exp0.tif deu.handwriting.exp0 box.train.stderr On Tuesday, April 22, 2014 8:07:35 AM UTC-5, Awsomo :( wrote: Hi, I have installed Tesseract v3.02 with Cowboxer as

[tesseract-ocr] Re: could you guide me to train tesseract in windows please?

2014-04-21 Thread Quan Nguyen
Because the command is incorrect. It should be: shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.tr On Monday, April 21, 2014 10:23:37 AM UTC-5, umcode wrote: I have read TrainingTesseract3 https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 and install

[tesseract-ocr] Re: tesseract init function problem : TESSDATA_PREFIX environment

2014-04-07 Thread Quan Nguyen
What version of Tesseract are you testing? I believe the issue has been corrected for 3.03. https://code.google.com/p/tesseract-ocr/issues/detail?id=938 On Monday, April 7, 2014 5:14:21 PM UTC-5, Hsrt wrote: Hello, I use this code if (api->Init("../Tesseract-OCR/", "tur")) {

[tesseract-ocr] Re: Why I am getting different results through GUI and programmatically?

2014-04-07 Thread Quan Nguyen
It's likely the GUI programs have added some preprocessing on the image. If you ran it directly with Tesseract executable, you would get results similar to that of Tess4J. Rescaling your image to 300DPI will produce better output. https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality On

Re: [tesseract-ocr] Re: tesseract init function problem : TESSDATA_PREFIX environment

2014-04-07 Thread Quan Nguyen
...@gmail.com: I use 3.02 2014-04-08 2:28 GMT+03:00 Quan Nguyen nguy...@gmail.com: What version of Tesseract are you testing? I believe the issue has been corrected for 3.03. https://code.google.com/p/tesseract-ocr/issues/detail?id=938 On Monday, April 7, 2014 5

[tesseract-ocr] Re: Why I am getting different results through GUI and programmatically?

2014-04-07 Thread Quan Nguyen
dancing or not. Moreover, it would help if someone could share links to Java code for image preprocessing. Thanks a lot On Monday, April 7, 2014 8:37:10 PM UTC-3, Quan Nguyen wrote: It's likely the GUI programs have added some preprocessing on the image. If you ran it directly with Tesseract

Re: Poor translations on first attempts

2014-04-04 Thread Quan Nguyen
Make sure your images are of at least 300 DPI. https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality On Wednesday, April 2, 2014 1:23:20 PM UTC-5, Joel Wheeler wrote: Hello- I've downloaded and compiled the source for Tesseract 3.02.02 and installed the English learning files. To test

Re: Tesseract fails recognizing simple and isolated digits. How can I train tesseract for recognizing digits from unknown font type

2014-03-26 Thread Quan Nguyen
I defined a ROI around each number and it seemed to produce better results. On Wednesday, March 26, 2014 1:10:56 PM UTC-5, V.Lorz wrote: Hi All, I started integrating tesseract (version 3.2, EMGV) in a project for recognizing short texts in scanned images. Using some very simple image

Re: Create boxfile from a certified text

2014-03-11 Thread Quan Nguyen
progresses before submitting to my group. On Tuesday, March 11, 2014 00:08:34 UTC+1, Quan Nguyen wrote: Bernard, What do you mean by assert a text box of 200 words? Can you elaborate? Thanks. Quan On Monday, March 10, 2014 11:06:18 AM UTC-5, Bernard Polarski wrote: Since I have

Re: Create boxfile from a certified text

2014-03-11 Thread Quan Nguyen
) seldom used. Now I feel the need to set up regression tests over 20 certified box/text files in order to measure the impact of one single change. Work in progress, and ABBY is already off, but I hope for more progress before submitting to my group. On Tuesday, March 11, 2014 00:08:34 UTC+1, Quan Nguyen

Re: Create boxfile from a certified text

2014-03-10 Thread Quan Nguyen
Bernard, What do you mean by assert a text box of 200 words? Can you elaborate? Thanks. Quan On Monday, March 10, 2014 11:06:18 AM UTC-5, Bernard Polarski wrote: Since I have the source, I will recompile it this evening at home and will let you know. I takes an average of 30 min to

Re: Can we use Tesseract 2.x trained files in Tesseract 3.02

2014-03-07 Thread Quan Nguyen
There's a compatible micr.traineddata attached to this post. https://groups.google.com/forum/?fromgroups#!searchin/tesseract-ocr/micr/tesseract-ocr/obWI4cz8rXg/_Yl8RuCeJAkJ

Re: tesseract v3.03 output PDF

2014-03-02 Thread Quan Nguyen
You need to install osd.traineddata. It's available from http://code.google.com/p/tesseract-ocr/downloads/list as Orientation Script Detection Data for Tesseract 3.01. On Sunday, March 2, 2014 4:15:37 PM UTC-6, Clark Knøsen wrote: # tesseract -l dan out.png out hocr Just found out that

Re: traineddata file size varies according to box file images?

2014-03-01 Thread Quan Nguyen
) as well? Thanks, Fred On Friday, February 28, 2014 10:58:05 PM UTC+8, Quan Nguyen wrote: I'm not sure having only samples of one character in a file is a good idea. I normally train with all the characters in the same image(s). Check http://code.google.com/p/tesseract-ocr/downloads

Re: hocr binaries ?

2014-02-28 Thread Quan Nguyen
February 2014 01:15:41 UTC+1, Quan Nguyen wrote: Beginning with 3.03, Tesseract includes support for searchable PDF output. On Thursday, February 27, 2014 8:17:15 AM UTC-6, Bernard Polarski wrote: I cannot find the binaries for hocr2pdf from exact-image for Windows (even for cygwin

Re: traineddata file size varies according to box file images?

2014-02-28 Thread Quan Nguyen
I'm not sure having only samples of one character in a file is a good idea. I normally train with all the characters in the same image(s). Check http://code.google.com/p/tesseract-ocr/downloads/detail?name=boxtiff-2.01.eng.tar.gz for an example. On Tuesday, February 25, 2014 10:51:39 AM

Re: hocr binaries ?

2014-02-27 Thread Quan Nguyen
Beginning with 3.03, Tesseract includes support for searchable PDF output. On Thursday, February 27, 2014 8:17:15 AM UTC-6, Bernard Polarski wrote: I cannot find the binaries for hocr2pdf from exact-image for Windows (even for cygwin). There are quite a few Python scripts, but I could not put

Re: Beg for help - DAWG2WORDLIST

2014-02-24 Thread Quan Nguyen
jTessBoxEditor bundles the training Windows executables. http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/ On Monday, February 24, 2014 12:40:04 PM UTC-6, Tigris Tigre wrote: Hi Tesseracters, Could someone please please please put *dawg2wordlist.exe* (compiled for Windows)

Re: different results between launch from command line or c++ program with tesseract/baseapi

2014-02-18 Thread Quan Nguyen
You are OCRing two different images, hence the different results. PSM_AUTO corresponds to 3, not 1. Performing OCR on the same image with the same PSM should produce the same result. On Tuesday, February 18, 2014 8:21:17 AM UTC-6, vallet alexis wrote: Hello everyone ! I am trying to use
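For reference, the page segmentation modes of Tesseract 3.0x (the `-psm N` command-line option) map to the PSM_* constants as mirrored in this hypothetical Java enum. Only the numbering and the names (minus the PSM_ prefix) come from Tesseract; the enum itself and its lookup method are illustrative. Modes 4, 5, and 6 are the ones suggested elsewhere in this list for keeping text in one block.

```java
/** Hypothetical mirror of Tesseract 3.0x page segmentation modes ("-psm N"). */
enum PageSegMode {
    OSD_ONLY(0),               // orientation and script detection only
    AUTO_OSD(1),               // automatic page segmentation with OSD
    AUTO_ONLY(2),              // automatic page segmentation, but no OSD or OCR
    AUTO(3),                   // fully automatic page segmentation, no OSD (default)
    SINGLE_COLUMN(4),          // single column of text of variable sizes
    SINGLE_BLOCK_VERT_TEXT(5), // single uniform block of vertically aligned text
    SINGLE_BLOCK(6),           // single uniform block of text
    SINGLE_LINE(7),            // single text line
    SINGLE_WORD(8),            // single word
    CIRCLE_WORD(9),            // single word in a circle
    SINGLE_CHAR(10);           // single character

    final int value;

    PageSegMode(int value) {
        this.value = value;
    }

    /** Looks up a mode by its numeric value, e.g. the N passed to -psm. */
    static PageSegMode of(int value) {
        for (PageSegMode m : values()) {
            if (m.value == value) {
                return m;
            }
        }
        throw new IllegalArgumentException("Unknown PSM " + value);
    }
}
```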

Re: Tesseract for Indic Scripts(Bengali)

2014-02-17 Thread Quan Nguyen
There is a Bengali language data file available from the Download page (http://code.google.com/p/tesseract-ocr/downloads/list). On Monday, February 17, 2014 5:28:10 AM UTC-6, Rabindra Rakshit wrote: I am presently working on OCR-text retrieval of Indic scripts. So I would like to know how far is

Re: Generated tiff from JTessBoxEditor and then makebox fails

2014-02-17 Thread Quan Nguyen
It seems to be a common occurrence. If you have many other samples of the failed characters in the image, that would compensate for the failures. I would not fret over it. On Monday, February 17, 2014 1:54:43 AM UTC-6, Ganesh BL wrote: Hi All, I have generated tif/box files from

Re: Language file for MICR font

2014-02-03 Thread Quan Nguyen
* folder and the mcr.traineddata in *tessdata* folder or only the mcr.traineddata in tessdata folder. Dont mistake me...It would be great help If you can provide your entire full setup of tesseract 3.02 project folder. Thanks On Sun, Feb 2, 2014 at 9:44 PM, Quan Nguyen nguy

Re: Language file for MICR font

2014-02-02 Thread Quan Nguyen
the downloaded traineddata in the tessdata folder, then running this command in the OCR console application: tesseract imagename.tif output -l lang It would be a great help to me if someone gives a clear idea. Thanks On Sat, Feb 1, 2014 at 10:53 PM, Quan Nguyen nguy...@gmail.com wrote: I edited

Re: Trying to recognize on android, failing miserably

2014-02-02 Thread Quan Nguyen
Please attach your "Simple text" image. On Saturday, February 1, 2014 3:57:23 PM UTC-6, Gabriel Sechan wrote: I'm trying to use Tesseract on Android, and the results I'm getting aren't too good. Attempt 1 was to recognize the words Simple text in a very simple png (white on black). It came up with

Re: Indian Language Support

2014-02-01 Thread Quan Nguyen
You can try VietOCR http://vietocr.sf.net as a GUI for Tesseract engine. On Saturday, February 1, 2014 5:00:59 AM UTC-6, Subhabrata Banerjee wrote: Dear Ankur, Thanks for the links. I would explore them. By any way, do you know anyhow if there is any .exe file for Windows? or any GUI like

Re: Tesseract command does not work if the image path contains space in it

2014-01-14 Thread Quan Nguyen
In such cases, enclose the paths in double quotes, e.g., "c:\test image.tif", "c:\folder test\test.tif", etc. On Tuesday, January 14, 2014 1:20:49 AM UTC-6, Kapil Naker wrote: Hi, I have observed that when the image path contains a space, the tesseract command does not recognize it. For example, the command below
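When Tesseract is launched from Java instead of a command prompt, the quoting issue largely goes away: `ProcessBuilder` takes each argument as a separate list element, and the JDK handles argument passing when it starts the process. The sketch below (hypothetical class) builds such a command and also produces the quoted form you would type at a Windows prompt.

```java
import java.util.List;

/** Sketch: building a tesseract command whose paths may contain spaces. */
final class TessCommand {
    /** Each element is passed to the process as one argument, so no manual quoting is needed. */
    static ProcessBuilder builder(String image, String outputBase, String lang) {
        return new ProcessBuilder("tesseract", image, outputBase, "-l", lang);
    }

    /** Quoted form for typing at a Windows command prompt: double quotes around paths with spaces. */
    static String forCmdPrompt(List<String> args) {
        StringBuilder sb = new StringBuilder();
        for (String a : args) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(a.contains(" ") ? "\"" + a + "\"" : a);
        }
        return sb.toString();
    }
}
```

For example, `forCmdPrompt(builder("c:\\test image.tif", "c:\\out dir\\result", "eng").command())` gives `tesseract "c:\test image.tif" "c:\out dir\result" -l eng`.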

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2014-01-10 Thread Quan Nguyen
aaa.DangAmbigs.txt is a user-defined file used by VietOCR for post-processing (post-OCR) corrections. On Thursday, January 9, 2014 12:57:17 PM UTC-6, Ravi Roshan wrote: Please tell me where I could find this hin.DangAmbigs.txt file. Thank you. On Wednesday, 27 November 2013 21:29:43

Re: How to make tesseract not split the image into sections

2014-01-03 Thread Quan Nguyen
Try with PSM 4, 5, or 6. On Thursday, January 2, 2014 6:12:53 PM UTC-6, Benjamin Sølberg wrote: Hi all I am training tesseract to work with a custom font. Things are moving forward but there are clouds in the sky. When using tesseract it insists to cut the texts into sections. I

Re: How to make tesseract not split the image into sections

2014-01-03 Thread Quan Nguyen
as I also need to run this on an iPhone? Regards Benjamin On Friday, January 3, 2014 17:48:06 UTC+1, Quan Nguyen wrote: Try with PSM 4, 5, or 6. On Thursday, January 2, 2014 6:12:53 PM UTC-6, Benjamin Sølberg wrote: Hi all I am training tesseract to work with a custom font. Things

Re: I want to know how to start to use this library.

2013-12-29 Thread Quan Nguyen
Koey, It seems that you're using a 64-bit JVM. You will need compatible Tesseract and Leptonica DLLs: https://github.com/charlesw/tesseract/tree/master/src/lib/TesseractOcr/x64 Quan On Wednesday, December 18, 2013 5:21:36 AM UTC-6, Koey wrote: I am a beginner at using libraries in Java. When I

Re: Tess4j and Java integration issue

2013-12-27 Thread Quan Nguyen
It's not an integration issue -- you're already past that point -- but rather an image quality issue. http://code.google.com/p/tesseract-ocr/wiki/PoorQuality On Thursday, December 26, 2013 3:59:49 PM UTC-6, Santhi Sri wrote: Hi, I am new to OCR. I tried to download Tess4j project and did

Re: Java and Tess4J - having problem with running an example

2013-12-26 Thread Quan Nguyen
You seem to have not specified the fully-qualified class name. You should cd to src and run: java -cp .;...path-to-jars... net.sourceforge.tess4j.example.TesseractExample If possible, use an IDE, such as NetBeans or Eclipse. On Thursday, December 26, 2013 9:05:44 AM UTC-6, Risto Puusepp

Re: My application in JAVA does'nt find 'libtesseract302.lib'

2013-12-20 Thread Quan Nguyen
It's trying to search for libtesseract302.dll, not .lib. Check out these posts: http://sourceforge.net/p/tess4j/discussion/1202294/thread/cd54d983 http://stackoverflow.com/questions/10815978/including-tess4j-to-a-java-project-as-library-in-eclipse And make sure you use 32-bit Java. On
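Because the native DLLs must match the JVM's architecture, it can help to print which JVM is actually running. A sketch with a hypothetical class name: `sun.arch.data.model` is set by Oracle and OpenJDK-based JVMs but is not guaranteed on others, so the fallback guess from `os.arch` is only a heuristic.

```java
/** Sketch: report whether the running JVM is 32- or 64-bit, since native DLLs must match it. */
final class JvmBitness {
    /** Returns "32" or "64"; falls back to a guess from os.arch if the data-model property is missing. */
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        if (model != null && !model.equals("unknown")) {
            return model;
        }
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "32"; // heuristic only
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel() + "-bit, os.arch="
                + System.getProperty("os.arch"));
    }
}
```

If this prints 64-bit while the DLLs are 32-bit builds, the library will fail to load even when it sits on the search path.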

Re: I want to know how to start to use this library.

2013-12-20 Thread Quan Nguyen
What OS are you on? Make sure you use 32-bit Java. On Thursday, December 19, 2013 10:02:17 AM UTC-6, Koey wrote: I put the .dll files in windows/system32 and the .jar files in C:\Program Files\Java\jre7\lib\ext, but it still does not work. Is there any problem? Quan Nguyen, on Thursday, December 19, 2013, UTC+8, at 9 AM

Re: I want to know how to start to use this library.

2013-12-18 Thread Quan Nguyen
The DLL needs to be in the search path. http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586%28v=vs.85%29.aspx Tess4J itself is a NetBeans project. Expand the Test Packages, right-click on a unit test file, and select Test File command. On Wednesday, December 18, 2013 10:30:06 AM

Re: problem while training tesseract with jTessBoxEditor in ubuntu

2013-12-09 Thread Quan Nguyen
On Monday, December 9, 2013 3:07:49 AM UTC-6, Vamsee wrote: Hi, I'm using jTessBoxEditor to train tesseract in ubuntu. Problem1. jTessBoxEditor is not able to detecting boxes from vamsi.urwpalladiolbi.exp01.tif but from vamsi.urwpalladiolbi.exp0.tif it can detect the

Re: aBOUT Training data set in tesseract-ocr 3

2013-12-05 Thread Quan Nguyen
That link is to the source. Here's the executable: http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/jTessBoxEditor-1.0.zip/download Program webpage: http://vietocr.sourceforge.net/training.html On Thursday, December 5, 2013 10:46:45 AM UTC-6, Jonatan Dellagostin wrote: 3) I'm using

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-28 Thread Quan Nguyen
On Wednesday, November 27, 2013 9:32:57 PM UTC-6, Srivas wrote: Nope, it won't work. I use Windows 7 64-bit. The program is installed in the Program Files (x86) folder. Even if I add it to the Path environment variable, it still gives the same error. On Thu, Nov 28, 2013 at 1:25 AM, Quan Nguyen nguy

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-27 Thread Quan Nguyen
Download and install http://sourceforge.net/projects/ghostscript/files/GPL%20Ghostscript/9.10/gs910w32.exe. Then follow the steps for setting Path environment variable as described in http://vietocr.sourceforge.net/usage.html. On Tuesday, November 26, 2013 9:50:57 PM UTC-6, Srivas wrote:

Re: traineddata is unsuccess!!!

2013-11-21 Thread Quan Nguyen
You probably failed to rename the files with a *lang*. prefix. You can try the jTessBoxEditor tool (http://vietocr.sourceforge.net/training.html) to automate the training process. On Thursday, November 21, 2013 1:38:31 AM UTC-6, heinht...@itamyanmar.com wrote: Hi.. I have a problem in
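
For manual Tesseract 3 training, the TrainingTesseract3 wiki says the training outputs `inttemp`, `normproto`, `pffmtable`, and `shapetable` must be renamed with a `lang.` prefix before running `combine_tessdata lang.`. A minimal sketch of that renaming step, assuming all four files sit in one directory; the language code passed in is a placeholder for your own ISO 639-2 code, and this is not how jTessBoxEditor itself does it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LangPrefixRenamer {

    // Training outputs that need a "lang." prefix before combine_tessdata.
    static final List<String> TRAINING_FILES =
            Arrays.asList("inttemp", "normproto", "pffmtable", "shapetable");

    /**
     * Renames each unprefixed training file in dir to "<lang>.<name>".
     * Missing files are skipped; returns the paths that were created.
     */
    static List<Path> addLangPrefix(Path dir, String lang) throws IOException {
        List<Path> renamed = new ArrayList<>();
        for (String name : TRAINING_FILES) {
            Path source = dir.resolve(name);
            if (Files.exists(source)) {
                Path target = dir.resolve(lang + "." + name);
                // Throws FileAlreadyExistsException instead of silently overwriting.
                renamed.add(Files.move(source, target));
            }
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LangPrefixRenamer <training-dir> <lang>
        List<Path> created = addLangPrefix(Paths.get(args[0]), args[1]);
        created.forEach(System.out::println);
        // Next step (outside this sketch): combine_tessdata <lang>.
    }
}
```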

Re: Tess4J on MacOsX

2013-11-18 Thread Quan Nguyen
On Saturday, November 16, 2013 00:58:03 UTC+1, Quan Nguyen wrote: Do you have libtesseract.dylib in your system path? On Friday, November 15, 2013 6:29:14 AM UTC-6, fontecha wrote: I have the same problem. Did you find any solution or workaround? Thanks. On Wednesday,

Re: Tess4J on MacOsX

2013-11-18 Thread Quan Nguyen
November 2013 14:15:21 UTC+1, Quan Nguyen wrote: The dylib, a native shared library, should be in the system path, not the classpath, as the classpath is for Java class and JAR files. What is the exact name of your dylib, by the way? On Monday, November 18, 2013 4:09:31 AM UTC-6, fontecha
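
Since the classpath is only searched for classes and JARs, a quick way to narrow this down is to check which native search locations actually contain the library. A minimal sketch, assuming the relevant locations are JNA's `jna.library.path` system property (Tess4J loads Tesseract through JNA) and the `PATH`, `DYLD_LIBRARY_PATH`, and `LD_LIBRARY_PATH` environment variables. It does not model the OS loader's built-in default directories (such as /usr/lib), so "not found" here does not prove the library cannot be loaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class NativeLibFinder {

    /**
     * Returns every file named libFileName found in the directories of a
     * path-style list (entries separated by File.pathSeparator).
     */
    static List<File> findInPathList(String pathList, String libFileName) {
        List<File> hits = new ArrayList<>();
        if (pathList == null || pathList.isEmpty()) {
            return hits;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, libFileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Pass the exact file name, e.g. libtesseract.dylib on macOS or libtesseract302.dll on Windows.
        String name = args.length > 0 ? args[0] : System.mapLibraryName("tesseract");
        String[] sources = {
            "jna.library.path (system property)", System.getProperty("jna.library.path"),
            "PATH", System.getenv("PATH"),
            "DYLD_LIBRARY_PATH", System.getenv("DYLD_LIBRARY_PATH"),
            "LD_LIBRARY_PATH", System.getenv("LD_LIBRARY_PATH"),
        };
        for (int i = 0; i < sources.length; i += 2) {
            System.out.println(sources[i] + ": " + findInPathList(sources[i + 1], name));
        }
    }
}
```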

Re: jTessBoxEditor - Tesseract box editor trainer

2013-11-16 Thread Quan Nguyen
jTessBoxEditor v1.0 Release
This release includes the following improvements:
- Integrate support for full automation of Tesseract training
- Bundle Tesseract Windows training executables (r866), English data, and config files
- Fix an issue with generated TIFF missing metadata

Re: Tess4J on MacOsX

2013-11-15 Thread Quan Nguyen
Do you have libtesseract.dylib in your system path? On Friday, November 15, 2013 6:29:14 AM UTC-6, fontecha wrote: I have the same problem. Did you find any solution or workaround? Thanks. On Wednesday, July 10, 2013 11:52:29 UTC+2, Masiar wrote: Hello, I installed

Re: jTessBoxEditor - Tesseract box editor trainer

2013-11-07 Thread Quan Nguyen
Another beta has been uploaded. This version elevates the Trainer and TIFF/Box Generator features to the main UI. Please post your comments/feedback here, if any. Thanks. http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/ On Wednesday, October 16, 2013 9:15:11 AM UTC-5, Quan Nguyen

Re: Is there a similar (simple) alternative to the deprecated getCharacters().getBoxRects() in TesseractOCR 3.02?

2013-10-26 Thread Quan Nguyen
ResultIterator, perhaps. http://code.google.com/p/tesseract-ocr/wiki/APIExample On Friday, October 25, 2013 6:03:42 PM UTC-5, Linda Li wrote: Is there a similar (simple) alternative to the deprecated getCharacters().getBoxRects() in TesseractOCR 3.02? I took a look at the function of

Re: Not Recognising

2013-10-24 Thread Quan Nguyen
The resolution is too low. Rescaling to 300 DPI and converting to B/W makes the text recognizable. On Monday, October 21, 2013 5:13:39 AM UTC-5, Aman Singh wrote: Hi, I am trying to recognise the attached image using Tesseract OCR, but it is not recognising it. Please help to figure out what
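
The two preprocessing steps mentioned above (rescale to 300 DPI, convert to black and white) can be sketched with the JDK alone. This is a minimal illustration, not the exact procedure used in the reply: it assumes the source DPI is known, uses bicubic interpolation for the rescale, and uses a simple global threshold (luminance 128) for binarization. An adaptive method such as Otsu may do better on uneven backgrounds.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class OcrPreprocess {

    /**
     * Upscales an image from sourceDpi to targetDpi (e.g. 300) and converts it
     * to 1-bit black and white using a fixed luminance threshold.
     */
    static BufferedImage rescaleAndBinarize(BufferedImage src, int sourceDpi, int targetDpi) {
        double factor = (double) targetDpi / sourceDpi;
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));

        // 1) Bicubic rescale onto a white RGB canvas (white also replaces any transparency).
        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, w, h);
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();

        // 2) Global threshold: luminance below 128 becomes black, everything else white.
        BufferedImage bw = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = scaled.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int gr = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int luminance = (299 * r + 587 * gr + 114 * b) / 1000;
                bw.setRGB(x, y, luminance < 128 ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return bw;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OcrPreprocess <input> <output.png> <sourceDpi>
        BufferedImage in = ImageIO.read(new File(args[0]));
        if (in == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        int sourceDpi = Integer.parseInt(args[2]);
        ImageIO.write(rescaleAndBinarize(in, sourceDpi, 300), "png", new File(args[1]));
    }
}
```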

Re: Game screenshot

2013-10-20 Thread Quan Nguyen
Actually, it's jTessBoxEditor, which can be downloaded from https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/ On Sunday, October 20, 2013 3:00:01 PM UTC-5, sventech wrote: VietOCR has a training program that automates the steps, but I suspect that you should preprocess the

Re: jTessBoxEditor - Tesseract box editor trainer

2013-10-16 Thread Quan Nguyen
An update that supports an option to add noise to the generated image has been uploaded. On Wednesday, September 25, 2013 9:02:13 PM UTC-5, Quan Nguyen wrote: jTessBoxEditor is a Java box editor for Tesseract OCR data. It can read images of common image formats, including multi-page TIFF
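
The post does not describe how jTessBoxEditor adds noise, so the following is only one common way to degrade a generated training image, not the tool's actual implementation: salt-and-pepper noise, where a fraction of pixels is flipped to pure black or white. The `amount` and `seed` parameters are hypothetical; a fixed seed keeps the output reproducible.

```java
import java.awt.image.BufferedImage;
import java.util.Random;

class NoiseSketch {

    /**
     * Returns a copy of src in which each pixel, with probability `amount` (0..1),
     * is replaced by pure black or pure white ("salt-and-pepper" noise).
     */
    static BufferedImage addSaltAndPepper(BufferedImage src, double amount, long seed) {
        if (amount < 0 || amount > 1) {
            throw new IllegalArgumentException("amount must be in [0, 1]");
        }
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Random rnd = new Random(seed);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                if (rnd.nextDouble() < amount) {
                    rgb = rnd.nextBoolean() ? 0xFFFFFFFF : 0xFF000000;
                }
                out.setRGB(x, y, rgb);
            }
        }
        return out;
    }
}
```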

Re: jTessBoxEditor - Tesseract box editor trainer

2013-10-12 Thread Quan Nguyen
at small sized files. I'll be happy if you make the corrections. dos displaying error On Thursday, 3 October 2013 23:30:57 UTC+3, Quan Nguyen wrote: Sorry, I still have difficulties trying to understand the issue reported by you. Your TIFF image has 30 pages and 24 million colors

Re: Sample code

2013-10-04 Thread Quan Nguyen
Thanks, Nick. I wish I had a Delete button. On Friday, October 4, 2013 9:49:35 AM UTC-5, Nick White wrote: P.S. Please don't send the same message multiple times. It can take a little while to get through, but sending multiples will just annoy everybody ;) Also, please start your own

Re: setting pagesegmode for multi-column text OCR

2013-10-03 Thread Quan Nguyen
SetPageSegMode should be called after Init. Take a look at the testOSD test case in http://sourceforge.net/p/tess4j/code/HEAD/tree/Tess4J_3/trunk/test/net/sourceforge/tess4j/ On Thursday, October 3, 2013 9:23:20 AM UTC-5, alexiuk wrote: Hi - I have an image with multiple columns I'd like to
