Re: jTessBoxEditor - Tesseract box editor trainer

2013-10-03 Thread Quan Nguyen
Sorry, I still have difficulty understanding the issue you reported. Your TIFF image has 30 pages and 24 million colors, and the file is 400 MBytes in size? And what do you mean when saying "All of those are not passing through from the page to the another page"? Thank you. Quan On

Re: jTessBoxEditor - Tesseract box editor trainer

2013-10-01 Thread Quan Nguyen
Quan Nguyen wrote: jTessBoxEditor is a Java box editor for Tesseract OCR data. It can read images of common image formats, including multi-page TIFF. The program requires JRE 6.0 or later. Version 1.0 Beta integrates support for full automation of Tesseract training. Please post your

jTessBoxEditor - Tesseract box editor trainer

2013-09-25 Thread Quan Nguyen
jTessBoxEditor is a Java box editor for Tesseract OCR data. It can read images of common image formats, including multi-page TIFF. The program requires JRE 6.0 or later. Version 1.0 Beta integrates support for full automation of Tesseract training. Please post your comments/feedback here. Thank

Re: OCR romanized Asian languages

2013-08-29 Thread Quan Nguyen
Training only involves getting the data it requires into a few appropriate files and executing a few appropriate commands, no programming required. http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 Take a look at the source training data for Vietnamese, which has many diacritical

Re: new language, normal font

2013-08-29 Thread Quan Nguyen
Training only involves getting the data it requires into a few appropriate files and executing a few appropriate commands. http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 Take a look at the source training data for Vietnamese, which has many diacritical marks similar to your

Re: OCR char restriction

2013-08-29 Thread Quan Nguyen
Try bazaar pattern matching and see if you will have better results. http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html On Thursday, August 29, 2013 3:33:28 AM UTC-5, sam vara wrote: this is my first OCR project . I am trying to feed an image that is x...@gmail.com

Re: Tesseract 3.02.2 having trouble with numeric values that contain decimal points

2013-08-22 Thread Quan Nguyen
Any example image? On Wednesday, August 21, 2013 11:44:45 AM UTC-5, Morlock wrote: Hello, I'm using Tesseract v3.02.02. I'm unable to get it to consistently recognize a number that contains a decimal point. Tesseract is recognizing the digits. Tesseract is recognizing the

Re: pagemode option for Tesseract AP on my c++ program.

2013-08-10 Thread Quan Nguyen
For 10, it is PSM_SINGLE_CHAR. http://code.google.com/p/tesseract-ocr/source/browse/trunk/ccstruct/publictypes.h On Wednesday, July 17, 2013 5:28:33 PM UTC-5, Gabriel Paschoal Vicente wrote: Hi Guys, I am integrating tesseract on my c++ application. When i run the command manually I got

Re: Merging multiple training data set

2013-08-10 Thread Quan Nguyen
I don't think there exists a way to merge the data files; however, in 3.02, you can rename your trained data file and specify it with the standard one to the -l option, such as: tesseract image output -l eng+eng1 On Wednesday, July 31, 2013 8:44:44 AM UTC-5, honey kansal wrote: Hi, I
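A minimal Python sketch of the approach described above, assuming the renamed data file is called eng1.traineddata and sits in the same tessdata folder as the standard one. The helper name build_tesseract_command is hypothetical; only the `-l eng+eng1` form comes from the message.

```python
import shlex


def build_tesseract_command(image, output_base, languages):
    """Return the argument list for one tesseract run that loads several languages."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Language codes are joined with '+', as in the message: -l eng+eng1
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


cmd = build_tesseract_command("image", "output", ["eng", "eng1"])
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once Tesseract is on the PATH.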

Re: Reference to bangla Tif/Box file

2013-08-10 Thread Quan Nguyen
Did you try to merge them into one box, as the training Wiki suggests? http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 On Sunday, August 4, 2013 1:59:10 AM UTC-5, mama wrote: Sir, I have found the box files for bangla language for tesseract version 2 from the site *

Re: Tesseract does not recognize comma when using page segmentation = single char

2013-08-10 Thread Quan Nguyen
You could have better results with 300-DPI, binary or grayscale images. On Sunday, August 4, 2013 7:04:20 PM UTC-5, Zeulopes wrote: Hello Guys! I'm using tesseract API (version 3.02) to single character recognition in English (eng.traineddata), with the following parameters:

Re: Is there a native Windows (without .NET, Python, QT..) box file editor ?

2013-08-10 Thread Quan Nguyen
The AddOns page does not list any native Windows box editor. Most Windows systems nowadays come with .NET Framework installed; the same can be said about Java. Do you have a problem using a .NET- or Java-based box editor? On Saturday, August 3, 2013 3:58:40 AM UTC-5, n1101...@gmail.com wrote:

Re: How run tesseract V3.02 with VB.NET

2013-07-12 Thread Quan Nguyen
There's a .NET wrapper for Tesseract 3.02 at https://github.com/charlesw/tesseract. On Sunday, July 7, 2013 9:00:58 AM UTC-5, waleed Elerksosy wrote: Hello, In first i would thanks all about the effort to support us :) How to add *tesseract3 *to my VB.NET project in previous with

Re: jTessBoxEditor 0.6 Beta release

2013-05-08 Thread Quan Nguyen
file, but for a few others it can't generate the box coordinates. Please, sir, I have attached the file. On Sat, May 4, 2013 at 7:38 PM, Quan Nguyen nguy...@gmail.com wrote: What Ubuntu and Java versions are installed on your machine? You probably have a headless Java -- i.e., one

Re: Japanese detection parameter

2013-05-04 Thread Quan Nguyen
Put them in a file placed under tessdata\configs folder and specify it as a command-line option when you execute tesseract command. On Saturday, May 4, 2013 2:58:31 AM UTC-5, Sathish Kumar wrote: On Sunday, 30 December 2012 03:06:24 UTC+5:30, 服部慎 wrote: Hi . I am Japanese tesseract users.
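A rough Python sketch of that advice, under stated assumptions: the config file holds one "name value" pair per line, and the config name is passed as the last command-line argument. The function names, the config name, and the variable used in the usage note are illustrative only; the thread does not show which parameters were meant.

```python
from pathlib import Path


def write_config(tessdata_dir, config_name, variables):
    """Write a config file under <tessdata_dir>/configs with one 'name value' pair per line."""
    configs_dir = Path(tessdata_dir) / "configs"
    configs_dir.mkdir(parents=True, exist_ok=True)
    config_path = configs_dir / config_name
    lines = [f"{name} {value}" for name, value in variables.items()]
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path


def command_with_config(image, output_base, lang, config_name):
    """Build the tesseract argument list with the config name as the final argument."""
    return ["tesseract", image, output_base, "-l", lang, config_name]
```

For example, write_config(r"C:\Tesseract-OCR\tessdata", "myconfig", {"tessedit_char_whitelist": "0123456789"}) followed by command_with_config("image.tif", "output", "jpn", "myconfig"); both the path and the variable here are placeholders.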

Re: jTessBoxEditor 0.6 Beta release

2013-04-30 Thread Quan Nguyen
Yes, it runs on Ubuntu. Just unzip and execute the run script. Be sure to have Java installed first. On Tuesday, April 23, 2013 12:17:21 AM UTC-5, mama wrote: Sir, does it work on Ubuntu? I didn't get jTessBoxEditor for Ubuntu. Thanks, mama On Monday, October 3, 2011 9:20:00 AM UTC+5:30, Quan Nguyen

Re: jTessBoxEditor v0.8 Release

2013-04-30 Thread Quan Nguyen
Version 0.9 Release: - Enhance Generate TIFF/Box functionality to allow for combining prepending symbols in addition to appending - Fix a bug that failed to persist changes to table in edit mode - Find function now supports partial matches - Fix a problem with table not scrolling along when row

Re: jTessBoxEditor 0.6 Beta release

2013-04-30 Thread Quan Nguyen
Version 0.9 Release: - Enhance Generate TIFF/Box functionality to allow for combining prepending symbols in addition to appending - Fix a bug that failed to persist changes to table in edit mode - Find function now supports partial matches - Fix a problem with table not scrolling along when row

Re: Help With Language

2013-04-25 Thread Quan Nguyen
Are you using the v1.x version, which uses the .traineddata format? What's the datapath (TESSDATA_PREFIX) value? Would it work with eng? On Wednesday, April 24, 2013 2:10:15 PM UTC-5, Fabio Ebner wrote: Can someone help me? I downloaded Tess4J, and downloaded the Portuguese language, put the

Re: concatenating tr files

2013-04-22 Thread Quan Nguyen
.tr are binary files; as such, you should use: copy /b san.sanskrit2003.exp0*.tr san.sanskrit2003.exp2000.tr -- -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-ocr@googlegroups.com To unsubscribe
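For anyone not on Windows, the same binary concatenation can be sketched in Python. This is only an illustration of what `copy /b` does (append the raw bytes of each source file in order); the function name is hypothetical and the file names in the usage note are taken from the command above.

```python
import glob
import shutil


def concatenate_binary(pattern, destination):
    """Append the raw bytes of every file matching pattern, in sorted order, into destination."""
    # Skip the destination itself in case the pattern happens to match it.
    sources = sorted(p for p in glob.glob(pattern) if p != destination)
    with open(destination, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                # Binary mode on both sides, so no newline translation takes place.
                shutil.copyfileobj(src, out)
    return sources
```

Usage would be concatenate_binary("san.sanskrit2003.exp0*.tr", "san.sanskrit2003.exp2000.tr"), run from the folder that holds the .tr files.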

jTessBoxEditor v0.8 Release

2013-04-17 Thread Quan Nguyen
Version 0.8 has been released with the following enhancements: - Add row number header - Char cell now editable - Convert Unicode escape sequences where possible - Find box now displays Unicode characters and allows search using Unicode escape sequences - Improve Generate TIFF/Box functionality:

Re: character confidence...

2013-04-05 Thread Quan Nguyen
I'd move the SetVariable statement after the Init. On Friday, April 5, 2013 12:47:45 AM UTC-5, priya wrote: hi the code which i hav used to find word level confidence is given below, but i need character level confidence. please let me know if u hav any clues or pointers regarding

Re: character confidence...

2013-04-05 Thread Quan Nguyen
The confidence values embedded in the hOCR output are at the word, not character, level. On Friday, April 5, 2013 1:52:19 AM UTC-5, satuon wrote: I just found out that Tesseract also supports the hOCR format. But I'm not sure if character-wise confidence levels are available even there. How

Re: Can I pass a langue file direct to init? Have any way?

2013-04-05 Thread Quan Nguyen
The first parameter to Init is the path to tessdata folder; the second indicates the language. On Thursday, April 4, 2013 9:48:31 AM UTC-5, Renato Forti wrote: Hi all, My language file is in: /tesseract/ocr_default_engine/tessdata for sample: tesseract/ocr_default_engine/tessdata$ ls

Re: Extracting the coordinates of text in the image using tess4j

2013-04-04 Thread Quan Nguyen
Cross post. http://stackoverflow.com/questions/15758031/fatal-error-failed-to-write-core-dump On Tuesday, April 2, 2013 3:35:08 AM UTC-5, koushik kumar wrote: HELLO! I'm running the unit test for tessiterator from the tess4j distribution. But while running the ,code on eclipse, i got a

Re: character confidence...

2013-04-04 Thread Quan Nguyen
Show us your code. On Thursday, April 4, 2013 4:00:11 AM UTC-5, priya wrote: hi, Does any one know how to find character level confidence, i tried save_blob_choices code but it gives only word level confidence. Please let me know if you hav any pointers.

Re: Terrible error with the Init() and with the sample .exe too

2013-04-04 Thread Quan Nguyen
The English language data for 2.0x is at the bottom of this page: http://code.google.com/p/tesseract-ocr/downloads/list?num=100&start=100 On Wednesday, April 3, 2013 9:18:35 PM UTC-5, Damiano Rodriguez wrote: Hi all, I have a very strange problem: First of all: with visual studio 2010 and C#

Re: how to get tesseract to run on an entire folder

2013-03-30 Thread Quan Nguyen
No, you cannot run a batch of files like that with Tesseract; it has to be a Tesseract invocation for each file. Or you can use VietOCR (http://vietocr.sf.net), a GUI frontend for Tesseract that supports batch or bulk OCR. On Saturday, March 30, 2013 6:30:11 AM UTC-5, rollas...@gmail.com wrote:
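The "one invocation per file" approach can be scripted. A minimal Python sketch, assuming the images sit in a single folder and that each output should be written next to its image; the function name and the extension list are assumptions, not part of Tesseract.

```python
from pathlib import Path


def commands_for_folder(folder, lang="eng", extensions=(".tif", ".tiff", ".png", ".jpg")):
    """Build one tesseract command per image file found directly inside folder."""
    commands = []
    for image in sorted(Path(folder).iterdir()):
        if image.is_file() and image.suffix.lower() in extensions:
            # Output base is the image path without its extension; tesseract appends .txt
            output_base = image.with_suffix("")
            commands.append(["tesseract", str(image), str(output_base), "-l", lang])
    return commands
```

Each returned list can then be run in turn, for example with subprocess.run(cmd, check=True).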

Re: Image to txt wrong output / C# eng.unicharset issue

2013-03-21 Thread Quan Nguyen
tessnet2 is Tesseract 2.04-based .NET wrapper while you're using Tesseract 3.0x language data. They are not compatible. On Tuesday, March 19, 2013 10:45:12 AM UTC-5, Micael Leal wrote: Hello, After installing tesseract

Re: Usage of tesseract

2013-03-21 Thread Quan Nguyen
tessnet2 and *.traineddata files are not compatible. On Tuesday, March 19, 2013 9:07:02 AM UTC-5, Micael Leal wrote: Hello, I try to implement tessnet2 but get some issues while compiling. Bitmap image = new Bitmap(@"C:\Users\admin\AppData\Local\Temp\image.bmp"); tessnet2.Tesseract ocr = new

Re: Extract image content

2013-03-21 Thread Quan Nguyen
Use Tesseract 2.0x-version language data. On Tuesday, March 19, 2013 7:49:40 AM UTC-5, Micael Leal wrote: Hello, I try to implement tesseract-ocr with my powerpoint program in order to recognize pictures. I can extract a picture in powerpoint, but I want to extract its content. Inside

Re: Exception in thread main java.lang.UnsatisfiedLinkError: Unable to load library 'libtesseract302': The specified module could not be found.

2013-03-13 Thread Quan Nguyen
The distributed Tess4J-1.1-src.zip includes all the files you need. Assuming you already have Ant and JDK 6 or 7 32-bit installed, open a command prompt, navigate (cd) to the Tess4J directory, and execute the unit tests with the following command: ant test For your Java program to work, the JAR

Re: Tessnet doOCR only returns '~'.

2013-02-20 Thread Quan Nguyen
It could mean the image does not meet the minimum requirements for OCR. Try to rescale your screenshot to 300DPI. On Monday, February 18, 2013 2:13:04 PM UTC-6, Tommy Walsh wrote: I haven't been able to find anything on this. I'm using Tessnet2 to take a small screenshot and try to read the

Re: Not abe to generate accurate output for Dartangnon-ITC and rageItalics fonts

2013-01-21 Thread Quan Nguyen
be very helpful if I can get any suggestions here. Thanks in advance. On Friday, January 18, 2013 9:32:35 AM UTC+5:30, Quan Nguyen wrote: Boxes look overlapping. You may want to space them out a bit more. On Thursday, January 17, 2013 10:33:13 AM UTC-6, Tauqeer baig wrote: I am trying

Re: how to recognize this?

2013-01-21 Thread Quan Nguyen
You would have better success with 1) rescaling the image to 300 DPI, 2) sending the coordinates of each letter, and 3) using PSM 10. On Monday, January 21, 2013 8:41:04 AM UTC-6, Luigi De Rosa wrote: Hi to all, i'm trying to recognize those big characters in this attached picture. I tried in

Re: Unable to load library 'libtesseract302': The specified module could not be found. error

2013-01-18 Thread Quan Nguyen
JVM 64-bit cannot load Tesseract and Leptonica 32-bit DLLs. You would need JVM 32-bit. On Friday, January 18, 2013 8:11:56 AM UTC-6, Deniz Atak wrote: Hi, I am trying to run Tess4J in 64 JVM from Netbeans IDE and getting this error: Testcase:

Re: shapeclustering comman have error!!

2013-01-16 Thread Quan Nguyen
Georgia_Bold Georgia_Italic Times_New_Roman Times_New_Roman_Bold Trebuchet_MS Trebuchet_MS_Bold URW_Bookman_L_Italic Verdana Verdana_Bold [1] http://pastebin.com/0dV84hBa Zdenko On Wed, Jan 16, 2013 at 1:02 AM, Quan Nguyen nguy...@gmail.com wrote: I can shorten Times New

Re: shapeclustering comman have error!!

2013-01-14 Thread Quan Nguyen
[fontname] is just a token. If it has spaces, simply remove the spaces. On Sunday, January 13, 2013 9:01:34 PM UTC-6, gold snake wrote: thanks, the problem is fixed now, because the font_properties and the [lang].[fontname].exp[num] on the command must be the same. but one thing i cant

Re: shapeclustering comman have error!!

2013-01-13 Thread Quan Nguyen
Your filename does not seem to follow the naming convention [lang].[fontname].exp[num].tif (see TrainingTesseract3). And since your fontname is A, the content of font_properties should be: A 0 0 0 0 0 On Saturday, January 12, 2013 2:15:09 AM UTC-6, gold snake wrote: *the display error
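A small Python sketch of the two points above: checking a file name against the [lang].[fontname].exp[num].tif convention and producing the matching font_properties line. The helper names and the sample name eng.A.exp0.tif are made up for illustration. The five flag names (italic, bold, fixed, serif, fraktur) reflect my reading of the TrainingTesseract3 wiki and should be treated as an assumption.

```python
import re

# [lang].[fontname].exp[num].tif, where no component contains a dot
NAME_PATTERN = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.tif$")


def parse_training_name(filename):
    """Split a training image name into (lang, fontname, num) or raise ValueError."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError("name does not follow [lang].[fontname].exp[num].tif: " + filename)
    return match.group("lang"), match.group("font"), int(match.group("num"))


def font_properties_line(fontname, italic=0, bold=0, fixed=0, serif=0, fraktur=0):
    """Return one font_properties entry: the font name followed by five 0/1 flags."""
    return f"{fontname} {italic} {bold} {fixed} {serif} {fraktur}"


lang, font, num = parse_training_name("eng.A.exp0.tif")
print(font_properties_line(font))
```

Since [fontname] is a single token, a name with spaces would fail the pattern until the spaces are removed.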

Re: Why I can't use new eng.traineddata by VS2012(.NET)??!!

2013-01-13 Thread Quan Nguyen
The new .NET wrapper for Tesseract 3.02, which is still under development, can be found at https://github.com/charlesw/tesseract.

Re: Need some tuning / config advice

2012-12-13 Thread Quan Nguyen
, December 11, 2012 10:12:05 PM UTC-5, Quan Nguyen wrote: Rescaling to 300 DPI will produce much better results for the images.

Re: Tesseract OCR 3.02 .NET (TessNET2) library crashes in sample programs in ocr.Init(null, eng, false);

2012-12-04 Thread Quan Nguyen
Also, VietOCR.NET 3.3x uses the .NET wrapper for Tesseract 3.0.1. On Tuesday, December 4, 2012 10:18:55 AM UTC-6, eljainc wrote: Quan, Thank you very much for this information. I will give it a try. Mike McWhinney elja, Inc. -- *From:* Quan Nguyen nguy

Re: Tesseract OCR 3.02 .NET (TessNET2) library crashes in sample programs in ocr.Init(null, eng, false);

2012-12-02 Thread Quan Nguyen
Check out the source of VietOCR.NET 2.0.4, which uses the same tessnet2 library. http://sourceforge.net/projects/vietocr/files/vietocr.net/2.0.4/

Re: Tesseract OCR 3.02 .NET (TessNET2) library crashes in sample programs in ocr.Init(null, eng, false);

2012-12-02 Thread Quan Nguyen
Check out the source of VietOCR.NET 2.0.5, which uses the same tessnet2 library. http://sourceforge.net/projects/vietocr/files/vietocr.net/2.0.5/ http://sourceforge.net/projects/vietocr/files/vietocr.net/2.0.4/

Re: confidence level of tessract ocr output

2012-11-12 Thread Quan Nguyen
If you build using the latest source (r806, http://code.google.com/p/tesseract-ocr/source/detail?r=806), you'll get the word confidence in the hOCR output. On Monday, November 12, 2012 1:54:34 AM UTC-6, lirong wrote: Hi, everyone,Dose tessrect- ocr can output confidence level of the result? I

Re: mftraining produces Missing font_properties

2012-11-12 Thread Quan Nguyen
The Powershell script train.ps1 on AddOns page can help automate the training process. http://code.google.com/p/tesseract-ocr/wiki/AddOns On Tuesday, May 17, 2011 2:08:53 AM UTC-5, Eyal wrote: Hi, I tried to train some letters when I ran the *mftraining *with the parameters*:*

Re: Tesseract 3.02.02 and Windows 8

2012-11-06 Thread Quan Nguyen
Both Tesseract .exe and .dll execute without any problem on my Windows 8 Release Preview. I tried them via VietOCR program. On Tuesday, November 6, 2012 8:08:06 AM UTC-8, zdenop wrote: Hello, did somebody tried to use Tesseract 3.02.02 on Windows 8? Can you share your experience (does it

Re: Improving the 'AddOns' wiki page

2012-10-28 Thread Quan Nguyen
A couple of places (readme and faq pages) still refer to the GUI section in the AddOns. That section has been now moved to 3rdParty. On Sunday, October 28, 2012 12:10:56 PM UTC-5, zdenop wrote: Changed. -- Zdenko On Tue, Oct 23, 2012 at 3:11 PM, Nick White

Re: Automate Tesseract 3.02 language data generation process

2012-10-06 Thread Quan Nguyen
The train.ps1 script has been updated for Tesseract 3.02 training. http://vietocr.svn.sourceforge.net/viewvc/vietocr/jTessBoxEditor/trunk/tools/ On Sunday, March 27, 2011 12:21:11 PM UTC-5, Quan Nguyen wrote: I created a PowerShell script to automate language data generation for Tesseract

Re: Automate Tesseract 3.02 language data generation process

2012-10-06 Thread Quan Nguyen
The script has been updated for Tesseract 3.02 training. http://vietocr.svn.sourceforge.net/viewvc/vietocr/jTessBoxEditor/trunk/tools/ On Sunday, March 27, 2011 12:21:11 PM UTC-5, Quan Nguyen wrote: I created a PowerShell script to automate language data generation for Tesseract 3.01. Save

Re: Java GUI frontend for Tesseract OCR engine

2012-10-05 Thread Quan Nguyen
VietOCR 3.4 RC has been released. This incorporates the latest Tesseract 3.02 executable and library. Please help test. Any input or comment is welcome. http://sourceforge.net/projects/vietocr/files/vietocr/

Re: Effect of font_properties

2012-10-05 Thread Quan Nguyen
Instead of concatenating the .tr files, you can merge all your images, if they all have the same font style, into a multi-page TIFF and train with that. You can use jTessBoxEditor (http://vietocr.sourceforge.net/training.html) to merge images and edit the box file. On Monday, October 1, 2012

Re: Persian Tesseract?

2012-08-09 Thread Quan Nguyen
When Tesseract 3.02 is officially released, the author of tessdotnet will update to it. Then we'll have multiple language support. https://github.com/charlesw/tesseract-ocr-dotnet/issues/4 On Thursday, August 9, 2012 9:10:41 AM UTC-5, Alex C wrote: Hi. Is there a Tesseract language pack for

Re: c# tesseract2 recognizes 2 instead of 3

2012-08-04 Thread Quan Nguyen
Scaling up your images to 300 DPI will improve the results. Or upgrade to Tesseract 3.01 .NET wrapper (https://github.com/charlesw/tesseract-ocr-dotnet). On Saturday, August 4, 2012 7:15:40 AM UTC-5, hugi wrote:

Re: Adding the tesseract libraries to a java project in eclipse..

2012-07-17 Thread Quan Nguyen
http://stackoverflow.com/questions/10815978/including-tess4j-to-a-java-project-as-library-in-eclipse

Re: How to find a word and extract the coordinates of the same from hocr(String) using java.

2012-05-04 Thread Quan Nguyen
You can use an HTML parser to get the data you want. On Thursday, May 3, 2012 2:01:56 AM UTC-5, harry asir wrote: Hi all, Can any body suggest how to find a word and extract coordinates of the same from hocr (String) using java. I am using Tess4j 1.0 Beta 2 and i got hocr output as a
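To illustrate the HTML-parser idea, here is a sketch in Python using only the standard library (the question was about Java, where any HTML parser would play the same role). It assumes each word is an element with class ocrx_word and a title attribute containing "bbox x0 y0 x1 y1", and that no unclosed void tags such as a bare br appear inside a word. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._depth = 0     # nesting depth inside the current word element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._bbox is not None:
            self._depth += 1  # a nested tag such as <strong> inside the word
            return
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the word element itself has closed
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def find_word(hocr, target):
    """Return the bounding boxes of all words whose text equals target."""
    parser = HocrWordParser()
    parser.feed(hocr)
    return [bbox for text, bbox in parser.words if text == target]
```

Calling find_word(hocr_string, "World") returns a list of (x0, y0, x1, y1) tuples, one per matching word.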

Re: Tess4J - a Java wrapper for Tesseract OCR DLL

2012-04-29 Thread Quan Nguyen
the Java Virtual Machine in native code. # See problematic frame for where to report the bug. Could you please let me know, where i am going wrong. Thanks! Regards, Kamal. On Sunday, August 22, 2010 10:35:26 PM UTC-4, Quan Nguyen wrote: A JNA-based wrapper for Tesseract OCR DLL, the library

Re: Tess4J 1.0 Beta Release

2012-04-25 Thread Quan Nguyen
23, 2012 at 7:07 PM, Quan Nguyen wrote: all of the provided image processing functions are geared for Pix type, not raw image. Why not just create a Pix from the raw image data? Leptonica has pixCreateHeader(), pixSetResolution(), pixSetWpl(), pixSetData(), etc [1] and various helper

Re: Tess4J 1.0 Beta Release

2012-04-24 Thread Quan Nguyen
. Regards, Harry John Asir On Apr 24, 7:07 am, Quan Nguyen nguyen...@gmail.com wrote: Execution for .exe and .dll+Java follow different paths: one calling ProcessPage with Leptonica Pix image and one calling TesseractRect or GetUTF8Text with raw image. It seems that Pix image get

Re: Tess4J 1.0 Beta Release

2012-04-19 Thread Quan Nguyen
: # http://java.sun.com/webapps/bugreport/crash.jsp # The crash happened outside the Java Virtual Machine in native code. # See problematic frame for where to report the bug. # Please help me how to solve this issue. Regards, Harry John Asir On Apr 19, 9:18 am, Quan Nguyen nguyen

Re: Tess4J 1.0 Beta Release

2012-04-18 Thread Quan Nguyen
for coloured images. With the test images present in Tess4J folder (Ziped one), Ocr is working. Can you help me in doing ocr for coloured images using Tess4J. I am using Windows 7 PC. Regards, Harry John Asir On Apr 17, 8:14 am, Quan Nguyen nguyen...@gmail.com wrote: A JNA-based

Tess4J 1.0 Beta Release

2012-04-16 Thread Quan Nguyen
A JNA-based wrapper for Tesseract OCR 3.02 DLL, the library provides optical character recognition (OCR) support for: * TIFF, JPEG, GIF, PNG, and BMP image formats * Multi-page TIFF images * PDF document format This version is still in early beta development; as such, it has rough

Re: Tessnet2: How to start

2011-11-13 Thread Quan Nguyen
Tessnet2 is .NET 2.0. Did you target your VS2010 solution for .NET 2.0? VietOCR.NET 2.x, which uses the same wrapper, is VS2008-based and works fine on Win7. http://sourceforge.net/projects/vietocr/files/vietocr.net/ On Nov 12, 1:17 pm, Carlesmk carles.blasc...@gmail.com wrote: Hi everibody,

Re: tessedit_char_whitelist alphabet error

2011-10-29 Thread Quan Nguyen
Are you sure it does not accept Unicode characters? If that's the case, you can convert Unicode characters to ASCII escaped sequences. In JDK, there is a tool named native2ascii, which takes a text file with specified encoding and produces an output file containing escaped sequences. Anyhow, I
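For readers without a JDK at hand, the conversion that native2ascii performs can be approximated in a few lines of Python. This is only a sketch of the escaping itself (non-ASCII characters become \uXXXX, with surrogate pairs above U+FFFF); whether the whitelist accepts escaped input is not confirmed by this thread. The function name is hypothetical.

```python
def to_ascii_escapes(text):
    """Replace every non-ASCII character with a Java-style \\uXXXX escape sequence."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)
        else:
            # Characters beyond the BMP are written as a UTF-16 surrogate pair.
            code -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)))
    return "".join(out)


print(to_ascii_escapes("abc\u00f6"))
```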

Re: Pleasae help me

2011-10-29 Thread Quan Nguyen
There's a jTessBoxEditor tool that can help in editing the boxes. It can also generate training images (and boxes). http://vietocr.sourceforge.net/training.html On Oct 29, 4:02 am, merve t mervet2...@gmail.com wrote: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 i did

Re: Is there minimum of letters?

2011-10-24 Thread Quan Nguyen
Try with PSM 8 or 10. On Oct 24, 9:09 am, Giuseppe Menga me...@polito.it wrote: That is interesting. I'm recognizing espiration dates from medicines, and I found convenient to repeat the date 3 or 4 times, it improves recognition. Someone can explain the reason. Giuseppe -Messaggio

Re: jTessBoxEditor 0.6 Beta release

2011-10-18 Thread Quan Nguyen
-Merged box will have a character value composed of all the characters of the merging boxes http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/ On Oct 2, 10:50 pm, Quan Nguyen nguyen...@gmail.com wrote: A box editor for Tesseract OCR data. This release includes the following fixes

Re: How to OCR single page within multipage TIFF

2011-10-14 Thread Quan Nguyen
Tesseract does not support that feature out of the box -- it would recognize all pages found in multi-page TIFF. You'll have to manually extract a specific page and send it to Tesseract for recognition. Have you tried a frontend, such as VietOCR? It supports reading multi-page TIFF and lets the

Re: tessnet2 error for tesseract 3.00

2011-10-13 Thread Quan Nguyen
They're not compatible. If you want Tess 3.0x, try http://code.google.com/p/tesseractdotnet/ . On Oct 13, 3:23 am, onur karali onurkar...@gmail.com wrote: Hi, I can build and use .net wrapper tessnet2 for tesseract version 2.04 successfully but build operation gives error about baseAPI.h could

Re: jTessBoxEditor

2011-10-09 Thread Quan Nguyen
. display only rectangles. Please tell me the specification for the txt file to be accepted by jTessBoxEditor, Thanks in advance MNS Rao - Original Message - From: Quan Nguyen nguyen...@gmail.com To: tesseract-ocr tesseract-ocr@googlegroups.com Sent: Friday, October 07, 2011 2:08 AM

Re: read other languages by tesseract on c #

2011-10-07 Thread Quan Nguyen
with tesseract.exe On Oct 5, 04:27, Quan Nguyen nguyen...@gmail.com wrote: What's the error exactly? Does the image work with tesseract.exe? On Oct 4, 5:02 am, Alessandro Latella alexla...@libero.it wrote: Hi guys, I'm trying to run tesseract on c #. The program works well

Re: jTessBoxEditor

2011-10-06 Thread Quan Nguyen
I tried on the text received, using Windows fonts Tunga on Win7 64-bit, w/o any problem. I can't attach the output files here, so please check your inbox. On Oct 6, 6:52 am, mns_rao mns...@gmail.com wrote: Generating Tiff/box for kannada Text file is not working; For Tunga font only rectangles

Re: read other languages by tesseract on c #

2011-10-04 Thread Quan Nguyen
What's the error exactly? Does the image work with tesseract.exe? On Oct 4, 5:02 am, Alessandro Latella alexla...@libero.it wrote: Hi guys, I'm trying to run tesseract on c #. The program works well on English language 'ocr.Init(@"C:\Program Files\Tesseract-OCR\tessdata", "eng", false);' If I

Re: How do I convert multipage TIFF file to text with Tess 3.0

2011-10-04 Thread Quan Nguyen
Out of the box, Tess 3.0 supports multi-page TIFF. Did you try? On Oct 4, 12:49 pm, LAPIII webpren...@gmail.com wrote: Also, I'm using Linux Mint.

jTessBoxEditor 0.6 Beta release

2011-10-02 Thread Quan Nguyen
A box editor for Tesseract OCR data. This release includes the following fixes and enhancements: - Add a utility function which creates TIFF/Box pair suitable for training with Tesseract - Fix a bug which may clear out a modified box file when loading another image Please help test and post your

Re: Tesnet2 bug:TesseractOCR.vshost.exe: Program Trace' has exited with code 0 (0x0).

2011-09-20 Thread Quan Nguyen
Does the tessdata folder have the required language data files? On Sep 20, 3:51 am, Daniela21 dmari...@gmail.com wrote: Hello, I am trying to run the tesnet2 project based on these

Re: Problem processing specific TIF from ImageMagick

2011-09-13 Thread Quan Nguyen
VietOCR (Java version) does not feed the original image to Tesseract, but rather it reads and then writes back out an uncompressed TIFF file, rescaled to 300 DPI if instructed so, which is then sent to the engine. I found this regurgitated image somehow has been more amenable to Tesseract. The

Re: Problem processing specific TIF from ImageMagick

2011-09-11 Thread Quan Nguyen
Hi Jon, I tried your images with VietOCR, which makes the images more amenable to Tesseract engine, and it produced fairly accurate results. I think it could have been better if -density 300 had been used. You can open PDF directly in VietOCR if GhostScript has been installed.

Re: Multiple columns text.

2011-09-09 Thread Quan Nguyen
Please try the latest beta versions, which incorporate the PSM fix. On Sep 9, 1:42 am, Bonny esla...@gmail.com wrote: Huh.. No attachment allowed. In the meantime I tried VietOCR, but it doesn't recognize two columns either.

Re: Is it hard to add a new font to existing .traineddata?

2011-09-08 Thread Quan Nguyen
There's a Windows PowerShell script in AddOns. http://code.google.com/p/tesseract-ocr/wiki/AddOns On Sep 7, 11:45 pm, haoest hao...@gmail.com wrote: But without a batch file to build the .tr files, re-building all 32 fonts from command line would be terrifying.

Re: tesseract ocr multipage pdf hangs

2011-07-12 Thread Quan Nguyen
Vish, Tess4J does support multi-page PDF and multi-page TIFF. Substitute with your PDF file in the unit test case and give it a try. Regards, Quan On Jul 12, 1:20 am, Vish yava...@gmail.com wrote: Gurus, We are using Tesseract's Java library, Called Tess4j to convert PDF files to text. It

Re: New version of tesseractdotnetwrapper

2011-07-09 Thread Quan Nguyen
tesseract.dll is x86, so make sure your project's Properties > Build > Platform target is also x86. On Jul 9, 7:00 am, Sarel van der Merwe sfvdme...@gmail.com wrote: I installed the redistribution pack. 1. Reboot and recompiled. 2. Still having the same problem. Could not load file or assembly

Re: New version of tesseractdotnetwrapper

2011-07-06 Thread Quan Nguyen
Andreas, Try adding a slash to the data path, such as: string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata\"; I'm curious as to why you use an unsafe block in your code. Quan On Jul 6, 5:01 am, Andreas Reiff andire...@googlemail.com wrote: I get an AccessViolationException, trying to

Re: New version of tesseractdotnetwrapper

2011-07-06 Thread Quan Nguyen
into this yet. Best wishes, Andreas On 6 Jul., 14:05, Quan Nguyen nguyen...@gmail.com wrote: Andreas, Try adding a slash to the data path, such as: string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata\"; I'm curious as to why you use an unsafe block in your code. Quan

Re: VietOCR 3.1 Beta release

2011-07-05 Thread Quan Nguyen
Hunter, I would grab the files from the project's svn. You can then build the tesseract.dll from that. I put in a couple of minor changes so that the recognize method would accept an additional rectangular region parameter and return a Unicode string rather than UTF-8. Look at the project's Issues

Re: VietOCR.NET 3.1 Beta release

2011-07-05 Thread Quan Nguyen
Correct Subject line.

VietOCR 3.1 Beta release

2011-07-03 Thread Quan Nguyen
Version 3.1 Beta integrates the new tesseractdotnet .NET wrapper DLL x86 (r42+). In contrast, version 3.0 uses command-line process to invoke Tesseract.exe. http://vietocr.sf.net For more info about the wrapper, visit http://code.google.com/p/tesseractdotnet/

Re: VietOCR 3.1 Beta release

2011-07-03 Thread Quan Nguyen
A correction: the name should be VietOCR.NET 3.1 Beta. On Jul 3, 10:08 am, Quan Nguyen nguyen...@gmail.com wrote: Version 3.1 Beta integrates the new tesseractdotnet .NET wrapper DLL x86 (r42+). In contrast, version 3.0 uses command-line process to invoke Tesseract.exe. http://vietocr.sf.net

Re: Tesseract doesn't work with a very simple example

2011-06-18 Thread Quan Nguyen
The resolution of your image is too low -- at 96 DPI, any OCR engine would have problems with it. After rescaling to 300 DPI, Tesseract was able to recognize it. On Jun 17, 9:05 am, Felipe Coutinho felipelcouti...@gmail.com wrote: Hello, I'm a new tess user. I'm trying to test the tess with
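
The rescaling arithmetic mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `scaled_size` and the default target of 300 DPI are assumptions, and the actual resampling would be done in an image editor or imaging library of your choice.

```python
def scaled_size(width_px, height_px, source_dpi, target_dpi=300):
    """Return the pixel dimensions after resampling from source_dpi to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    # A 96 DPI scan needs a factor of 300 / 96 = 3.125 to reach 300 DPI.
    factor = target_dpi / source_dpi
    return round(width_px * factor), round(height_px * factor)


if __name__ == "__main__":
    # Hypothetical 640x480 screenshot captured at 96 DPI.
    print(scaled_size(640, 480, 96))
```

Upscaling a low-resolution image does not add real detail, so rescanning at a higher resolution is usually preferable when the original is available.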

VietOCR v2.0.3/3.1.3 VietOCR.NET v2.0.3 Releases

2011-06-04 Thread Quan Nguyen
A Java/.NET GUI frontend for Tesseract OCR engine. The releases include the following fixes and improvements: * Improve program usability, enabling image navigation and manipulation with keyboard * Fix an installation issue that prevented uninstalling previous versions (.NET only) * Fix an EOL

Re: Automate Tesseract 3.01 language data generation process

2011-05-25 Thread Quan Nguyen
That's the problem -- you'd need an entry for every image file. The following is excerpted from the TrainingTesseract3 wiki: When running mftraining, each .tr filename must match an entry in the font_properties file, or mftraining will abort. If they are the same font, you can put them in a
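
A quick way to catch this before running mftraining is to compare the .tr files against font_properties. The Python sketch below is only an illustration: it assumes the lang.fontname.expN.tr naming convention and a font_properties file whose first field on each line is the font name, and the helper names are made up for this example.

```python
import os


def font_name_from_tr(tr_filename):
    """Extract the font name from a name of the form lang.fontname.expN.tr."""
    parts = os.path.basename(tr_filename).split(".")
    if len(parts) < 4 or parts[-1] != "tr":
        raise ValueError("unexpected .tr filename: " + tr_filename)
    return parts[1]


def fonts_in_properties(font_properties_text):
    """Collect the font names (first field of every non-empty line)."""
    return {
        line.split()[0]
        for line in font_properties_text.splitlines()
        if line.strip()
    }


def missing_fonts(tr_filenames, font_properties_text):
    """Return the font names used by .tr files but absent from font_properties."""
    known = fonts_in_properties(font_properties_text)
    used = {font_name_from_tr(name) for name in tr_filenames}
    return sorted(used - known)


if __name__ == "__main__":
    # Hypothetical file names and font_properties content.
    tr_files = ["eng.arial.exp0.tr", "eng.timesitalic.exp0.tr"]
    print(missing_fonts(tr_files, "arial 0 0 0 0 0\n"))
```

An empty result means every .tr file has a matching entry; anything listed needs a line added to font_properties.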

Re: Is there a way to specify language data (*including hard path*) from the command line?

2011-05-21 Thread Quan Nguyen
Have you tried setting the environment variable TESSDATA_PREFIX? On May 20, 1:47 pm, Daniel cogdeb...@gmail.com wrote: I'm attempting to integrate Tesseract 3 with another stand-alone app, but I'm running into a problem: Tesseract always looks for the language files in \Program Files
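
If the calling application launches Tesseract as a child process, the variable can be set for that process alone. The Python sketch below is an assumption-laden illustration: the helper name `env_with_tessdata` is invented, and it assumes that in the 3.x releases TESSDATA_PREFIX names the directory that contains the tessdata folder, written with a trailing separator.

```python
import os


def env_with_tessdata(parent_dir, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    parent_dir is assumed to be the directory that contains 'tessdata'.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Append a trailing separator if the caller left it off.
    if not parent_dir.endswith(("/", "\\")):
        parent_dir += os.sep
    env["TESSDATA_PREFIX"] = parent_dir
    return env


if __name__ == "__main__":
    # Hypothetical install location; pass the result as env= to subprocess.
    print(env_with_tessdata("/opt/myapp")["TESSDATA_PREFIX"])
```

The returned dictionary can be handed to `subprocess.check_call(["tesseract", "page.tif", "page"], env=...)`, which leaves the system-wide environment untouched.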

Re: Tesseract 3.01 Training and Error opening unicharset file

2011-05-21 Thread Quan Nguyen
Can you elaborate on the problems with those characters? On May 20, 9:44 am, Holm Dressler velovity1...@googlemail.com wrote: 2. I clean up the box file with jTessBoxEditor.jar (still have problems with special characters like the German ö,ä,ü ...)

Re: Setting up Tessnet for .Net application

2011-05-09 Thread Quan Nguyen
Take a look at the source code of VietOCR.NET, which uses tessnet2 library. http://vietocr.sf.net On May 9, 10:08 am, Vignesh Raj vignesh...@greatminds.co.in wrote: Hi. Am very new to this and I need some help on how to set up tessnet for my .Net (c#) based application. I have not done

Re: Difficulties to use Tesseract

2011-05-09 Thread Quan Nguyen
Did you scan them correctly, with appropriate pixel resolution (~300 DPI) and monochrome/grayscale settings? On May 9, 10:20 am, Giby_the_kid g.benjamin.le...@gmail.com wrote: I've test with the sample of text in the sources... it has worked... Now if I tried with any other scanned document, I

Re: i need hel installing the jap lang file for tesseract

2011-05-05 Thread Quan Nguyen
The binary executable would be placed in /usr/bin and language data in /usr/share/tesseract-ocr/tessdata. On May 5, 8:54 pm, James McCartha slayer2...@gmail.com wrote: i used the synaptic manager and im using the newest ver of ubuntu whare would the subdirectory be located in ubuntu

Re: creating train data set for Korean

2011-04-29 Thread Quan Nguyen
Looks like you're running Tesseract 2.0x, which does not support Oriental scripts. Download and install Tesseract 3.01 and try training again. On Apr 29, 7:09 am, Oleg Tikhonov olegtikho...@gmail.com wrote: Here is a command and the error/message $ tesseract.exe

Re: creating train data set for Korean

2011-04-28 Thread Quan Nguyen
Print screens are, in general, not adequate for training new languages. You'd be better off using GIMP to produce your TIFF images. Be sure to specify the language to bootstrap the new charset, such as: $ tesseract.exe ../korean_training/kor.ariel.exp1.tif ../korean_training/kor.ariel.exp1 -l

Re: Several input files into one output file

2011-04-28 Thread Quan Nguyen
You can try VietOCR, a frontend program which uses the Tesseract engine to perform OCR on multi-page TIFF or individual images and appends the output to previous results. On Apr 28, 8:41 pm, faye stefan.der.pr...@googlemail.com wrote: Is there an option to let tessarct write the output of several
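
For anyone who would rather script this than use a GUI, the same effect can be approximated by running the tesseract command once per image and concatenating the resulting text files. The Python sketch below is not how VietOCR does it; the function names are invented, and it assumes the usual `tesseract imagename outputbase [-l lang]` invocation, which writes outputbase.txt.

```python
import subprocess


def tesseract_command(image_path, output_base, lang=None):
    """Build the command line: tesseract imagename outputbase [-l lang]."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_to_single_file(image_paths, combined_path, lang=None,
                       run=subprocess.check_call):
    """OCR each image in order and append its text to combined_path.

    'run' executes one command; it is a parameter so the function can be
    exercised without Tesseract installed.
    """
    with open(combined_path, "w", encoding="utf-8") as combined:
        for index, image in enumerate(image_paths):
            # Each page gets its own intermediate output base.
            base = "%s.part%d" % (combined_path, index)
            run(tesseract_command(image, base, lang))
            with open(base + ".txt", encoding="utf-8") as part:
                combined.write(part.read())
```

Calling `ocr_to_single_file(["page1.tif", "page2.tif"], "book.txt", lang="eng")` would leave the intermediate book.txt.part0.txt and book.txt.part1.txt files next to the combined output; delete them afterwards if they are not needed.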
