Re: [tesseract-ocr] VietOCR v6.3.0 & VietOCR.NET v6.3.0 Releases

2024-03-13 Thread Quan Nguyen
VietOCR v6.13.0 & VietOCR.NET v6.11.0 Releases

A Java/.NET WPF GUI frontend for Tesseract OCR engine. The releases include 
the following improvements:

- Upgrade to Tesseract 5.3.4
- Implement open add image functionality using Shift key
- Adjust size and position of dialogs and components to accommodate long 
localized text
- Update translations

http://vietocr.sf.net

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/71563d84-b82f-48dd-95de-178cd0797234n%40googlegroups.com.


[tesseract-ocr] Re: Tess4j failing near load of shared library tesseract-ocr-5.2 in Java 11 and 17, succeeds in Java 8

2022-09-28 Thread Quan Nguyen
PDF files are read by PDFBox library. You may want to look into that area 
as well.

On Wednesday, September 28, 2022 at 10:52:15 PM UTC-5 Quan Nguyen wrote:

> The source of tess4j is available; you can trace through the code to see 
> what threw the exception.
>
> Nevertheless, "throwable while reading PDF" seems to point to the part of 
> code that reads in PDF file. Was that something you wrote, or from tess4j 
> itself?

[tesseract-ocr] Re: Tess4j failing near load of shared library tesseract-ocr-5.2 in Java 11 and 17, succeeds in Java 8

2022-09-28 Thread Quan Nguyen
The source of tess4j is available; you can trace through the code to see 
what threw the exception.

Nevertheless, "throwable while reading PDF" seems to point to the part of 
code that reads in PDF file. Was that something you wrote, or from tess4j 
itself?
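
One way to follow that advice without a debugger is to print the whole cause chain of the Throwable, since the ExceptionInInitializerError at the top is only a wrapper. A minimal sketch using only the JDK; the class name CauseChain and the simulated exception in main are illustrative and not taken from tess4j:

```java
import java.util.ArrayList;
import java.util.List;

final class CauseChain {
    /** Returns the throwable followed by each nested cause, outermost first. */
    static List<Throwable> chain(Throwable t) {
        List<Throwable> out = new ArrayList<>();
        // Stop at the end of the chain, or if a cause points back to one already seen.
        while (t != null && !out.contains(t)) {
            out.add(t);
            t = t.getCause();
        }
        return out;
    }

    /** Returns the innermost cause of a non-null throwable. */
    static Throwable rootCause(Throwable t) {
        List<Throwable> c = chain(t);
        return c.get(c.size() - 1);
    }

    public static void main(String[] args) {
        // Simulated failure shaped like the one reported in this thread.
        Throwable simulated = new ExceptionInInitializerError(
                new IllegalStateException("zip file closed"));
        for (Throwable t : chain(simulated)) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
    }
}
```

In the real program, the Throwable caught around doOCR() would be passed to chain() in place of the simulated one.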

On Sunday, September 25, 2022 at 11:02:35 AM UTC-5 rcja...@gmail.com wrote:

> I'm using Tess4j in a Java program to access Tesseract and read PDFs loaded 
> with PDFBox. I've been using Java 8, and things are running. The program is 
> not commercial; I provide it to non-profits doing pro bono legal work in my 
> state. In Java 8, using the command line and Eclipse, the program runs fine; 
> running from the command line in either Java 11 or Java 17 causes an error 
> at the point where the program calls Tesseract.doOCR().
>
> I've dumped class loading information and see that the last class loaded 
> before the fatal exception is com.sun.jna.Platform; it would be used, for 
> instance, to determine the platform on which the program is running. I 
> haven't been able to find the source for the 5.2 version I downloaded from 
> UB Mannheim; that would be useful since the stack trace has line numbers.
>
> The following is a snippet showing log messages, System.out.println 
> messages, stacktraces, and class loading messages near the point of failure:
>
> pdfRenderer created buffered Image
> set a couple of tesseract vars
> [14.960s][info][class,load] net.sourceforge.tess4j.util.ImageIOHelper source: rsrc:tess4j-5.4.0.jar
> [14.961s][info][class,load] javax.imageio.IIOParam source: jrt:/java.desktop
> [14.961s][info][class,load] javax.imageio.ImageWriteParam source: jrt:/java.desktop
> [14.962s][info][class,load] com.github.jaiimageio.plugins.tiff.TIFFImageWriteParam source: rsrc:jai-imageio-core-1.4.0.jar
> [14.963s][info][class,load] javax.imageio.IIOImage source: jrt:/java.desktop
> [14.964s][info][class,load] com.sun.jna.Library source: rsrc:jna-5.12.1.jar
> [14.965s][info][class,load] net.sourceforge.tess4j.ITessAPI source: rsrc:tess4j-5.4.0.jar
> [14.965s][info][class,load] net.sourceforge.tess4j.TessAPI source: rsrc:tess4j-5.4.0.jar
> [14.966s][info][class,load] net.sourceforge.tess4j.util.LoadLibs source: rsrc:tess4j-5.4.0.jar
> [14.969s][info][class,load] com.sun.jna.Platform source: rsrc:jna-5.12.1.jar
> [14.973s][info][class,load] java.lang.ExceptionInInitializerError source: jrt:/java.base
> throwable while reading PDF
> [14.973s][info][class,load] java.lang.Throwable$PrintStreamOrWriter source: jrt:/java.base
> [14.974s][info][class,load] java.lang.Throwable$WrappedPrintStream source: jrt:/java.base
> java.lang.ExceptionInInitializerError
> at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:442)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:326)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:309)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:274)
> at drivingrecordtool.file.DrivingRecordPDFTextReader.getOCRText(DrivingRecordPDFTextReader.java:152)
> at drivingrecordtool.file.DrivingRecordPDFTextReader.getText(DrivingRecordPDFTextReader.java:46)
> at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:78)
> at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:1)
> at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.IllegalStateException: zip file closed
> at java.base/java.util.zip.ZipFile.ensureOpen(ZipFile.java:913)
> at java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:348)
>
> If I uninstall Java and install Java 8, the program works fine.
>
> If I uninstall Java and install Java 11 or Java 17, it fails in this 
> fashion.
>
> Can anyone help me understand what the difference might be between the 
> versions of Java so I can fix this?
>
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f7309e7d-79bf-4594-a581-a1ce6a556b62n%40googlegroups.com.


Re: [tesseract-ocr] Re: ErrorInInitializerError - zip file closed out of tess4j.util.LoadLibs.getTesseractLibName

2022-09-28 Thread Quan Nguyen
Try the suggested program to see if the same exception occurs. If it 
does not, then it's possible that something in your code is not right. 
VietOCR is open source; you can browse through the code to see how it works.

Another suggestion: try to use the alternate Tesseract1 API.

On Friday, September 23, 2022 at 6:50:19 AM UTC-5 rcja...@gmail.com wrote:

> How is this supposed to help me? I have a program using the Tesseract 
> library to do OCR and then process the resulting text; I don't need a GUI 
> front end.
>
> On Thu, Sep 22, 2022 at 10:09 PM Quan Nguyen  wrote:
>
>> You may want to try VietOCR, a Java desktop app that uses Tess4J. It 
>> works with Java 8, 18, and probably 11 too.
>>
>> http://vietocr.sf.net

[tesseract-ocr] Re: how to combine (merge) trained data of Tesseract files

2022-09-28 Thread Quan Nguyen
Merging two traineddata files is neither possible nor supported. What you can 
do is rename your custom language pack to, say, eng1.traineddata and then 
specify -l eng+eng1 when running the tesseract executable.
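
As a rough sketch of that workaround, the command line could be assembled as shown below. The class name, the helper methods and the file names page.png and out are hypothetical; only the -l option and the + separator come from the reply above. Nothing is executed here.

```java
import java.util.Arrays;
import java.util.List;

final class CombinedLanguages {
    /** Joins language pack names with '+', the separator the -l option expects. */
    static String languageArg(String... langs) {
        return String.join("+", langs);
    }

    /** Builds: tesseract <image> <outputBase> -l <lang1+lang2+...> */
    static List<String> command(String image, String outputBase, String... langs) {
        return Arrays.asList("tesseract", image, outputBase, "-l", languageArg(langs));
    }

    public static void main(String[] args) {
        // eng1 is assumed to be the renamed custom pack (eng1.traineddata),
        // stored in the same tessdata directory as eng.traineddata.
        System.out.println(String.join(" ", command("page.png", "out", "eng", "eng1")));
    }
}
```

The resulting list can be handed to ProcessBuilder if the tesseract executable is on the PATH.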

On Sunday, September 25, 2022 at 4:22:45 PM UTC-5 fishmo...@gmail.com wrote:

>
> 
>
> I have trained a new font for the English language with Tesseract OCR 
> (JavaTessBoxEditor). I received these files: eng.traineddata, inttemp, 
> normproto, pffmtable, shapetable, unicharset.
>
> I've tried to run combine_data, but nothing changed in the main 
> eng.traineddata file. It was not amended.
>
> Then I tried downloading third-party software (the Qt version from Zdenko, 
> the GUI from General Development NL). Still no result.
>
> I've searched all the forums and YouTube videos; not a clue how to combine 
> the main eng file and the new trained eng file.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/96d2fc15-9edd-4ab3-8d93-c81149374645n%40googlegroups.com.


[tesseract-ocr] Re: ErrorInInitializerError - zip file closed out of tess4j.util.LoadLibs.getTesseractLibName

2022-09-22 Thread Quan Nguyen
You may want to try VietOCR, a Java desktop app that uses Tess4J. It works 
with Java 8, 18, and probably 11 too.

http://vietocr.sf.net

On Thursday, September 22, 2022 at 7:10:54 AM UTC-5 rcja...@gmail.com wrote:

> I am running a Java desktop application on Windows 10 Pro using Tess4j; it 
> was working fine with Java 1.8, am now trying to get it to run with Java 11 
> (11.0.15.1). When it calls 
> net.sourceforge.tess4j.Tesseract.doOCR(BufferedImage 
> bi), it gets a Throwable ExceptionInInitializerError, with a 'cause' 
> exception indicating a call to 
> net.sourceforge.tess4j.util.LoadLibs.getTesseractLibName().
>
> Oddly enough, the application works fine from within eclipse, but fails 
> running on its own (for instance, running "java -jar program.jar" from a 
> cmd window). As I said, it was running with Java 1.8, and I might be able 
> to get it running again by uninstalling Java and reinstalling that JRE, but 
> there are other people that are going to use the application and it is not 
> reasonable to expect them to use a Java version that is that old. 
>
> I had the Maven dependency:
>
> <dependency>
>   <groupId>net.sourceforge.tess4j</groupId>
>   <artifactId>tess4j</artifactId>
>   <version>4.4.1</version>
> </dependency>
>
> The only copy of tess4j-4.4.1.jar on my hard drive is in my 
> .m2\repository\net\sourceforge\tess4j\tess4j\4.4.1. I have twice deleted it 
> and updated my Maven configuration to restore it, to ensure it is not 
> corrupted.
>
> I noticed that this is not the most recent Tess4j, so updated the 
> dependency to 5.4.0 and rebuilt. This has the same results: works in 
> eclipse, not from command line.
>
> The ExceptionInInitializerError is caused by an IllegalStateException: zip 
> file closed error. Documentation indicates this could be a corrupted jar 
> file or something similar. I don't know what jar file could be involved 
> except the tess4j, and I've replaced that twice as I've said.
>
> I've also tried this with two versions of the Tesseract-OCR dll -- 
> 5.0.1.20220118 and v5.2.0.20220712; they were both installed for "anyone on 
> this computer" with default options. They give the same results.
>
> Can someone help me figure out what's wrong?
>
> Stack trace:
>
> java.lang.ExceptionInInitializerError
> at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:427)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:311)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:294)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:275)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:259)
> at drivingrecordtool.file.DrivingRecordPDFTextReader.getOCRText(DrivingRecordPDFTextReader.java:152)
> at drivingrecordtool.file.DrivingRecordPDFTextReader.getText(DrivingRecordPDFTextReader.java:46)
> at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:78)
> at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:1)
> at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.IllegalStateException: zip file closed
> at java.base/java.util.zip.ZipFile.ensureOpen(ZipFile.java:913)
> at java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:348)
> at java.base/java.util.zip.ZipFile$1.getEntry(ZipFile.java:1130)
> at java.base/java.util.jar.JarFile.getEntry0(JarFile.java:586)
> at java.base/java.util.jar.JarFile.getEntry(JarFile.java:516)
> at java.base/sun.net.www.protocol.jar.URLJarFile.getEntry(URLJarFile.java:131)
> at java.base/java.util.jar.JarFile.getJarEntry(JarFile.java:478)
> at java.base/jdk.internal.loader.URLClassPath$JarLoader.getResource(URLClassPath.java:945)
> at java.base/jdk.internal.loader.URLClassPath.getResource(URLClassPath.java:315)
> at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:455)
> at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
> at java.base/java.security.AccessController.doPrivileged(Native Method)
> at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> at net.sourceforge.tess4j.util.LoadLibs.getTesseractLibName(LoadLibs.java:95)
> at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
> at 

[tesseract-ocr] VietOCR v6.3.0 & VietOCR.NET v6.3.0 Releases

2022-07-08 Thread Quan Nguyen


A Java/.NET WPF GUI frontend for Tesseract OCR engine. The releases include 
the following improvements:

   - Upgrade to Tesseract 5.2.0

http://vietocr.sf.net

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ae5bd5c3-d75c-4802-84e6-e8fb9b59ded5n%40googlegroups.com.


[tesseract-ocr] Re: train tess4j on a specific font?

2022-01-30 Thread Quan Nguyen
Not about training, but you should use the latest version of tess4j that 
corresponds to the latest Tesseract releases.

https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j
https://github.com/nguyenq/tess4j

Hope it will produce better results for you.

On Saturday, January 29, 2022 at 1:52:48 AM UTC-6 Bernd Angelo wrote:

> Hello, I am having trouble getting numbers recognized.
> I am using Tess4J from http://tess4j.sourceforge.net/ 
> which, if I am not wrong, is using Tesseract 3.05 in the background.
>
> I followed the instructions outlined here:
> http://tess4j.sourceforge.net/tutorial/
> (using the command line version, no eclipse, maven or other sh't)
>
> I can modify the TesseractExample.java file without an issue and doing the 
> 2 command line commands mentioned in the site above, can do an tesseract 
> ocr scan on any png or jpg I want.
>
> Now you see what I in the end want to do is use ocr to make my program 
> "read" the balance of an online casino and with that balance now given as a 
> string variable, I will do all kinds of actions based on it.
> so reading the numbers properly is important.
>
> Now for test purposes I took 2 screenshots that together include all the 
> different digits that can appear, so 0-9.
>
> when I do the normal ocr as instructed in the page above, (from my 
> knowledge, it then uses the pre-trained standard eng.traineddata file)
> sadly both the digits 4 and 6 in the image are read as 5.
> the euro sign € is also read as the pound sign instead, but that is of minor 
> importance to me.
> the ocr not being able to distinguish between 4 and 6 really sucks.
>
> The pictures used are these ones:
> https://ibb.co/ZTRFqVg
> https://ibb.co/p23w7nj
>
> As said, they are basically screenshots of the casino site and so I cant 
> influence the font or size or anything.
>
> as said, the ocr reads the "4,6" part as "5,5".
>
> which is bad.
> So I thought, why not use the 2 images to train tesseract, as obviously 
> tesseract having seen all the possible digits should give it 100% accuracy, 
> right?
> well, I got myself jTessBoxEditor, got myself Serak Tesseract Trainer, 
> did a ton of stuff and created the traineddata from the image.
> and made the ocr file use it to try to ocr the image again.
> well, I wrote a line in my code to System.out.print the string and also 
> write down its length.
> I dont know what ocr does. but the stuff written as a result in the 
> command line window is an empty line (where the result string should stand) 
> and string length is claimed  to be 6 (it should be 11 with all the digits. 
> and , involved).
> so I don't know what the OCR is doing; it sucks way harder than with the 
> standard eng language.
>
> so I did some bit of googling, apparently the font "Alte DIN 1451 
> Mittelschrift" is VERY similar to my number, the casino (for the balance 
> display at least) uses this font or a very similar one.
> so while I know about a font worth training with (I also already 
> downloaded it's ttf file) I havent the slightest idea how to train with the 
> font.
>
> Can someone please help me, explain to me why the ocr result can be that 
> bad after training with the actual image to ocr?
> (was a pain to perfectly fit the rectangles to the digits!)
> or how to train tess4j with the given font?
> google even tells me about such a one click service but sadly it is 
> apparently gone by now.
>
> can someone help me please? :-)
>
>
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1bd230a7-b761-401f-80ce-acec7dd67a4an%40googlegroups.com.


[tesseract-ocr] Re: Tesseract 3.05 build on windows x86

2021-08-20 Thread Quan Nguyen
Tesseract Windows executable can be downloaded from:

https://digi.bib.uni-mannheim.de/tesseract/

On Thursday, July 22, 2021 at 3:38:26 AM UTC-5 luys...@gmail.com wrote:

> Hello Every Body.
> I'm trying to build tesseract 3.05 and leptonica 1.74 on x86 windows.
> But I can't get image library like libjpeg, libtiff, libpng and so on for 
> leptonica building.
> Who can help me to do this? Thanks in advance.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e09e5698-f7cf-4e00-9ba8-ea5f0a295ca2n%40googlegroups.com.


[tesseract-ocr] Re: Creating training data for a language with a complex name, like ita_old or chi_sim_vert

2021-08-20 Thread Quan Nguyen
Pick a name that tesstrain.sh accepts, and then rename the output file to the 
desired name.
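
The rename step could be scripted along these lines. This is only a sketch: it assumes training was run under the accepted code eng and that the result is a single eng.traineddata file in a known directory; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class RenameTrainedData {
    /** Renames dir/from.traineddata to dir/to.traineddata and returns the new path. */
    static Path rename(Path dir, String from, String to) throws IOException {
        Path source = dir.resolve(from + ".traineddata");
        Path target = dir.resolve(to + ".traineddata");
        // Without REPLACE_EXISTING this fails if the target already exists.
        return Files.move(source, target);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the training output (hypothetical layout).
        System.out.println(rename(Paths.get(args[0]), "eng", "eng_old"));
    }
}
```

The renamed pack would then be selected with -l eng_old at recognition time.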

On Wednesday, August 18, 2021 at 5:40:05 AM UTC-5 smn...@gmail.com wrote:

> Hello,
>
> I try to create training data for a language with a complex name similar 
> to ita_old or chi_sim_vert. However when I run the command:
>
> tesstrain.sh --lang eng_old  --fonts_dir 
>
> I get this error:
>
> === Starting training for language 'eng_old'
> ERROR: Error: eng_old is not a valid language code
>
> How can I cause tesstrain.sh to accept 'eng_old' the way 'ita_old' is 
> accepted?
>
> Thank you in advance!
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/03ee4814-44a9-49a5-917f-6ce88b0cbe09n%40googlegroups.com.


[tesseract-ocr] Re: shapeclustering error bad_alloc on Tesseract 3.05

2021-08-20 Thread Quan Nguyen
The command line imposes a limit on the command length. You can group 
images of the same font in a multi-page TIFF to cut down the number of files 
and then conduct training on it.
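
A possible way to build such a multi-page TIFF from Java is sketched below. It assumes Java 9 or later, where a TIFF writer ships with javax.imageio (older JDKs need a TIFF plugin on the class path); the class name MultiPageTiff is made up, and deciding which images belong to the same font is left to the caller.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

final class MultiPageTiff {
    /** Writes the given images as consecutive pages of one TIFF file. */
    static void write(List<BufferedImage> pages, File out) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF writer available (Java 9+ or a TIFF plugin is needed)");
        }
        ImageWriter writer = writers.next();
        // Start from a fresh file so no bytes of an older, longer file remain.
        Files.deleteIfExists(out.toPath());
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.prepareWriteSequence(null); // null: use default stream metadata
            for (BufferedImage page : pages) {
                // One call per page; null metadata and parameters mean defaults.
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
    }
}
```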

On Monday, August 16, 2021 at 10:32:52 PM UTC-5 gan...@gmail.com wrote:

> Tesseract 3.05.  I have 2,000 *.tr files.  I ran the following command:  
> shapeclustering -f font_properties -u unicharset lang.fontname0.exp0.tr 
> lang.fontname1.exp0.tr….  
> But when it gets to 200, it shows the error std::bad_alloc.  
> I have 32GB RAM.  How to solve this error?
>
> Thanks in advance
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ec437ee3-7bed-43f1-96e1-b3af2153f79dn%40googlegroups.com.


[tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Quan Nguyen
I think it would need to operate at RIL_SYMBOL level, not RIL_TEXTLINE.

On Wednesday, April 21, 2021 at 7:17:04 AM UTC-5 yosoyl...@gmail.com wrote:

> Hi, when I pass tessedit_create_boxfile 1 argument to tesseract it outputs 
> individual chars' location. But when I use api like this:
>
> ```
> Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
> for (int i = 0; i < boxes->n; i++) {
>   BOX* box = boxaGetBox(boxes, i, L_CLONE);
>   api->SetRectangle(box->x, box->y, box->w, box->h);
>   char* outText = api->GetUTF8Text();
>   int conf = api->MeanTextConf();
>   fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
>           i, box->x, box->y, box->w, box->h, conf, outText);
>   boxDestroy(&box);  // boxDestroy takes a BOX**; it was called with no argument
>   delete[] outText;
> }
> ```
> it outputs whole line like this:
> Box[1]: x=36, y=84, w=246, h=14, confidence: 44, text: #Spor #siyaset 
> Fanket FIliskiler
>
> Is there any way to combine individual boxes to print like API? Thanks in 
> advance.
>
>
>
>
>
>
> 
> ### Environment
>
> * **Tesseract Version**: 
> tesseract 4.1.1-rc2-25-g9707
>  leptonica-1.78.0
>   libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : 
> libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>  Found AVX2
>  Found AVX
>  Found FMA
>  Found SSE
>  Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 
> liblz4/1.8.3 libzstd/1.3.8
>
> * **Platform**: 
> Linux pardus 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 
> GNU/Linux
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/a20ef4b7-9f76-4f20-a867-5d6f60fc6c62n%40googlegroups.com.


[tesseract-ocr] Re: Choosing background when generating output using PDF config.

2020-12-20 Thread Quan Nguyen
I don't think Tesseract supports this. You may want to try to generate a 
text-only searchable PDF file and superimpose it on the original PDF file.
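
For the first half of that suggestion, Tesseract has, as far as I know, a textonly_pdf configuration variable that makes the pdf output contain only the invisible text layer. Whether an installed build supports it can be checked with tesseract --print-parameters. The sketch below only assembles the command; the class and method names are made up, and the file names are the ones from the question. Superimposing the result on input.pdf is not shown and needs a separate PDF tool.

```java
import java.util.Arrays;
import java.util.List;

final class TextOnlyPdfCommand {
    /** Builds: tesseract <image> <outputBase> -c textonly_pdf=1 pdf */
    static List<String> build(String image, String outputBase) {
        return Arrays.asList("tesseract", image, outputBase, "-c", "textonly_pdf=1", "pdf");
    }

    public static void main(String[] args) {
        // OCR runs on the preprocessed image; output.pdf then holds the text layer only.
        System.out.println(String.join(" ", build("preprocessed.png", "output")));
    }
}
```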

On Wednesday, November 11, 2020 at 10:25:07 AM UTC-6 jonas.pau...@gmail.com 
wrote:

> Hello.
>
> I've got some input document input.pdf. This comes straight from a scanner 
> and thus I do some preprocessing to improve accuracy (i.e., unpaper, 
> black/white, increased contrast), which yields preprocessed.png.
>
> When using the command
>
> tesseract preprocessed.png output pdf
>
> I receive a document, which has the ocr'ed text embedded. Great! However: 
> Can I tell tesseract to use the original document input.pdf as the 
> background (i.e., the one without preprocessing) of the generated PDF while 
> still performing ocr on the preprocessed input?
>
> Thanks,
> Jonas
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6b1b45e3-367e-4395-a28d-742e2202c904n%40googlegroups.com.


[tesseract-ocr] Re: Improve ocr on screendump

2020-12-20 Thread Quan Nguyen
You may need to scale the image to 300 DPI for better results. This is 
especially true for screenshots, where the resolution is typically at 72 or 
96 DPI.
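
A minimal sketch of such rescaling with plain Java 2D follows. The class name DpiScaler and the choice of bicubic interpolation are assumptions made for illustration, not something Tesseract or tess4j prescribes.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class DpiScaler {
    /** Resamples src so that content captured at sourceDpi gets the pixel size it would have at targetDpi. */
    static BufferedImage scale(BufferedImage src, double sourceDpi, double targetDpi) {
        double factor = targetDpi / sourceDpi;
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bicubic interpolation usually keeps glyph edges smoother than nearest-neighbour.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

A 96 DPI screenshot would be passed as scale(image, 96, 300), and the returned image could then go to doOCR(BufferedImage).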

On Tuesday, November 10, 2020 at 3:40:40 AM UTC-6 player1 wrote:

> Hi Folks
>
> I'm new to Tesseract and need some pointers on how to improve the output 
> from a game screen dump.
>
> It has some game stats with different types of fonts, at different sizes 
> and one font is skewed to the side.
>
> The screendump has background graphics but its toned down as not to 
> disturb human readings the page.
>
> The screendump might have different resolutions but the position of texts 
> are fixed to particular regions.
>
> So far I have tried reading the page (with tess4J) at 120 DPI and only the 
> simplest text which looks to be about 20pt in size is read out correctly, 
> bigger fonts are completely lost.
>
> What options do I have to improve the output form Tesseract?
>



[tesseract-ocr] Re: Binary File

2020-12-20 Thread Quan Nguyen
https://github.com/UB-Mannheim/tesseract/wiki

On Tuesday, November 10, 2020 at 8:23:15 AM UTC-6 Rehan Qasim wrote:

> I need the exe file for installation. Can anybody share it?
>



Re: [tesseract-ocr] From where to download

2020-12-20 Thread Quan Nguyen
https://github.com/UB-Mannheim/tesseract/wiki

On Monday, December 14, 2020 at 10:02:39 PM UTC-6 nairy...@gmail.com wrote:

> Windows 10 32 bit
>
> On Tuesday, December 8, 2020 at 11:44:35 AM UTC+5:30 
> sachinraj...@gmail.com wrote:
>
>> I mean is. Which os are you using?
>>
>> On Tue, 8 Dec, 2020, 11:34 am Sachin Rajput,  
>> wrote:
>>
>>> Which is are you using?
>>>
>>> On Tue, 8 Dec, 2020, 11:26 am YOPLAYER 1 Tech and Game, <
>>> nairy...@gmail.com> wrote:
>>>
>>
>>>> Hi,
>>>> I wanted to download Tesseract OCR but I don't know from where. I have 
>>>> seen many videos, but none has a download link. After downloading, I 
>>>> will use it in my Python project.
>>>>
>>>> I also sent a message earlier but I got no reply.
>>>>
>>>> Regards,
>>>> Tony
>>>>
>>>



[tesseract-ocr] Re: I am getting different output results on windows and linux. Need to know equivalent versions for tesseract OCR and tess4j api.

2020-09-06 Thread Quan Nguyen
Have you looked at the project's website? See the changelog:

http://tess4j.sourceforge.net/changelog.html

On Saturday, September 5, 2020 at 11:28:19 AM UTC-5 jiten@finicity.com 
wrote:

> Hello,
>
> I am using Kubernetes Runtime which has linux OS for deployment of my 
> application. While for Development, I am using Windows.
> I am installing tesseract in Linux system and using tess4j for accessing 
> API Objects, while on windows I am not using any tesseract OCR installation 
> as i am directly using tess4j api;  I want to know equivalent versions for 
> tesseract OCR and tess4j api. 
> I am getting different output results on windows and linux.
>
> Any Help is appreciated!
>
> Hope to hear from you soon.
>
> Thanks.
>



[tesseract-ocr] Re: Tesseract 4 C++ API .NET Wrapper

2020-06-17 Thread Quan Nguyen
https://github.com/charlesw/tesseract/tree/feature/321-Tesseract-4

On Monday, June 15, 2020 at 3:44:53 PM UTC-5, Tim Snyder wrote:
>
> Hello,
>
> Curious if anyone knows of a .NET wrapper for the Tesseract 4 C++ API? 
> Looking for something similar to the Python tesserocr package.
>
> Thanks.
>



[tesseract-ocr] Re: Where to download the dutch language pack?

2020-06-02 Thread Quan Nguyen
You can try VietOCR (http://vietocr.sf.net).

On Monday, June 1, 2020 at 5:01:15 AM UTC-5, Mike Dewul wrote:
>
>
> Any other free tool similar to the (a9t9)FreeOcrWindowsDesktop ? 
> i.e. batch, images, using Tesseract.
>
> Thanks.
>



[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-15 Thread Quan Nguyen
setDatapath should be set to the path to tessdata folder, which contains 
*.traineddata files. It's not the path to your image files.
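
To make the distinction concrete, here is a small sketch that checks a candidate datapath before it is handed to setDatapath. It uses only the JDK; the class and method names are invented for this example. The directory must exist and contain the <language>.traineddata file, so an image path like the one in the quoted code fails the check.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TessdataCheck {
    // True only if dir is a directory holding "<language>.traineddata".
    static boolean looksLikeTessdata(Path dir, String language) {
        return Files.isDirectory(dir)
                && Files.isRegularFile(dir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // The image path from the thread: a file, not a tessdata folder.
        Path wrong = Paths.get("C:\\Users\\username\\Downloads\\capthca1\\testcap.png");
        System.out.println("usable as datapath: " + looksLikeTessdata(wrong, "eng"));
        // Only when the check passes would one call, in Tess4J:
        //   tesseract.setDatapath(dir.toString());
    }
}
```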

On Saturday, February 15, 2020 at 8:14:09 AM UTC-6, Rajith Kariyawsam wrote:
>
> Hi Quan,
>
> I got the point from the video below.
> I had missed downloading a dependency.
> https://www.youtube.com/watch?v=5DqW9KP-aQo&t=425s
>
> And I will try that.
> Thank you very much.
>
> On Saturday, February 15, 2020 at 7:22:36 PM UTC+5:30, Rajith Kariyawsam 
> wrote:
>>
>> Hi Quan,
>>
>> 'pth' is the image location on my PC. 
>> I verified it in debug mode too.
>> As far as I know, the image location should be set as the 'Datapath'.
>>
>> If 'pth' is incorrect, what should be passed for that parameter?
>>
>> It would be really helpful if you could explain it further, please.
>>
>> On Saturday, February 15, 2020 at 10:11:43 AM UTC+5:30, Quan Nguyen wrote:
>>>
>>> cptcha.setDatapath(pth); // <-- incorrect pth value
>>>
>>>
>>> On Wednesday, February 12, 2020 at 10:00:31 PM UTC-6, Rajith Kariyawsam 
>>> wrote:
>>>>
>>>> Hi Quan,
>>>> I didn't get what you mean by the 'tessdata' folder.
>>>> The given pth is the location of the copied image (PNG). My image name is 
>>>> *'testcap.png'*
>>>>
>>>> as per the below line 
>>>>
>>>> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png";
>>>>
>>>> FileHandler.copy(imgFile, new File(pth));
>>>>
>>>>
>>>>
>>>> Appreciate it if you can further describe it, please.
>>>>
>>>>
>>>>
>>>> On Thursday, February 13, 2020 at 12:16:27 AM UTC+5:30, Quan Nguyen 
>>>> wrote:
>>>>>
>>>>> It looks like the datapath is set incorrectly. It should be set to 
>>>>> tessdata folder.
>>>>>
>>>>> On Tuesday, February 11, 2020 at 2:30:45 AM UTC-6, Rajith Kariyawsam 
>>>>> wrote:
>>>>>>
>>>>>> Still, the same error occurred for me.
>>>>>>
>>>>>> code: 
>>>>>>
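>>>>>> <dependency>
>>>>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>>>>   <artifactId>tess4j</artifactId>
>>>>>>   <version>4.3.1</version>
>>>>>> </dependency>
>>>>>>
>>>>>> <dependency>
>>>>>>   <groupId>org.seleniumhq.selenium</groupId>
>>>>>>   <artifactId>selenium-java</artifactId>
>>>>>>   <version>3.141.59</version>
>>>>>> </dependency>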
>>>>>>
>>>>>>
>>>>>> File imgFile = 
>>>>>> findElement(captchaimgIdPath).getScreenshotAs(OutputType.FILE);
>>>>>> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png"; 
>>>>>> //src/main/resources
>>>>>> Thread.sleep(2000);
>>>>>> FileHandler.copy(imgFile, new File(pth));
>>>>>> Thread.sleep(2000);
>>>>>> Tesseract cptcha = new Tesseract();
>>>>>> cptcha.setDatapath(pth);
>>>>>> cptcha.setLanguage("eng");
>>>>>> String text = cptcha.doOCR(new File(pth));
>>>>>>
>>>>>> System.out.println(text);
>>>>>>
>>>>>>
>>>>>> On Sunday, September 2, 2018 at 10:20:53 PM UTC+5:30, Subramaniyan 
>>>>>> Suresh wrote:
>>>>>>>
>>>>>>> I am using Tess4J in my project to extract text from an image (Using 
>>>>>>> Eclipse IDE). I am getting the following error when I try run the OCR. 
>>>>>>> Any 
>>>>>>> suggestion?  
>>>>>>>
>>>>>>> *Error: Exception in thread "main" java.lang.Error: Invalid memory 
>>>>>>> access*
>>>>>>>
>>>>>>>
>>>>>>> *Note: I have attached the image file which I've used *
>>>>>>>
>>>>>>> *My Code*:
>>>>>>>
>>>>>>>
>>>>>>> package tesseractTraining;
>>>>>>>
>>>>>>>
>>>>>>> import java.io.File;
>>>>>>>
>>>>>>> import net.sourceforge.tess4j.*;
>>>>>>>
>>>>>>>
>>>>>>> public class TesseractMainRunner {
>>>>>>>
>>>>>>> public static void main(String[] args) {
>>>>>>>
>>>>>>> File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>>>>>>>
>>>>>>> Tesseract instance = new Tesseract();
>>>>>>>
>>>>>>> try {
>>>>>>>
>>>>>>> instance.setDatapath("C:\\Program Files 
>>>>>>> (x86)\\Tesseract-OCR\\tessdata");
>>>>>>>
>>>>>>> instance.setLanguage("eng");
>>>>>>>
>>>>>>> String result = instance.doOCR(imageFile);
>>>>>>>
>>>>>>> System.out.println(result);
>>>>>>>
>>>>>>> } catch (TesseractException e) {
>>>>>>>
>>>>>>> System.err.println(e.getMessage());
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> imageFile.exists();
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>>



[tesseract-ocr] Re: Finetune tesseract fonts

2020-02-14 Thread Quan Nguyen
https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md

On Friday, February 14, 2020 at 2:43:28 AM UTC-6, susil mishra wrote:
>
> I am new to Tesseract, using version 4.0, and am trying to fine-tune my 
> existing font. Could someone help me with the steps to train an 
> existing font?



[tesseract-ocr] Re: checkbox recognition-Tesseract 4

2020-02-14 Thread Quan Nguyen
jTessBoxEditor supports training for the Tesseract 3.0x format only. For 4.0x, 
please consult 
https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md

On Thursday, February 13, 2020 at 8:37:59 AM UTC-6, PD wrote:
>
> Hello
>
> Is there any way Tesseract 4 can be trained to recognize checkboxes? I want to 
> train Tesseract on empty checkboxes and checkboxes with a cross or check mark. 
> The default English trained data does not identify checkboxes. I tried defining 
> a new font using jTessBoxEditor and training with that tool, but without 
> success.
>



[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-14 Thread Quan Nguyen


cptcha.setDatapath(pth); // <-- incorrect pth value


On Wednesday, February 12, 2020 at 10:00:31 PM UTC-6, Rajith Kariyawsam 
wrote:
>
> Hi Quan,
> I didn't get what you mean by the 'tessdata' folder.
> The given pth is the location of the copied image (PNG). My image name is 
> *'testcap.png'*
>
> as per the below line 
>
> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png";
>
> FileHandler.copy(imgFile, new File(pth));
>
>
>
> Appreciate it if you can further describe it, please.
>
>
>
> On Thursday, February 13, 2020 at 12:16:27 AM UTC+5:30, Quan Nguyen wrote:
>>
>> It looks like the datapath is set incorrectly. It should be set to 
>> tessdata folder.
>>
>> On Tuesday, February 11, 2020 at 2:30:45 AM UTC-6, Rajith Kariyawsam 
>> wrote:
>>>
>>> Still, the same error occurred for me.
>>>
>>> code: 
>>>
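>>> <dependency>
>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>   <artifactId>tess4j</artifactId>
>>>   <version>4.3.1</version>
>>> </dependency>
>>>
>>> <dependency>
>>>   <groupId>org.seleniumhq.selenium</groupId>
>>>   <artifactId>selenium-java</artifactId>
>>>   <version>3.141.59</version>
>>> </dependency>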
>>>
>>>
>>> File imgFile = 
>>> findElement(captchaimgIdPath).getScreenshotAs(OutputType.FILE);
>>> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png"; 
>>> //src/main/resources
>>> Thread.sleep(2000);
>>> FileHandler.copy(imgFile, new File(pth));
>>> Thread.sleep(2000);
>>> Tesseract cptcha = new Tesseract();
>>> cptcha.setDatapath(pth);
>>> cptcha.setLanguage("eng");
>>> String text = cptcha.doOCR(new File(pth));
>>>
>>> System.out.println(text);
>>>
>>>
>>> On Sunday, September 2, 2018 at 10:20:53 PM UTC+5:30, Subramaniyan 
>>> Suresh wrote:
>>>>
>>>> I am using Tess4J in my project to extract text from an image (Using 
>>>> Eclipse IDE). I am getting the following error when I try run the OCR. Any 
>>>> suggestion?  
>>>>
>>>> *Error: Exception in thread "main" java.lang.Error: Invalid memory 
>>>> access*
>>>>
>>>>
>>>> *Note: I have attached the image file which I've used *
>>>>
>>>> *My Code*:
>>>>
>>>>
>>>> package tesseractTraining;
>>>>
>>>>
>>>> import java.io.File;
>>>>
>>>> import net.sourceforge.tess4j.*;
>>>>
>>>>
>>>> public class TesseractMainRunner {
>>>>
>>>> public static void main(String[] args) {
>>>>
>>>> File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>>>>
>>>> Tesseract instance = new Tesseract();
>>>>
>>>> try {
>>>>
>>>> instance.setDatapath("C:\\Program Files 
>>>> (x86)\\Tesseract-OCR\\tessdata");
>>>>
>>>> instance.setLanguage("eng");
>>>>
>>>> String result = instance.doOCR(imageFile);
>>>>
>>>> System.out.println(result);
>>>>
>>>> } catch (TesseractException e) {
>>>>
>>>> System.err.println(e.getMessage());
>>>>
>>>> }
>>>>
>>>> imageFile.exists();
>>>>
>>>> }
>>>>
>>>>
>>>> }
>>>>
>>>>



[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-12 Thread Quan Nguyen
It looks like the datapath is set incorrectly. It should be set to tessdata 
folder.

On Tuesday, February 11, 2020 at 2:30:45 AM UTC-6, Rajith Kariyawsam wrote:
>
> Still, the same error occurred for me.
>
> code: 
>
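> <dependency>
>   <groupId>net.sourceforge.tess4j</groupId>
>   <artifactId>tess4j</artifactId>
>   <version>4.3.1</version>
> </dependency>
>
> <dependency>
>   <groupId>org.seleniumhq.selenium</groupId>
>   <artifactId>selenium-java</artifactId>
>   <version>3.141.59</version>
> </dependency>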
>
>
> File imgFile = findElement(captchaimgIdPath).getScreenshotAs(OutputType.FILE);
> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png"; 
> //src/main/resources
> Thread.sleep(2000);
> FileHandler.copy(imgFile, new File(pth));
> Thread.sleep(2000);
> Tesseract cptcha = new Tesseract();
> cptcha.setDatapath(pth);
> cptcha.setLanguage("eng");
> String text = cptcha.doOCR(new File(pth));
>
> System.out.println(text);
>
>
> On Sunday, September 2, 2018 at 10:20:53 PM UTC+5:30, Subramaniyan Suresh 
> wrote:
>>
>> I am using Tess4J in my project to extract text from an image (Using 
>> Eclipse IDE). I am getting the following error when I try run the OCR. Any 
>> suggestion?  
>>
>> *Error: Exception in thread "main" java.lang.Error: Invalid memory access*
>>
>>
>> *Note: I have attached the image file which I've used *
>>
>> *My Code*:
>>
>>
>> package tesseractTraining;
>>
>>
>> import java.io.File;
>>
>> import net.sourceforge.tess4j.*;
>>
>>
>> public class TesseractMainRunner {
>>
>> public static void main(String[] args) {
>>
>> File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>>
>> Tesseract instance = new Tesseract();
>>
>> try {
>>
>> instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
>>
>> instance.setLanguage("eng");
>>
>> String result = instance.doOCR(imageFile);
>>
>> System.out.println(result);
>>
>> } catch (TesseractException e) {
>>
>> System.err.println(e.getMessage());
>>
>> }
>>
>> imageFile.exists();
>>
>> }
>>
>>
>> }
>>
>>



[tesseract-ocr] Re: Missing Characters

2020-02-04 Thread Quan Nguyen
It looks like the Times New Roman font does not have glyphs for the 
characters you are interested in. You'll need to select a compatible font.

Btw, that application is jTessBoxEditor, not VietOCR.

On Tuesday, February 4, 2020 at 11:02:47 AM UTC-6, Peyi Oyelo wrote:
>
> Hello,
>
>
> I am currently using VietOCR on Ubuntu 18 to try to create box files, but 
> I am unable to see some characters. I am working with Akan Twi which has a 
> general english script (with some missing characters) and some borrowed 
> characters from the Greek script. The greek characters are limited to ɛ 
> and ɔ. I am currently trying to fine-tune the existing default English 
> mode to recognize these characters. However, VietOCR shows these characters 
> as empty boxes.
>
> Please how can I resolve this
>



[tesseract-ocr] Re: tess-two with tessdata_fast crashes

2019-12-09 Thread Quan Nguyen
It's not a bug; tess-two is just not up to date with Tesseract 4.x.

On Monday, December 9, 2019 at 12:14:49 AM UTC-6, NY C wrote:
>
> I know there are new OcrEngineMode value in Tesseract.
> But not in tess-two.
>
> In tesseract 4.x, ocrEngineMode is :
>
> enum OcrEngineMode {
>   OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest; deprecated
>   OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
>   OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
>                                 // to Tesseract when things get difficult.
>                                 // deprecated
>   OEM_DEFAULT,                  // Specify this mode when calling init_*(),
>                                 // to indicate that any of the above modes
>                                 // should be automatically inferred from the
>                                 // variables in the language-specific config,
>                                 // command-line configs, or if not specified
>                                 // in any of the above should be set to the
>                                 // default OEM_TESSERACT_ONLY.
>   OEM_COUNT                     // Number of OEMs
> };
>
> However, in the newest release of tess-two, the ocrEngineMode is :
>
> @IntDef({OEM_TESSERACT_ONLY, OEM_CUBE_ONLY, 
> OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT})
> public @interface OcrEngineMode {}
> public static final int OEM_TESSERACT_ONLY = 0;
> @Deprecated
> public static final int OEM_CUBE_ONLY = 1;
> @Deprecated
> public static final int OEM_TESSERACT_CUBE_COMBINED = 2;
> public static final int OEM_DEFAULT = 3;
>
> If there is no way to set OEM_LSTM_ONLY in tess-two,
> I can only assume this is a bug in tess-two.
>
>
>
> On Monday, December 9, 2019 at 12:38:56 AM UTC+8, Quan Nguyen wrote:
>>
>> There are new OcrEngineMode values:
>> https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/publictypes.h
>>
>>
>> On Saturday, December 7, 2019 at 7:37:49 PM UTC-6, NY C wrote:
>>>
>>> Hi, I am using tess-two for OCR.
>>>
>>>
>>> (Alex Cohn version: https://github.com/alexcohn/tess-two)
>>>
>>>
>>> Code:
>>>
>>> TessBaseAPI baseApi = new TessBaseAPI();
>>> baseApi.setDebug(true);
>>> baseApi.init(pathfiles, language);
>>> //baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
>>> baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
>>> baseApi.setImage(bmp);
>>> result= baseApi.getUTF8Text();
>>> baseApi.end();
>>>
>>>
>>> The code run perfectly when I use this tessdata :
>>> https://github.com/tesseract-ocr/tessdata
>>>
>>> But when I use tessdata_fast (
>>> https://github.com/tesseract-ocr/tessdata_fast), The code crashes on 
>>> baseApi.init.
>>>
>>>
>>> There is no error message since the init method calls native C++. As far 
>>> as I can trace, the init method crashes on this line:
>>>
>>> boolean success = nativeInitOem(mNativeData, datapath, language, 
>>> ocrEngineMode);
>>>
>>>
>>> I also tried to set the OEM like this: 
>>>
>>>   baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);
>>>
>>>
>>> All the OEM parameters have been tried :
>>>
>>> (OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED 
>>> = 2, OEM_DEFAULT = 3) 
>>>
>>> Crashes as well.
>>>
>>>
>>> How could I fix this?
>>>
>>>
>>>
>>>



[tesseract-ocr] Re: tess-two with tessdata_fast crashes

2019-12-08 Thread Quan Nguyen
There are new OcrEngineMode values:
https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/publictypes.h


On Saturday, December 7, 2019 at 7:37:49 PM UTC-6, NY C wrote:
>
> Hi, I am using tess-two for OCR.
>
>
> (Alex Cohn version: https://github.com/alexcohn/tess-two)
>
>
> Code:
>
> TessBaseAPI baseApi = new TessBaseAPI();
> baseApi.setDebug(true);
> baseApi.init(pathfiles, language);
> //baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
> baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
> baseApi.setImage(bmp);
> result= baseApi.getUTF8Text();
> baseApi.end();
>
>
> The code run perfectly when I use this tessdata :
> https://github.com/tesseract-ocr/tessdata
>
> But when I use tessdata_fast (
> https://github.com/tesseract-ocr/tessdata_fast), The code crashes on 
> baseApi.init.
>
>
> There is no error message since the init method calls native C++. As far 
> as I can trace, the init method crashes on this line:
>
> boolean success = nativeInitOem(mNativeData, datapath, language, 
> ocrEngineMode);
>
>
> I also tried to set the OEM like this: 
>
>   baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);
>
>
> All the OEM parameters have been tried :
>
> (OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED = 
> 2, OEM_DEFAULT = 3) 
>
> Crashes as well.
>
>
> How could I fix this?
>
>
>
>



[tesseract-ocr] Re: Get Character or word error rate

2019-11-17 Thread Quan Nguyen
Yes.

On Sunday, November 17, 2019 at 1:20:56 AM UTC-6, Mobeen Ali wrote:
>
> Thanks for your reply @Quan, the x_wconf in titles is the confidence value 
> of the word?
>
> On Friday, November 15, 2019 at 1:45:07 AM UTC+3, Quan Nguyen wrote:
>>
>> Use hocr output format. It has confidence values in the output file.
>>
>> On Thursday, November 14, 2019 at 8:51:09 AM UTC-6, Mobeen Ali wrote:
>>>
>>> Hi everyone!
>>>
>>> I was wondering if there is a function or method to get character or 
>>> word error rate after applying tesseract-ocr on the image?
>>>
>>> What i mean is, for example,
>>>
>>> when i run the command
>>>
>>>- tesseract  test_image.tiff  output_text
>>>
>>> after writing the text in the output_text file, it could return me the 
>>> word error rate or character error rate of the text...
>>>
>>



[tesseract-ocr] Re: Get Character or word error rate

2019-11-14 Thread Quan Nguyen
Use hocr output format. It has confidence values in the output file.
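
For illustration, hOCR output can be produced with "tesseract test_image.tiff output_text hocr", and each recognized word then carries an x_wconf value in its title attribute. Below is a rough sketch of pulling those values out with a regular expression; the class and method names are invented for this example, and a real application would more likely use an HTML parser. Note that a confidence value is the engine's own estimate, not an error rate: measuring character or word error rate requires ground-truth text to compare against.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HocrConfidence {
    // Matches the word confidence inside a title attribute, e.g. "x_wconf 95".
    private static final Pattern X_WCONF = Pattern.compile("x_wconf\\s+(\\d+)");

    // Returns every x_wconf value found in the hOCR text, in document order.
    static List<Integer> wordConfidences(String hocr) {
        List<Integer> values = new ArrayList<>();
        Matcher m = X_WCONF.matcher(hocr);
        while (m.find()) {
            values.add(Integer.parseInt(m.group(1)));
        }
        return values;
    }

    // Mean confidence, or NaN when no words were found.
    static double mean(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }
}
```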

On Thursday, November 14, 2019 at 8:51:09 AM UTC-6, Mobeen Ali wrote:
>
> Hi everyone!
>
> I was wondering if there is a function or method to get character or word 
> error rate after applying tesseract-ocr on the image?
>
> What i mean is, for example,
>
> when i run the command
>
>- tesseract  test_image.tiff  output_text
>
> after writing the text in the output_text file, it could return me the 
> word error rate or character error rate of the text...
>



[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in Windows Server 2008

2019-10-17 Thread Quan Nguyen
Can you try with version 4.3.1 or the latest version 4.4.1?

On Tuesday, October 15, 2019 at 10:56:14 AM UTC-5, Nuno Feliciano wrote:
>
> Hi,
>
> I am getting an error with Tess4j when I run it in a Windows Server 2008 
> R2 64 bit (tess4j-4.3.0).
>
> Exception in thread "main" java.lang.UnsatisfiedLinkError: 
> at com.sun.jna.Native.open(Native Method)
> at com.sun.jna.Native.open(Native.java:1759)
> at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
> at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
> at com.sun.jna.Library$Handler.<init>(Library.java:147)
> at com.sun.jna.Native.loadLibrary(Native.java:412)
> at com.sun.jna.Native.loadLibrary(Native.java:391)
> at 
> net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
> at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)
> at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:426)
> at net.sourceforge.tess4j.Tesseract.getWords(Tesseract.java:693)
>
> I am using jdk1.8 64 bit
> I don't have the error in a Windows 8.1 Enterprise 64 bit
>
> I have tried using Dependency Walker to figure out which DLLs I was 
> missing. I tried adding a few (dcomp, vcruntime140, msvcp140 and a few more), 
> but no luck.
>
> Can anyone help?
>



[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2019-08-13 Thread Quan Nguyen
https://github.com/nguyenq/tess4j/issues/159

On Thursday, August 8, 2019 at 12:58:10 AM UTC-5, Srikanth wrote:
>
> Hi,
>
> I am getting an error on Windows Server 2016, 64-bit. I have the VC++ 2017 
> redistributable installed in my VM. My Tesseract code is developed in a 
> Spring Boot app, which works correctly on my PC. I copied the same project 
> and tried to run it on a Windows Server 2016 virtual machine. I am getting the 
> error below. Please help me.
>
> java.lang.UnsatisfiedLinkError: The specified module could not be found.
>
> at com.sun.jna.Native.open(Native Method) ~[jna-4.5.2.jar:4.5.2 (b0)]
> at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:288) 
> ~[jna-4.5.2.jar:4.5.2 (b0)]
> at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:427) 
> ~[jna-4.5.2.jar:4.5.2 (b0)]
> at com.sun.jna.Library$Handler.<init>(Library.java:179) 
> ~[jna-4.5.2.jar:4.5.2 (b0)]
> at com.sun.jna.Native.loadLibrary(Native.java:569) ~[jna-4.5.2.jar:4.5.2 
> (b0)]
> at com.sun.jna.Native.loadLibrary(Native.java:544) ~[jna-4.5.2.jar:4.5.2 
> (b0)]
> at 
> net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:81) 
> ~[tess4j-4.0.0.jar:4.0.0]
> at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42) 
> ~[tess4j-4.0.0.jar:4.0.0]
> at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:364) 
> ~[tess4j-4.0.0.jar:4.0.0]
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:277) 
> ~[tess4j-4.0.0.jar:4.0.0]
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:209) 
> ~[tess4j-4.0.0.jar:4.0.0]
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:193) 
> ~[tess4j-4.0.0.jar:4.0.0]
> at com.test.job.MyQuartzJob.execute(MyQuartzJob.java:79) ~[classes/:na]
> at org.quartz.core.JobRunShell.run(JobRunShell.java:202) 
> ~[quartz-2.2.1.jar:na]
> at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) 
> [quartz-2.2.1.jar:na]
>



[tesseract-ocr] Re: VietOCR 5.0 Java & .NET Releases

2019-07-18 Thread Quan Nguyen
VietOCR v5.5.0 & VietOCR.NET v5.5.0 Releases

A Java/.NET WPF GUI frontend for Tesseract OCR engine. The releases include 
the following improvements:

- Upgrade to Tesseract 4.1.0

http://vietocr.sf.net



[tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Quan Nguyen
https://github.com/tesseract-ocr/tesseract/wiki/APIExample
https://github.com/tesseract-ocr/tesseract/issues/960

api->SetVariable("user_words_suffix", "user-words");


On Wednesday, July 3, 2019 at 10:29:59 AM UTC-5, Jochen Naumann wrote:
>
> Hi, I can set the user-words file on the command line with the tesseract tool, 
> but how do I set this using the API? 
> I searched for it in the source code but could not find it; I would 
> appreciate any help.
>
>



[tesseract-ocr] Re: Percentage of accuracy

2019-06-29 Thread Quan Nguyen
Yes, its values range from 0 to 100.

On Saturday, June 29, 2019 at 12:00:45 PM UTC-5, Mox Betex wrote:
>
> I have found the w_conf attribute in the .hocr file.
> How should I interpret that value? Does a high w_conf value mean high 
> accuracy?
>



[tesseract-ocr] Re: Percentage of accuracy

2019-06-29 Thread Quan Nguyen
It's called the "confidence" value in Tesseract terminology. hOCR format output 
contains confidence values, at word level, I believe.

On Saturday, June 29, 2019 at 8:53:05 AM UTC-5, Mox Betex wrote:
>
> Is it possible to get the percentage of accuracy of the recognized text?
>
> I need to recognize multiple languages (2 languages), and Tesseract doesn't 
> know exactly which language it is when I pass the parameter -l lang1+lang2.
> What I want to do is to scan with each language separately, but I would 
> need some percentage of accuracy to determine the probability of each language.
>
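
A minimal sketch of how the per-language comparison could work in Java, assuming each language was run separately with hOCR output and that the word confidences appear as the x_wconf property inside each word's title attribute. The class name HocrConfidence and the sample markup are made up for illustration; nothing here is part of Tesseract or Tess4J.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HocrConfidence {
    // Matches the word-confidence property in an hOCR title attribute, e.g. "x_wconf 93".
    private static final Pattern WCONF = Pattern.compile("x_wconf\\s+(\\d+(?:\\.\\d+)?)");

    /** Returns the mean word confidence (0-100), or -1 if no confidences were found. */
    static double meanConfidence(String hocr) {
        Matcher m = WCONF.matcher(hocr);
        double sum = 0;
        int count = 0;
        while (m.find()) {
            sum += Double.parseDouble(m.group(1));
            count++;
        }
        return count == 0 ? -1 : sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical hOCR fragments from two separate runs of the same image.
        String run1 = "<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 91'>Hello</span>"
                + "<span class='ocrx_word' title='bbox 60 10 99 30; x_wconf 87'>world</span>";
        String run2 = "<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 42'>He11o</span>"
                + "<span class='ocrx_word' title='bbox 60 10 99 30; x_wconf 38'>wor1d</span>";
        double c1 = meanConfidence(run1);
        double c2 = meanConfidence(run2);
        System.out.println("run1=" + c1 + " run2=" + c2
                + " -> prefer " + (c1 >= c2 ? "run1" : "run2"));
    }
}
```

A plain mean treats short and long words alike; weighting by word length may be a better fit for some documents, which is a design choice left open here.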



[tesseract-ocr] Re: table ocr with tesseract(tess4j)

2019-06-20 Thread Quan Nguyen
Included with tess4j are some utility methods such as Remove Lines. You can 
see a demonstration of these functions in VietOCR, which uses the library.

https://sourceforge.net/projects/vietocr 


On Wednesday, June 19, 2019 at 7:40:36 AM UTC-5, Momene Vigal wrote:
>
> Hello, I'm a beginner with Tesseract, currently using it with Java.
> Can anyone please help me with how to OCR a table with 
> Tesseract 
> in Python or Java?
>
>



[tesseract-ocr] Re: multi page outputs

2019-04-25 Thread Quan Nguyen
Try csplit.

https://stackoverflow.com/questions/11313852/split-one-file-into-multiple-files-based-on-delimiter

On Wednesday, April 17, 2019 at 7:55:34 AM UTC-5, kms...@gmail.com wrote:
>
> Hello, does anyone know if it is possible to take a multi-page input (via 
> a tif file or a txt file with a list of images) and output to separate text files 
> rather than one text file with page separators?
>
> Thank you.
>
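
As an alternative to csplit, the same split can be sketched in Java. This assumes the combined text file separates pages with a form-feed character (which I believe is the default page separator in Tesseract 4 text output); the class name PageSplitter and the page-N.txt output names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class PageSplitter {
    /** Splits combined OCR text into pages on the form-feed character, skipping blank pages. */
    static List<String> splitPages(String text) {
        List<String> pages = new ArrayList<>();
        for (String page : text.split("\f")) {
            if (!page.trim().isEmpty()) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: PageSplitter <combined-output.txt>");
            return;
        }
        Path input = Paths.get(args[0]);
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        List<String> pages = splitPages(text);
        // Write each page to its own file: page-1.txt, page-2.txt, ...
        for (int i = 0; i < pages.size(); i++) {
            Path out = Paths.get("page-" + (i + 1) + ".txt");
            Files.write(out, pages.get(i).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Note that skipping blank pages shifts the numbering when a page produced no text; keep the empty entries instead if file numbers must match page numbers.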



Re: [tesseract-ocr] Improving accuracy on recognition Tesseract 4.3.1

2019-03-16 Thread Quan Nguyen
He was referring to the tess4j version.

On Friday, March 15, 2019 at 4:48:05 AM UTC-5, 易鑫 wrote:
>
> The latest Tesseract version is 4.0.0; how did you get version 4.3.1?
>
> On Monday, February 25, 2019 at 8:38 AM, Alberto Andreotti wrote:
>
>> Hello,
>>
>> You can try the OCR preprocessing  in spark NLP, if you are on Python or 
>> Scala.
>> Try to use the scaling option.
>>
>> Alberto.
>>
>> On Feb 24, 2019 2:21 PM, "'Nenad Kocev' via tesseract-ocr" <
>> tesser...@googlegroups.com > wrote:
>>
>>> Hello, I recently discovered Tesseract and I've been using it to extract 
>>> digits from images using the tess4j library. With the settings posted below I 
>>> get around 85% recognition accuracy.
>>> Is there a way to get 100% accuracy? I have an example image in the 
>>> attachments. Other images may differ only in the number of digits they have and 
>>> may also contain special characters like ",+-". Thanks for your help. 
>>>
>>> Settings:
>>>
>>> tesseract.setPageSegMode(7); // text is in single line
>>>
>>> tesseract.setTessVariable("tessedit_char_whitelist", ",+-0123456789");
>>> tesseract.setTessVariable("load_system_dawg ", "false");
>>> tesseract.setTessVariable("load_freq_dawg ", "false");
>>>
>>>
>>
>



[tesseract-ocr] Re: Improving accuracy on recognition Tesseract 4.3.1

2019-02-24 Thread Quan Nguyen
The whitelist feature is currently not working in Tesseract 4.0.0.

https://github.com/tesseract-ocr/tesseract/issues/751 

On Sunday, February 24, 2019 at 11:21:17 AM UTC-6, Nenad Kocev wrote:
>
> Hello, I recently discovered Tesseract and I've been using it to extract 
> digits from images using the tess4j library. With the settings posted below I 
> get around 85% recognition accuracy.
> Is there a way to get 100% accuracy? I have an example image in the 
> attachments. Other images may differ only in the number of digits they have and 
> may also contain special characters like ",+-". Thanks for your help. 
>
> Settings:
>
> tesseract.setPageSegMode(7); // text is in single line
>
> tesseract.setTessVariable("tessedit_char_whitelist", ",+-0123456789");
> tesseract.setTessVariable("load_system_dawg ", "false");
> tesseract.setTessVariable("load_freq_dawg ", "false");
>
>
>



[tesseract-ocr] Re: Coordinates of the Text on the Mobile screen.

2019-02-06 Thread Quan Nguyen
There are several examples of getting word coordinates in Tess4J's unit 
tests.

On Tuesday, February 5, 2019 at 12:21:38 AM UTC-6, Rakesh Kumar wrote:
>
> Hi,
>
> I recently succeeded in using Tesseract-ocr to convert a PNG file into 
> text.
>
> Scenario: I am taking a screenshot (PNG) of a mobile app and using 
> Tesseract to convert the PNG file into text. 
>
> Question: When I convert the PNG file into text, can I also get the 
> coordinates (X,Y) of a certain text element on the mobile screen?
>
> Example: Upon conversion of the PNG file into text, the text reads "Help 
> people interested in this repository understand your project by adding a 
> README."
>
> In the above example, can I get the coordinates (X,Y) of the text element 
> "*understand*"?
>
> *This is my project in git:*
>
> https://github.com/rkandanuru/Tess4J.git
>
> Regards,
>
> Rakesh 
>
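
Besides the Tess4J unit tests mentioned above, one way to see the idea without the library is to read the bounding boxes from Tesseract's hOCR output. The sketch below is a different technique from the Tess4J API: it assumes each word is an ocrx_word span whose title attribute starts with "bbox x0 y0 x1 y1" and whose content is plain text. The class name HocrWordBox and the sample markup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HocrWordBox {
    // Captures the four bbox numbers from the title and the text of an ocrx_word span.
    private static final Pattern WORD = Pattern.compile(
            "<span class=['\"]ocrx_word['\"][^>]*title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)"
            + "[^'\"]*['\"][^>]*>([^<]*)</span>");

    /** Returns {x0, y0, x1, y1} for the first occurrence of the word, or null if absent. */
    static int[] findWord(String hocr, String word) {
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            if (m.group(5).trim().equals(word)) {
                return new int[] {
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))
                };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical hOCR fragment; real output has many more spans and attributes.
        String hocr = "<span class='ocrx_word' id='word_1_8' title='bbox 300 118 400 146; x_wconf 95'>repository</span> "
                + "<span class='ocrx_word' id='word_1_9' title='bbox 412 118 540 146; x_wconf 96'>understand</span>";
        int[] box = findWord(hocr, "understand");
        if (box != null) {
            System.out.println("top-left=(" + box[0] + "," + box[1] + ") bottom-right=("
                    + box[2] + "," + box[3] + ")");
        }
    }
}
```

The coordinates are in pixels of the image that was OCR'd, so for a screenshot they map directly to screen positions only if the image was not scaled beforehand.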



[tesseract-ocr] Re: How to write .unicharambigs file?

2019-02-03 Thread Quan Nguyen
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05#the-unicharambigs-file

On Thursday, January 31, 2019 at 3:52:43 AM UTC-6, 易鑫 wrote:
>
> Hello, everyone:
>
> I have trained a new LSTM model in my project, but the result is 
> not as good as I expected. I notice that some characters are often mistaken in 
> my result.
> I learned that adding some rules to .unicharambigs can reduce the mistakes?
>
> I extracted eng.traineddata and got the eng.unicharambigs file as 
> follows, but I do not quite understand its meaning.
> Can anyone help me? Thanks in advance.
>
>
>
> v1
> 2 ' ' 1 " 1
> 2 ` ' 1 " 1
> 2 ' ` 1 " 1
> 2 ‘ ' 1 " 1
> 2 ' ‘ 1 " 1
> 2 ’ ' 1 " 1
> 2 ' ’ 1 " 1
> 2 ` ` 1 " 1
> 2 ` ‘ 1 " 1
> 2 ‘ ` 1 " 1
> 2 ` ’ 1 " 1
> 2 ’ ` 1 " 1
> 2 ‘ ‘ 1 “ 1
> 2 ‘ ’ 1 " 1
> 2 ’ ‘ 1 " 1
> 2 ’ ’ 1 ” 1
> 2 , , 1 „ 1
> 1 m 2 r n 0
> 2 r n 1 m 0
> 1 m 2 i n 0
> 2 i n 1 m 0
> 1 d 2 c l 0
> 2 c l 1 d 0
> 2 n n 2 r m 0
> 2 r m 2 n n 0
> 1 n 2 r i 0
> 2 r i 1 n 0
> 2 l i 1 h 0
> 2 l r 1 h 0
> 2 i i 1 u 0
> 2 i i 1 n 0
> 2 n i 1 m 0
>
>
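
To make the layout of the rules above concrete, here is a small Java sketch that parses one v1 line. As I read the linked documentation, each rule is space-separated: the number of source unichars, the source unichars, the number of target unichars, the target unichars, and a type flag (1 = always replace, 0 = optional hint). The class name AmbigRule is made up for illustration and is not part of Tesseract.

```java
import java.util.Arrays;
import java.util.List;

final class AmbigRule {
    final List<String> source;   // unichars that may be misrecognized
    final List<String> target;   // unichars they should be read as
    final boolean mandatory;     // true when the type flag is 1

    AmbigRule(List<String> source, List<String> target, boolean mandatory) {
        this.source = source;
        this.target = target;
        this.mandatory = mandatory;
    }

    /** Parses one rule line of the form: n s1..sn m t1..tm type */
    static AmbigRule parse(String line) {
        String[] f = line.trim().split(" ");
        int n = Integer.parseInt(f[0]);
        if (f.length < n + 2) {
            throw new IllegalArgumentException("malformed rule: " + line);
        }
        int m = Integer.parseInt(f[1 + n]);
        if (f.length != n + m + 3) {
            throw new IllegalArgumentException("malformed rule: " + line);
        }
        List<String> src = Arrays.asList(Arrays.copyOfRange(f, 1, 1 + n));
        List<String> dst = Arrays.asList(Arrays.copyOfRange(f, 2 + n, 2 + n + m));
        return new AmbigRule(src, dst, f[2 + n + m].equals("1"));
    }

    public static void main(String[] args) {
        // Two rules taken from the file quoted above.
        for (String line : new String[] {"2 ' ' 1 \" 1", "1 m 2 r n 0"}) {
            AmbigRule r = parse(line);
            System.out.println(r.source + " -> " + r.target
                    + (r.mandatory ? " (mandatory)" : " (optional)"));
        }
    }
}
```

Read this way, the first rule says two consecutive single quotes are always replaced by one double quote, while "m" versus "r n" is only a hint for the engine to try.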



[tesseract-ocr] Re: tesseract 4.0 recognize '5' as 'S'

2019-02-03 Thread Quan Nguyen
PSM 6 or 7 produces the correct result.

On Saturday, February 2, 2019 at 4:24:31 AM UTC-6, Fan Zhou wrote:
>
> I use Tesseract 4.0 to recognize a picture. The text in the image is 
> "MR5458", but it is recognized as "MRS458". [image: 1631056773203_og.png]
>
> Then I adjusted the image by moving the text a little lower, like this: 
> [image: 1631056773203.png]
>
> or by cropping the bottom of the image, like this: 
> [image: 1631056773203_7.png]
>
> After that, Tesseract gives the right text, "MR5458".
>
> Should I move the text to the center or bottom of the image, or are there 
> any other options?
>



[tesseract-ocr] Re: digits parameter ignored

2019-01-29 Thread Quan Nguyen
This is not supported in version 4.0.0.

https://github.com/tesseract-ocr/tesseract/issues/751

On Sunday, January 27, 2019 at 11:19:46 AM UTC-6, Sebastian Bürgel wrote:
>
> Hi, I'm new to Tesseract and, for starters, experimenting via the command line. 
> I applied manual thresholding in GIMP before feeding the image through Tesseract. The 
> attached image, showing a number (26.7) in a slightly fancy font (but IMO not too 
> crazy), is not recognized at all :-/
> After applying a Gaussian blur (img5.png) it worked somewhat. Next I set 
> the digits property, which seems to be ignored:
>
> $ tesseract img5.png stdout digits
> 26?
>
> Also the whitelist seems to be ignored:
>
> $ tesseract img5.png stdout -c tessedit_char_whitelist=0123456789
> 26?
>
> Any suggestions on how to force Tesseract to really give me only digits, 
> or other suggestions for my task?
>
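
While the whitelist is unavailable, one possible stopgap is to filter the OCR result after the fact. The Java sketch below is not Tesseract's whitelist and cannot recover characters the engine misread; it only drops anything outside an allowed set. The class name CharFilter and the sample strings are made up for illustration.

```java
final class CharFilter {
    /** Keeps only the characters of text that occur in allowed; line breaks are preserved. */
    static String keepAllowed(String text, String allowed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || allowed.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "26?" is the output reported above; the filter removes the stray "?".
        System.out.println(keepAllowed("26?", "0123456789."));
    }
}
```

Because the filter runs after recognition, a digit misread as a letter is simply lost, so preprocessing the image remains the more reliable route.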



[tesseract-ocr] Re: OCR: java.io.IOException: Cannot run program "cmd": error=2, No such file or directory

2018-09-21 Thread Quan Nguyen
cmd is Windows' command line interpreter. Does MacOS have that?

On Friday, September 21, 2018 at 1:10:15 PM UTC-5, Rakesh Kumar wrote:
>
> I have successfully installed Tesseract on my Mac.
> I am now trying to read and write image text using the code below and get an error. 
> I am missing something or doing something wrong. Can anyone help me with 
> the Java code to achieve this?
>
> Refer to the attachment, or see the project at this link: 
>
>
> https://drive.google.com/file/d/13TF0ZISgbZkdFqOgkt6xBNbWk07CnK79/view?usp=sharing
>
>
>
> Error:
> java.io.IOException: Cannot run program "cmd": error=2, No such file or
>   directory
>
>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>  at java.lang.Runtime.exec(Runtime.java:620)
>  at java.lang.Runtime.exec(Runtime.java:485)
>  at com.chillyfacts.com.my_main.main(my_main.java:13)
>  Caused by: java.io.IOException: error=2, No such file or directory
>  at java.lang.UNIXProcess.forkAndExec(Native Method)
>  at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
>  at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>  ... 3 more
>
>
> package com.chillyfacts.com;
>
> import java.io.PrintWriter;
>
> public class my_main {
>     public static void main(String[] args) {
>         String input_file = "/usr/local/Cellar/tesseract/apps.png";
>         String output_file = "/usr/local/Cellar/";
>         String tesseract_install_path = "/usr/local/Cellar/tesseract/";
>         String[] command = { "cmd" };
>         Process p;
>         try {
>             p = Runtime.getRuntime().exec(command);
>             new Thread(new SyncPipe(p.getErrorStream(), System.err)).start();
>             new Thread(new SyncPipe(p.getInputStream(), System.out)).start();
>             PrintWriter stdin = new PrintWriter(p.getOutputStream());
>             stdin.println("\"" + tesseract_install_path + "\" \"" + input_file
>                     + "\" \"" + output_file + "\" -l eng");
>             stdin.close();
>             p.waitFor();
>             System.out.println();
>             System.out.println();
>             System.out.println();
>             System.out.println();
>             System.out.println(Read_File.read_a_file(output_file + ".txt"));
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
>
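
Since "cmd" only exists on Windows, one portable option is to skip the shell and start the tesseract executable directly. The Java sketch below assumes the usual command-line form "tesseract <image> <outputbase> -l <lang>" and that the executable is on the PATH; the class name TesseractCommand and the file names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class TesseractCommand {
    /** Builds the argument list for: <exe> <image> <outputBase> -l <lang> */
    static List<String> build(String exe, String image, String outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Without arguments, only show what would be run (hypothetical file names).
            System.out.println(String.join(" ", build("tesseract", "apps.png", "apps", "eng")));
            return;
        }
        // args[0] = image path, args[1] = output base name (".txt" is appended by tesseract).
        List<String> cmd = build("tesseract", args[0], args[1], "eng");
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Passing the arguments as a list also avoids the manual quoting in the code quoted above, because each path reaches the process as a single argument even if it contains spaces.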



[tesseract-ocr] Re: Tess4J: Invalid memory access

2018-09-03 Thread Quan Nguyen
The issue has been fixed in the latest releases published today.

Thanks.

On Sunday, September 2, 2018 at 11:50:53 AM UTC-5, Subramaniyan Suresh 
wrote:
>
> I am using Tess4J in my project to extract text from an image (using the 
> Eclipse IDE). I am getting the following error when I try to run the OCR. Any 
> suggestions?  
>
> *Error: Exception in thread "main" java.lang.Error: Invalid memory access*
>
>
> *Note: I have attached the image file which I've used *
>
> *My Code*:
>
>
> package tesseractTraining;
>
> import java.io.File;
> import net.sourceforge.tess4j.*;
>
> public class TesseractMainRunner {
>     public static void main(String[] args) {
>         File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>         Tesseract instance = new Tesseract();
>         try {
>             instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
>             instance.setLanguage("eng");
>             String result = instance.doOCR(imageFile);
>             System.out.println(result);
>         } catch (TesseractException e) {
>             System.err.println(e.getMessage());
>         }
>         imageFile.exists();
>     }
> }
>
>



[tesseract-ocr] Re: Tess4J: Invalid memory access

2018-09-02 Thread Quan Nguyen
Subramaniyan,

If possible, please open a new issue 
at https://github.com/nguyenq/tess4j/issues for tracking purposes.

Thanks.

On Sunday, September 2, 2018 at 1:52:14 PM UTC-5, Quan Nguyen wrote:
>
> I tested your sample image and confirmed the error. It looks like a bug in 
> the routine that determines the image's bit depth. A new version will be 
> released once a fix is worked out and committed.
>
> Thank you for reporting.
>



[tesseract-ocr] Re: Tess4J: Invalid memory access

2018-09-02 Thread Quan Nguyen
I tested your sample image and confirmed the error. It looks like a bug in 
the routine that determines the image's bit depth. A new version will be 
released once a fix is worked out and committed.

Thank you for reporting.
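
For anyone who wants to check what bit depth Java reports for an image, here is a minimal sketch using only the standard library. It does not show Tess4J's internal routine; it only reads the pixel size from the image's color model. The class name BitDepth is made up for illustration, and the images are created in memory so the example runs without a file (ImageIO.read would be used for a real file).

```java
import java.awt.image.BufferedImage;

final class BitDepth {
    /** Returns the number of bits per pixel reported by the image's color model. */
    static int bitsPerPixel(BufferedImage image) {
        return image.getColorModel().getPixelSize();
    }

    public static void main(String[] args) {
        // In-memory images of a few common types, standing in for files read with ImageIO.
        int[] types = {
            BufferedImage.TYPE_BYTE_BINARY, BufferedImage.TYPE_BYTE_GRAY,
            BufferedImage.TYPE_INT_RGB, BufferedImage.TYPE_INT_ARGB
        };
        for (int type : types) {
            BufferedImage img = new BufferedImage(4, 4, type);
            System.out.println("type " + type + ": " + bitsPerPixel(img) + " bpp");
        }
    }
}
```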




[tesseract-ocr] Re: Tesseract API to output PDF with txt layer

2018-07-27 Thread Quan Nguyen
Yes, the PDF functionality is exposed in the C-API interface, which Tess4J 
fully supports.

On Friday, July 27, 2018 at 4:46:15 AM UTC-5, PSK wrote:
>
> I know that Tesseract v4 CLI is able to produce the output as PDF with txt 
> layer. The question is whether this functionality is also available via its 
> API? 
> If so, the other question is whether Tess4J will expose that API to Java, 
> too (I know that this is a separate product, but maybe someone is familiar 
> with both products, otherwise I will go to Tess4J form to ask if such API 
> is planned to be exposed).
>
>
>



[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2018-07-18 Thread Quan Nguyen
I've found the unit tests completed successfully with JDK 10 as well. So 
make sure you install the appropriate VC++ runtime.

On Wednesday, July 18, 2018 at 6:48:35 AM UTC-5, Quan Nguyen wrote:
>
> Those were old posts for older versions. You need VC++ 2015 at this time.
>
> Btw, we haven't tested it with Java 10 yet. Will do that soon.
>
> Thanks.
>
> On Tuesday, July 17, 2018 at 12:50:20 PM UTC-5, Dattatraya Tembare wrote:
>>
>> Forgot to mention, I have installed Visual C++ Redistributable for VS2013 
>> <http://www.microsoft.com/en-au/download/details.aspx?id=40784>
>> Still, how can I check whether I have installed the correct version?
>>
>>
>> On Tuesday, July 17, 2018 at 1:49:09 PM UTC-4, Dattatraya Tembare wrote:
>>>
>>> I'm facing the same problem on Windows 10 with JDK 10 (same code is 
>>> working in Windows 7 with JDK 8)
>>> Error Logs: 
>>>
>>> java.lang.UnsatisfiedLinkError: The specified module could not be found.
>>>
>>>
>>> at com.sun.jna.Native.open(Native Method) ~[jna-4.5.2.jar!/:4.5.2 
>>> (b0)]
>>> at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:288) 
>>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>>> at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:427) 
>>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>>> at com.sun.jna.Library$Handler.<init>(Library.java:179) 
>>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>>> at com.sun.jna.Native.loadLibrary(Native.java:569) 
>>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>>> at com.sun.jna.Native.loadLibrary(Native.java:544) 
>>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>>> at 
>>> net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:81) 
>>> ~[tess4j-4.0.1.jar!/:4.0.1]
>>> at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42) 
>>> ~[tess4j-4.0.1.jar!/:4.0.1]
>>> at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:364) 
>>> ~[tess4j-4.0.1.jar!/:4.0.1]
>>> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:277) 
>>> ~[tess4j-4.0.1.jar!/:4.0.1]
>>> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:260) 
>>> ~[tess4j-4.0.1.jar!/:4.0.1]
>>> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:241) 
>>> ~[tess4j-4.0.1.jar!/:4.0.1]
>>> at 
>>> com.ea.ocr.tesseract.ReadImageText.ocrText(ReadImageText.java:143) 
>>> ~[classes!/:18.07.02]
>>> at 
>>> com.ea.ocr.data.GenerateData.getOcrText(GenerateData.java:663) 
>>> ~[classes!/:18.07.02]
>>> at 
>>> com.ea.ocr.data.GenerateData.voterCounts(GenerateData.java:299) 
>>> ~[classes!/:18.07.02]
>>> at 
>>> com.ea.ocr.data.GenerateData.generateJsonFile(GenerateData.java:123) 
>>> ~[classes!/:18.07.02]
>>> at 
>>> com.ea.ocr.controller.EaOcrController.generateJson(EaOcrController.java:35) 
>>> ~[classes!/:18.07.02]
>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.
>>> invoke0(Native Method) ~[na:na]
>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.
>>> invoke(Unknown Source) ~[na:na]
>>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.
>>> invoke(Unknown Source) ~[na:na]
>>> at java.base/java.lang.reflect.Method.invoke(Unknown Source) ~[
>>> na:na]
>>>
>>>
>>>
>>> On Monday, April 27, 2015 at 7:46:40 PM UTC-4, Quan Nguyen wrote:
>>>>
>>>> Did you install Visual C++ Redistributable for VS2013 
>>>> <http://www.microsoft.com/en-au/download/details.aspx?id=40784>? This 
>>>> seems to be the most common problem. Check the forum 
>>>> <http://sourceforge.net/p/tess4j/discussion/> for answers.
>>>>
>>>> On Monday, April 27, 2015 at 9:29:28 AM UTC-5, Ankita Verma wrote:
>>>>>
>>>>> Hi, 
>>>>> I am getting an error
>>>>>
>>>>> Exception in thread "main" java.lang.UnsatisfiedLinkError: The 
>>>>> specified module could not be found.
>>>>>
>>>>> at com.sun.jna.Native.open(Native Method)
>>>>> at com.sun.jna.Native.open(Native.java:1759)
>>>>> at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
>>>>> at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
>>>>> at com.sun.jna.Library$Handler.<init>(Library.java:147)
>>

[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2018-07-18 Thread Quan Nguyen
Those were old posts for older versions. You need VC++ 2015 at this time.

Btw, we haven't tested it with Java 10 yet. Will do that soon.

Thanks.

On Tuesday, July 17, 2018 at 12:50:20 PM UTC-5, Dattatraya Tembare wrote:
>
> Forgot to mention, I have installed Visual C++ Redistributable for VS2013 
> <http://www.microsoft.com/en-au/download/details.aspx?id=40784>
> Still, how can I check whether I have installed the correct version?
>
>
> On Tuesday, July 17, 2018 at 1:49:09 PM UTC-4, Dattatraya Tembare wrote:
>>
>> I'm facing the same problem on Windows 10 with JDK 10 (same code is 
>> working in Windows 7 with JDK 8)
>> Error Logs: 
>>
>> java.lang.UnsatisfiedLinkError: The specified module could not be found.
>>
>>
>> at com.sun.jna.Native.open(Native Method) ~[jna-4.5.2.jar!/:4.5.2 
>> (b0)]
>> at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:288) 
>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>> at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:427) 
>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>> at com.sun.jna.Library$Handler.<init>(Library.java:179) 
>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>> at com.sun.jna.Native.loadLibrary(Native.java:569) 
>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>> at com.sun.jna.Native.loadLibrary(Native.java:544) 
>> ~[jna-4.5.2.jar!/:4.5.2 (b0)]
>> at 
>> net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:81) 
>> ~[tess4j-4.0.1.jar!/:4.0.1]
>> at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42) 
>> ~[tess4j-4.0.1.jar!/:4.0.1]
>> at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:364) 
>> ~[tess4j-4.0.1.jar!/:4.0.1]
>> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:277) 
>> ~[tess4j-4.0.1.jar!/:4.0.1]
>> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:260) 
>> ~[tess4j-4.0.1.jar!/:4.0.1]
>> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:241) 
>> ~[tess4j-4.0.1.jar!/:4.0.1]
>> at 
>> com.ea.ocr.tesseract.ReadImageText.ocrText(ReadImageText.java:143) 
>> ~[classes!/:18.07.02]
>> at com.ea.ocr.data.GenerateData.getOcrText(GenerateData.java:663) 
>> ~[classes!/:18.07.02]
>> at 
>> com.ea.ocr.data.GenerateData.voterCounts(GenerateData.java:299) 
>> ~[classes!/:18.07.02]
>> at 
>> com.ea.ocr.data.GenerateData.generateJsonFile(GenerateData.java:123) 
>> ~[classes!/:18.07.02]
>> at 
>> com.ea.ocr.controller.EaOcrController.generateJson(EaOcrController.java:35) 
>> ~[classes!/:18.07.02]
>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.
>> invoke0(Native Method) ~[na:na]
>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke
>> (Unknown Source) ~[na:na]
>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.
>> invoke(Unknown Source) ~[na:na]
>> at java.base/java.lang.reflect.Method.invoke(Unknown Source) ~[na
>> :na]
>>
>>
>>
>> On Monday, April 27, 2015 at 7:46:40 PM UTC-4, Quan Nguyen wrote:
>>>
>>> Did you install Visual C++ Redistributable for VS2013 
>>> <http://www.microsoft.com/en-au/download/details.aspx?id=40784>? This 
>>> seems to be the most common problem. Check the forum 
>>> <http://sourceforge.net/p/tess4j/discussion/> for answers.
>>>
>>> On Monday, April 27, 2015 at 9:29:28 AM UTC-5, Ankita Verma wrote:
>>>>
>>>> Hi, 
>>>> I am getting an error
>>>>
>>>> Exception in thread "main" java.lang.UnsatisfiedLinkError: The 
>>>> specified module could not be found.
>>>>
>>>> at com.sun.jna.Native.open(Native Method)
>>>> at com.sun.jna.Native.open(Native.java:1759)
>>>> at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
>>>> at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
>>>> at com.sun.jna.Library$Handler.<init>(Library.java:147)
>>>> at com.sun.jna.Native.loadLibrary(Native.java:412)
>>>> at com.sun.jna.Native.loadLibrary(Native.java:391)
>>>> at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(Unknown 
>>>> Source)
>>>> at net.sourceforge.tess4j.TessAPI.<clinit>(Unknown Source)
>>>> at net.sourceforge.tess4j.Tesseract.init(Unknown Source)
>>>> at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
>>>> at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
>>>> at net.sourceforge.tess4j.Tesseract.

[tesseract-ocr] Re: Tesseract v3.05.02 Training Error During Processing

2018-07-02 Thread Quan Nguyen
Wrong filename format. The box file should be named `eng.dmd.exp0.box` to match the image name `eng.dmd.exp0.tif`.
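
As a rough illustration of the naming rule, here is a small sketch. It assumes the box file simply shares the image's base name with a .box extension; `BoxNames` and `boxNameFor` are made-up names for this example, not part of Tesseract.

```java
class BoxNames {
    // Returns the box filename expected next to a training image:
    // same base name, with ".box" in place of the image extension.
    // (Hypothetical helper, only for illustration.)
    static String boxNameFor(String imageName) {
        int dot = imageName.lastIndexOf('.');
        String base = (dot > 0) ? imageName.substring(0, dot) : imageName;
        return base + ".box";
    }

    public static void main(String[] args) {
        // Image name taken from the command in the original post.
        System.out.println(boxNameFor("eng.dmd.exp0.tif"));
    }
}
```

If the name printed here differs from the box file you actually have on disk, renaming the box file is probably the first thing to try.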

On Monday, July 2, 2018 at 7:40:26 AM UTC-5, James Lipham wrote:
>
> I have also updated the image to have everything as the same 
> font/size/etc, but still, tesseract just says "Error during processing." 
> with seemingly zero information as to why.
>
> Has anyone ever experienced this? If I can't find anything else out, I 
> guess I'll just have to step through the page processing code and add in a 
> bunch of printf statements just to see where tesseract is blowing up, which 
> seems a bit overkill.
>
> -- James
>
> On Sunday, July 1, 2018 at 3:13:27 PM UTC-5, James Lipham wrote:
>>
>> Good afternoon all!
>>
>> I'm running Tesseract v3.05.02 on OSX Sierra (installed via Homebrew), 
>> and I'm trying to train a custom dataset with some fairly small images that 
>> are programmatically generated from a dot matrix display.
>>
>> When running 
>> tesseract eng.dmd.exp0.tif eng.dmd.box nobatch box.train
>>
>> I get the following information:
>>
>> Tesseract Open Source OCR Engine v3.05.02 with Leptonica
>> Page 1
>> Detected 27 diacritics
>> Error during processing.
>>
>> There is no additional information output to the console, so I really 
>> don't know what my error could be. I've looked and verified that the tif 
>> image doesn't have an alpha channel, and the box file appears to be in the 
>> appropriate format.
>>
>> Has anyone run into this before? I'm thinking it's something absurdly 
>> simple. I've attached both the TIF and box files I'm using.
>>
>> Thank you very very much!
>>
>> -- James
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/496444a6-fc35-41b3-8ae6-cd17672573e0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: German "Straße" is often "StraBe" (tesseract 4.0)

2018-05-27 Thread Quan Nguyen
Latin.traineddata can be found under the script folder.

https://github.com/tesseract-ocr/tessdata_fast

On Friday, May 25, 2018 at 5:02:29 AM UTC-5, Thomas Güttler wrote:
>
> Hi Shree,
>
> what do you mean by "script/Latin traineddata"? I am new to tesseract 
> and use version 4.0 via docker.
> Most internet pages are about tesseract 3.0.x. 
>
> I am unsure where to start.
>
> Maybe it is better to use 3.0.x?
>
> Regards,
>   Thomas
>
> Am Donnerstag, 24. Mai 2018 13:41:30 UTC+2 schrieb shree:
>>
>> Please try with script/Latin traineddata to see if you get better results.
>>
>> I have added your comment to issue at 
>> https://github.com/tesseract-ocr/langdata/pull/54
>>
>>
>>
>> On Thursday, May 24, 2018 at 5:05:55 PM UTC+5:30, Thomas Güttler wrote:
>>>
>>> I use tesseract 4.0 via docker (tesseractshadow/tesseract4re)
>>>
>>> Very often tesseract detects "StraBe" instead of "Straße".
>>>
>>> Yes, I use -l=deu
>>>
>>> The word "Straße" is very common in german. It means "street".
>>>
>>> Since "StraBe" makes no sense I would like to improve this.
>>>
>>> What do you suggest?
>>>
>>>
>>>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/fa7f1004-df61-4bc8-b039-3ef39f64b909%40googlegroups.com.


[tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-05-27 Thread Quan Nguyen
You need a much larger sample, in the range of hundreds or at least several 
dozen, so that even if some symbols fail with "Couldn't find a matching 
blob" errors, other samples of them still get picked up.

On Saturday, May 26, 2018 at 1:52:39 AM UTC-5, Paul Kitchen wrote:
>
> I am creating training data for GD symbols using Tesseract 3.05.01. One 
> of my TIFF files I use for training is in the attached 
> gdt.symbols.exp10.tif. When I attempt to use this TIFF with the 
> corresponding gdt.symbols.exp10.box, I get this output:
>
> Tesseract Open Source OCR Engine v3.05.01 with Leptonica
> Page 1
> FAIL!
> APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't 
> find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 10/Ⓜ ((1993,69),(2268,346)): FAILURE! Couldn't 
> find a matching blob
> APPLY_BOXES:
>    Boxes read from boxfile:  10
>    Boxes failed resegmentation:   2
>    Found 8 good blobs.
> Generated training data for 5 words
>
>
> Basically, both circled M symbols are failing.
>
> I've attached ImagesWithBoxes.PNG which is a screen capture from 
> jTessBoxEditor showing the TIFF image with boxes. As you can see, the boxes 
> appear to be correct.
>
> Why isn't tesseract able to use the circle M symbols for training? Can I 
> change the image of the symbols some how to help tesseract... maybe connect 
> the circle and M parts with a line?
>
> Thanks in advance.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/75aa477d-ec94-4c08-bb0e-10d6765a2798%40googlegroups.com.


[tesseract-ocr] Re: Error in executing new .traineddata file

2018-05-18 Thread Quan Nguyen
The error message indicates that Tesseract was looking for the 
osa.traineddata file under the C:\Program Files (x86)\Tesseract-OCR folder. 
You need to specify the path to the tessdata folder correctly. Your oem 
value seems to be incorrect too.

For full instructions, run at the command prompt:

tesseract.exe --help-extra
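
For what it's worth, here is a minimal sketch of assembling such a command from Java. It assumes the `--tessdata-dir`, `-l` and `--oem` options that `--help-extra` lists in 4.0, and that valid oem values are 0 to 3. The class name, image name and paths are only placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class TesseractCommand {
    // Builds the argument list for a tesseract run with an explicit
    // tessdata folder, language and OCR engine mode.
    // Assumption: oem accepts 0..3, as shown by "tesseract --help-extra" in 4.0.
    static List<String> build(String image, String outputBase,
                              String tessdataDir, String lang, int oem) {
        if (oem < 0 || oem > 3) {
            throw new IllegalArgumentException("oem must be 0..3, got " + oem);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("--tessdata-dir");
        cmd.add(tessdataDir);
        cmd.add("-l");
        cmd.add(lang);
        cmd.add("--oem");
        cmd.add(String.valueOf(oem));
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder values; oem 0 (legacy engine) is an assumption for
        // data trained with 3.05, which would not contain an LSTM model.
        List<String> cmd = build("input.png", "out",
                "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata", "osa", 0);
        System.out.println(String.join(" ", cmd));
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Passing the list to `ProcessBuilder` avoids quoting problems with the spaces in "Program Files".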

On Friday, May 18, 2018 at 4:52:57 AM UTC-5, Eman Sawalha wrote:
>
>
> <https://lh3.googleusercontent.com/-y4vXGW7lJQY/Wv6g7UuFyqI/Br0/OF3DzLn1jcsjRREY22PPnIiG55ZwpMfxACEwYBhgL/s1600/Capture.PNG>
>
>
> Thank you for your response, Quan Nguyen. I downloaded Tesseract 4.00 beta 
> and did the same: copied the .traineddata into tessdata, then added the 
> Tesseract path to the system environment variables. I got this new error :(. 
>
>
>
>
> On Wednesday, May 16, 2018 at 11:49:03 PM UTC+3, Quan Nguyen wrote:
>>
>> Sounds like you've trained using Tesseract 3.05, so it could run with 
>> Tesseract of that version or newer and is not backward compatible with 
>> older version 3.02.
>>
>>>
>>>
>>>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e64ab9d5-b07a-49a3-b95e-b06515fafc72%40googlegroups.com.


Re: [tesseract-ocr] Re: How can JTessBoxEditor generate lstm files ?

2018-05-17 Thread Quan Nguyen
Those .sh shell scripts will not run natively on Windows. You may need 
Cygwin or the Windows Subsystem for Linux. Hopefully others who have 
experience with this will chime in.

On Thursday, May 17, 2018 at 2:35:50 AM UTC-5, Fadi Fawzi wrote:
>
> Thanks, Quan.
> But is there a simple way to do the training process on Windows, or must I 
> stick to Linux (Ubuntu)?
>
> On Tue, May 15, 2018 at 5:02 AM, Quan Nguyen <nguy...@gmail.com> wrote:
>
>> As of today, it supports only legacy training (i.e., 3.0x version).
>>
>> Training for 4.0x is described in the Training Wiki 
>> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.
>>
>>
>> On Saturday, May 12, 2018 at 6:40:27 AM UTC-5, fadif...@gmail.com wrote:
>>>
>>> I am trying to add a few new characters to the Arabic character set and 
>>> train for them by fine-tuning using jTessBoxEditor v2 beta.
>>>
>>> The box/tiff pairs are generated successfully, but when I run the 
>>> trainer executable, a .tr file and ara.traineddata are generated instead of 
>>> an .lstm file. According to the docs, an .lstm file should be generated in 
>>> order to start lstmtraining. Please tell me where I am going wrong.
>>>
>>
>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ae68213d-e22f-4f3f-a527-d30a2d792468%40googlegroups.com.


[tesseract-ocr] Re: Can Tesseract OCR shows all the possible answers (similarities) for an image?

2018-05-16 Thread Quan Nguyen
Using TessChoiceIterator API, maybe?

On Thursday, May 3, 2018 at 5:41:08 AM UTC-5, Eman Sawalha wrote:
>
> Can Tesseract OCR show all the possible answers (similarities) for an 
> image? I'm working on my own Chrome extension and I hope to be able to 
> show all the possible answers to the users of the extension.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/791dda25-3faf-48f1-8474-4b8836042157%40googlegroups.com.


[tesseract-ocr] Re: Similar pictures, different results

2018-05-16 Thread Quan Nguyen
The margins around the text could make a difference in recognition results.

On Tuesday, May 15, 2018 at 11:26:19 PM UTC-5, yang3...@gmail.com wrote:
>
> There are two similar pictures, the difference between them is the white 
> edge size.
>
> One result is right(3.png) but the other is wrong(4.png).
>
> I don't know why, can you help me.
>
> I use the jTessBoxEditor to see the box. It shows that Tesseract has boxed 
> out the right part.
>
>
> 
>
>
> 
> Thank you!
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b2d1ebfd-5d8a-400e-8ab7-a36a9eaadce7%40googlegroups.com.


[tesseract-ocr] Re: Error in executing new .traineddata file

2018-05-16 Thread Quan Nguyen
It sounds like you trained with Tesseract 3.05, so the resulting 
traineddata runs with that version or newer; it is not backward compatible 
with the older version 3.02.

On Tuesday, May 15, 2018 at 6:51:17 PM UTC-5, Eman Sawalha wrote:
>
> Hello 
>
> Recently, I worked on training Tesseract to detect Old South Arabian 
> script, and I produced the .traineddata file. To test it, I copied the 
> file into the tessdata folder inside the Tesseract installation. My problem 
> is that whenever I try to run it from cmd.exe it gives me this error, but 
> when I test it using VietOCR.NET it works perfectly. Does anyone have an 
> idea why this is happening? I need it to work from the command line so I 
> can use it in a Google Chrome extension that I designed.
> I used Tesseract v3.02.
> Thanks in advance.
>
>
>
> 
>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/23ac3474-75ad-4226-abc5-a1a6fce0192b%40googlegroups.com.


[tesseract-ocr] Re: How can JTessBoxEditor generate lstm files ?

2018-05-14 Thread Quan Nguyen
As of today, it supports only legacy training (i.e., 3.0x version).

Training for 4.0x is described in the Training Wiki 
<https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.


On Saturday, May 12, 2018 at 6:40:27 AM UTC-5, fadif...@gmail.com wrote:
>
> I am trying to add a few new characters to the Arabic character set and 
> train for them by fine-tuning using jTessBoxEditor v2 beta.
>
> The box/tiff pairs are generated successfully, but when I run the 
> trainer executable, a .tr file and ara.traineddata are generated instead of 
> an .lstm file. According to the docs, an .lstm file should be generated in 
> order to start lstmtraining. Please tell me where I am going wrong.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/00cd6b54-3ed2-45e4-afbf-aa3c3f166e74%40googlegroups.com.


[tesseract-ocr] VietOCR 5.0 Java & .NET Releases

2018-05-04 Thread Quan Nguyen
VietOCR 5.0, Java & .NET GUI frontends for Tesseract 4.0.0-beta.1, is 
available for download. Any feedback is welcome. Thanks.

https://sourceforge.net/projects/vietocr/files/

Alternate sites:

https://github.com/nguyenq/VietOCR3/releases
https://github.com/nguyenq/VietOCRwpf/releases

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/040d178f-d60e-48db-afc2-dacbdc469a96%40googlegroups.com.


[tesseract-ocr] Re: Tesseract couldn't load any languages!

2018-05-04 Thread Quan Nguyen
You'll need to call setDatapath with your tessdata directory so Tesseract 
can find the *.traineddata files.
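
Before calling setDatapath, it may help to confirm that the folder really contains the language file. Below is a minimal sketch using only the JDK; `TessdataCheck` and `hasLanguage` are made-up names, and the default folder "tessdata" is just a placeholder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TessdataCheck {
    // True if <tessdataDir>/<lang>.traineddata exists as a regular file.
    // (Hypothetical helper, not part of Tess4J.)
    static boolean hasLanguage(Path tessdataDir, String lang) {
        return Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass your own folder and language as arguments.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        String lang = args.length > 1 ? args[1] : "eng";
        if (hasLanguage(dir, lang)) {
            System.out.println("Found " + lang + ".traineddata in " + dir.toAbsolutePath());
        } else {
            System.out.println("Missing " + dir.resolve(lang + ".traineddata").toAbsolutePath());
        }
    }
}
```

Once the check passes, the same folder is what you would hand to setDatapath on your Tesseract instance.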

On Friday, May 4, 2018 at 1:38:16 PM UTC-5, Dattatraya Tembare wrote:
>
> Exception in thread "main" java.lang.Error: Invalid memory access
> at com.sun.jna.Native.invokePointer(Native Method)
> at com.sun.jna.Function.invokePointer(Function.java:490)
> at com.sun.jna.Function.invoke(Function.java:434)
> at com.sun.jna.Function.invoke(Function.java:354)
> at com.sun.jna.Library$Handler.invoke(Library.java:244)
> at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
> at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:433)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:288)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:209)
> at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:193)
> at com.ea.ocr.tesseract.ReadImageText.readText(ReadImageText.java:59)
> at com.ea.ocr.tesseract.ReadImageText.main(ReadImageText.java:32)
> Error opening data file ./eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your 
> "tessdata" directory.
> Failed loading language 'eng'
> Tesseract couldn't load any languages!
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/2b8a1b17-2904-4def-97ae-625f3b3b88f5%40googlegroups.com.


Re: [tesseract-ocr] VietOCR 5.0 alpha availability

2018-03-03 Thread Quan Nguyen
Alternate sites:

https://github.com/nguyenq/VietOCR3/releases
https://github.com/nguyenq/VietOCRwpf/releases
https://github.com/nguyenq/VietOCR3.NET/releases

On Wednesday, January 10, 2018 at 9:02:15 AM UTC-6, Quan Nguyen wrote:
>
> Just updated again to use Tesseract 4.00 fast data.
>
> On Monday, January 8, 2018 at 5:16:50 PM UTC-6, Quan Nguyen wrote:
>>
>> Just updated the alpha versions with latest Tesseract 4.00alpha 
>> executables.
>>
>> https://sourceforge.net/projects/vietocr/files/
>>
>> On Monday, April 3, 2017 at 6:26:37 AM UTC-5, shree wrote:
>>>
>>> You need to get vietocr 5.0 alpha for tesseract 4.0 alpha
>>>
>>> https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>>>
>>> https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Mon, Apr 3, 2017 at 2:52 PM, El Fakir Zakaria <elfakir@gmail.com> 
>>> wrote:
>>>
>>>> this is using Tesseract 3.04 not 4.00alpha ?
>>>>
>>>> 2017-03-31 18:13 GMT+01:00 Quan Nguyen <nguy...@gmail.com>:
>>>>
>>>>> VietOCR 5.0 alpha, Java & .NET GUI frontend for Tesseract 4.00alpha, 
>>>>> is available for download. Any feedback is welcome. Thanks.
>>>>>
>>>>> https://sourceforge.net/projects/vietocr/files/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/83feed0a-fc0a-443e-aaad-8d5b8f46185f%40googlegroups.com.


[tesseract-ocr] Re: Hindi language version not working. VietOCR.NET-4.5_64

2018-03-03 Thread Quan Nguyen
The SourceForge site has been down for the past few days. You can download 
the programs from the GitHub site:

https://github.com/nguyenq/VietOCR3.NET/releases
https://github.com/nguyenq/VietOCRwpf/releases

On Thursday, March 1, 2018 at 5:31:28 AM UTC-6, shree wrote:
>
> Use latest version vietocr
>
> https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
>
> or
>
> https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>
>
> Use vietocr to download the traineddata from 
> https://github.com/tesseract-ocr/tessdata_fast
>
> On Thursday, March 1, 2018 at 3:23:48 PM UTC+5:30, Sohan Shekhawat wrote:
>>
>> Hello,
>>
>> As per the document about "How to OCR Hindi text using VietOCR", I 
>> followed all the steps:
>>
>> 1. Added the *hin.traineddata* file to the C:\Program 
>> Files\VietOCR.NET\tessdata folder.
>> 2. Added the *hi_IN.dic* file to the C:\Program Files\VietOCR.NET\dict folder.
>>
>> After that, I see the Hindi language option in the language drop-down, but 
>> when I try to convert a document in Hindi, it throws this 
>> exception:
>>
>> {"Attempted to read or write protected memory. This is often an 
>> indication that other memory is corrupt."}
>>
>> An unhandled exception of type 'System.AccessViolationException' occurred 
>> in InteropRuntimeImplementer.TessApiSignaturesInstance
>>
>> Additional information: Attempted to read or write protected memory. This 
>> is often an indication that other memory is corrupt.
>>
>> Please help and suggest!
>>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e039407d-db91-4bca-ad47-2776791044f7%40googlegroups.com.


Re: [tesseract-ocr] VietOCR 5.0 alpha availability

2018-01-10 Thread Quan Nguyen
Just updated again to use Tesseract 4.00 fast data.

On Monday, January 8, 2018 at 5:16:50 PM UTC-6, Quan Nguyen wrote:
>
> Just updated the alpha versions with latest Tesseract 4.00alpha 
> executables.
>
> https://sourceforge.net/projects/vietocr/files/
>
> On Monday, April 3, 2017 at 6:26:37 AM UTC-5, shree wrote:
>>
>> You need to get vietocr 5.0 alpha for tesseract 4.0 alpha
>>
>> https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>>
>> https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Mon, Apr 3, 2017 at 2:52 PM, El Fakir Zakaria <elfakir@gmail.com> 
>> wrote:
>>
>>> this is using Tesseract 3.04 not 4.00alpha ?
>>>
>>> 2017-03-31 18:13 GMT+01:00 Quan Nguyen <nguy...@gmail.com>:
>>>
>>>> VietOCR 5.0 alpha, Java & .NET GUI frontend for Tesseract 4.00alpha, is 
>>>> available for download. Any feedback is welcome. Thanks.
>>>>
>>>> https://sourceforge.net/projects/vietocr/files/
>>>>
>>>>
>>>>
>>>
>>>
>>
>>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6eb3a5e7-7b3c-4392-ba3f-820e878ce27b%40googlegroups.com.


Re: [tesseract-ocr] VietOCR 5.0 alpha availability

2018-01-08 Thread Quan Nguyen
Just updated the alpha versions with latest Tesseract 4.00alpha executables.

https://sourceforge.net/projects/vietocr/files/

On Monday, April 3, 2017 at 6:26:37 AM UTC-5, shree wrote:
>
> You need to get vietocr 5.0 alpha for tesseract 4.0 alpha
>
> https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>
> https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Apr 3, 2017 at 2:52 PM, El Fakir Zakaria <elfakir@gmail.com> wrote:
>
>> this is using Tesseract 3.04 not 4.00alpha ?
>>
>> 2017-03-31 18:13 GMT+01:00 Quan Nguyen <nguy...@gmail.com>:
>>
>>> VietOCR 5.0 alpha, Java & .NET GUI frontend for Tesseract 4.00alpha, is 
>>> available for download. Any feedback is welcome. Thanks.
>>>
>>> https://sourceforge.net/projects/vietocr/files/
>>>
>>>
>>>
>>
>>
>
>



[tesseract-ocr] Re: how to use PDF as Input

2018-01-04 Thread Quan Nguyen
You can specify a .uzn file defining the zones.

https://groups.google.com/forum/#!topic/tesseract-ocr/M0o5az7Zoo8
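
For reference, a .uzn file is plain text with one zone per line: left, top, width and height in pixels, followed by a label. It sits next to the image and shares its base name, and as far as I know Tesseract only picks it up in a page segmentation mode that does no column finding (mode 4, for example). Below is a minimal Python sketch that writes such a file. The helper name, the field labels and the coordinates are made up for illustration.

```python
def format_uzn(zones):
    """Render zones as .uzn text: 'left top width height label', one per line."""
    lines = []
    for label, (left, top, width, height) in zones.items():
        # Labels are assumed to contain no whitespace.
        lines.append(f"{left} {top} {width} {height} {label}")
    return "\n".join(lines) + "\n"


# Hypothetical form layout; measure the real coordinates on your own scan.
FORM_ZONES = {
    "Name": (120, 200, 900, 60),
    "Address": (120, 300, 900, 140),
}

if __name__ == "__main__":
    # Save this text as form.uzn beside form.png before running Tesseract.
    print(format_uzn(FORM_ZONES), end="")
```

Tesseract then returns the text of the zones in file order, so the output can be mapped back to the labels and stored wherever you need it.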

On Thursday, January 4, 2018 at 7:37:48 AM UTC-6, Subhanshu Gupta wrote:
>
> Thanks Quan. One more thing, how can I use Tesseract to read a form having 
> different data fields like Name, Address, etc. and save the corresponding 
> data to somewhere else?
>
>
> On Thursday, January 4, 2018 at 6:51:48 AM UTC+5:30, Quan Nguyen wrote:
>>
>> Tesseract engine cannot read PDF. You'll have to convert them to suitable 
>> images (TIFF or PNG) first. There are many tools for that: ImageMagick, 
>> GhostScript, PDFBox, etc.
>>
>> On Wednesday, January 3, 2018 at 12:05:12 PM UTC-6, Subhanshu Gupta wrote:
>>>
>>> Dear All,
>>>
>>> I am new to Tesseract OCR and need to implement it to Read PDF Forms but 
>>> I am not able to find any good documentation for which method to use to 
>>> read PDF as well as for Character Segmentation.
>>> If any of you have any doc/manual relating on which method is used where 
>>> it will be really very helpful.
>>>
>>> Thanks. :)
>>>
>>



[tesseract-ocr] Re: how to use PDF as Input

2018-01-03 Thread Quan Nguyen
Tesseract engine cannot read PDF. You'll have to convert them to suitable 
images (TIFF or PNG) first. There are many tools for that: ImageMagick, 
GhostScript, PDFBox, etc.
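
As one way to do the conversion, here is a small Python sketch that shells out to GhostScript. It assumes the gs executable is on the PATH (on Windows it is usually gswin64c). The function names, the 300 DPI setting and the output file pattern are my own choices, not anything Tesseract requires.

```python
import subprocess


def ghostscript_command(pdf_path, out_pattern="page-%03d.png", dpi=300, gs="gs"):
    """Build a GhostScript command that renders each PDF page to a grayscale PNG."""
    return [
        gs,
        "-dNOPAUSE", "-dBATCH", "-dSAFER",  # run non-interactively and exit when done
        "-sDEVICE=pnggray",                 # 8-bit grayscale PNG output
        f"-r{dpi}",                         # rendering resolution in DPI
        f"-sOutputFile={out_pattern}",      # %03d is replaced by the page number
        pdf_path,
    ]


def pdf_to_images(pdf_path, **kwargs):
    """Run the conversion; raises CalledProcessError if GhostScript fails."""
    subprocess.run(ghostscript_command(pdf_path, **kwargs), check=True)
```

Each resulting page image can then be passed to tesseract one at a time.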

On Wednesday, January 3, 2018 at 12:05:12 PM UTC-6, Subhanshu Gupta wrote:
>
> Dear All,
>
> I am new to Tesseract OCR and need to implement it to Read PDF Forms but I 
> am not able to find any good documentation for which method to use to read 
> PDF as well as for Character Segmentation.
> If any of you have any doc/manual relating on which method is used where 
> it will be really very helpful.
>
> Thanks. :)
>



Re: [tesseract-ocr] Standalone tesseract 3.5 or higher with font detection for Windows

2017-12-06 Thread Quan Nguyen
VietOCR, Java version, bundles Tesseract Windows executable. You may want 
to check it out.

https://sourceforge.net/projects/vietocr/files/vietocr/

On Wednesday, December 6, 2017 at 11:30:50 AM UTC-6, Amir Vahid wrote:
>
> Either would be helpful. My real issue is finding a standalone portable 
> tesseract.



[tesseract-ocr] Re: Funny results with vowels in Portuguese for Tesseract 4.0alpha

2017-11-29 Thread Quan Nguyen
Did you try the latest .traineddata versions -- fast or best?

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

On Friday, November 17, 2017 at 1:49:04 PM UTC-6, Marcello Galvão wrote:
>
> Hi, I have the same problem.
> Did you find a solution?
> Thank you!
>
> On Wednesday, August 16, 2017 at 14:10:49 UTC-3, Paulo Scardine wrote:
>>
>> I have the following image:
>>
>>
>> 
>> For version 3.04 I get the correct result: "Declaração de Nascido Vivo".
>>
>> For 4.0 I get "Declªrªç㺠de Nªscidº Vivº".
>>
>> What I have tried so far:
>>
>>- everything on the Improving the Quality wiki article
>>- messing with `tessedit_char_whitelist` and `tessedit_char_blacklist`
>>- custom user word and pattern files
>>
>> Nothing made a difference; I'm starting to think this may be a bug.
>>
>> I would appreciate advice on how to improve the diagnostic.
>>
>> Thanks in advance,
>> --
>> Paulo
>>
>



[tesseract-ocr] Re: Tesseract on Windows 10

2017-11-29 Thread Quan Nguyen
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows

On Wednesday, November 29, 2017 at 1:20:37 AM UTC-6, fernandov...@gmail.com 
wrote:
>
> Trying to run tesseract on  Windows 10 but the installation for Windows 
> does not provide an .exe file.  Running from the command line does not 
> work.  Where can I get a windows 10 installation?
>



[tesseract-ocr] Re: Tesseract max pages while ocring?

2017-11-15 Thread Quan Nguyen
Try the latest Tess4J version, 3.4.2.

On Wednesday, November 15, 2017 at 1:43:08 AM UTC-6, Nikolai Velkov wrote:
>
> So is there a fix for that ?
>
> On Monday, November 13, 2017 at 4:47:56 PM UTC+2, Quan Nguyen wrote:
>>
>> The GhostScript-based PDF module in Tess4J sets the limit to 999 since it 
>> was thought that the users would never attempt to go beyond that since 
>> loading only a few hundreds of 300-DPI full-size image pages into memory 
>> would already cause out-of-memory exceptions.
>>
>> On Friday, November 10, 2017 at 6:47:31 AM UTC-6, Nikolai Velkov wrote:
>>>
>>> We're using tesseract 3.0.5 to ocr pdf files and when ocring a pdf file 
>>> with 1000+ pages, tesseract goes to page 999 and then stops ocring. No 
>>> error or anything (using it with java and tess4j btw). It's also not about 
>>> the size since i tested it with a pdf file of 1000+ pages with only the 
>>> letter 'A' on each page. The file is about 2.3 mbs. Is there any 
>>> configuration that specifies a max amount of pages to ocr ?
>>>
>>



[tesseract-ocr] Re: Tesseract max pages while ocring?

2017-11-13 Thread Quan Nguyen
The GhostScript-based PDF module in Tess4J caps the page count at 999. The 
assumption was that users would never go beyond that, since loading even a 
few hundred 300-DPI full-size page images into memory would already cause 
out-of-memory exceptions.
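
To put a rough number on the memory claim, here is a back-of-the-envelope calculation in Python. It assumes US Letter pages (8.5 x 11 in) held as uncompressed 24-bit RGB. Both are my assumptions for illustration, not something Tess4J dictates.

```python
def page_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Uncompressed size in bytes of one rendered page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# 2550 x 3300 pixels x 3 bytes = 25,245,000 bytes per page
one_page = page_bytes(8.5, 11, 300, 3)

if __name__ == "__main__":
    for pages in (1, 300, 999):
        total_gb = one_page * pages / 1e9
        print(f"{pages:4d} pages: {total_gb:.2f} GB")
```

At roughly 25 MB per page, 300 pages already need about 7.6 GB, well beyond a typical JVM heap.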

On Friday, November 10, 2017 at 6:47:31 AM UTC-6, Nikolai Velkov wrote:
>
> We're using tesseract 3.0.5 to ocr pdf files and when ocring a pdf file 
> with 1000+ pages, tesseract goes to page 999 and then stops ocring. No 
> error or anything (using it with java and tess4j btw). It's also not about 
> the size since i tested it with a pdf file of 1000+ pages with only the 
> letter 'A' on each page. The file is about 2.3 mbs. Is there any 
> configuration that specifies a max amount of pages to ocr ?
>



Re: [tesseract-ocr] Re: Tesseract ignores tessedit_char_whitelist parameter

2017-10-19 Thread Quan Nguyen
https://github.com/tesseract-ocr/tesseract/issues/751

Use current version 3.05.x, if possible.
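
For what it's worth, the documented order is image, output base, then options, and multiple -c arguments are allowed, one variable each. Here is a small Python sketch that builds such a command. The helper name is made up, and per this thread the whitelist is not honored by 4.00alpha, so it is aimed at 3.05.x.

```python
def tesseract_command(image, output_base, variables, lang=None):
    """Build a tesseract command line with one -c flag per config variable."""
    cmd = ["tesseract", image, output_base]
    if lang:
        cmd += ["-l", lang]
    for name, value in variables.items():
        # Each variable is passed as its own "-c name=value" pair.
        cmd += ["-c", f"{name}={value}"]
    return cmd


if __name__ == "__main__":
    cmd = tesseract_command(
        "threshold_problem1.jpeg",
        "stdout",
        {"tessedit_char_whitelist": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"},
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting problems with the character lists.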


On Thursday, October 19, 2017 at 9:19:08 AM UTC-5, Ľuboš Katrinec wrote:
>
> I used --print-parameters with this version and I could see the parameter 
> in the list included. Do you think it is not used even if listed? It's the 
> same with tessedit_char_blacklist? Is there an alternative?
>
> Thanks and regards,
> Lubos
>
> On Saturday, October 14, 2017 at 5:43:16 PM UTC+2, shree wrote:
>>
>> whitelist parameter does not work with tesseract 4.0x
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Sat, Oct 14, 2017 at 8:25 PM, Dan9er  wrote:
>>
>>> -c goes at the very end of the command, and you can combine those two 
>>> arguments. Try this:
>>>
>>> > tesseract threshold_problem1.jpeg stdout -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyz
>>>
>>> On Friday, October 13, 2017 at 5:43:46 AM UTC-4, Ľuboš Katrinec wrote:

>>>> Hello,
>>>>
>>>> I'm trying to solve captcha images just for fun (or rather a challenge 
>>>> ;-) ). I'm passing tessedit_char_whitelist and tessedit_char_blacklist 
>>>> parameters but somehow they seem to be ignored. Perhaps I just miss 
>>>> something.
>>>>
>>>> > tesseract -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ -c tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyz threshold_problem1.jpeg stdout
>>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>>> R x C Eo e
>>>>
>>>> I'm using a windows version:
>>>>
>>>> > tesseract -v
>>>> tesseract 4.00.00alpha
>>>>  leptonica-1.74.1
>>>>   libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
>>>>
>>>> I'm doing it over a JPEG, could that be a problem?
>>>>
>>>> Thanks and regards,
>>>> Lubos

>>>
>>
>>



[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-26 Thread Quan Nguyen
The Wiki page offers more info:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017

On Sunday, September 24, 2017 at 9:56:29 AM UTC-5, Quan Nguyen wrote:
>
> It depends on your needs. There are also fast traineddata:
>
> https://github.com/tesseract-ocr/tessdata_fast
>
> It looks like many languages are represented.
>
> On Saturday, September 23, 2017 at 12:38:46 PM UTC-5, Subrato Namata wrote:
>>
>> Thanks Quan Nguyen. My initial results show that the issue is gone. Let 
>> me try with few more samples.
>> Additionally, are these the best trained data of tesseract available for 
>> all the other languages and we must be using these only ?
>>
>>
>>
>> On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>>>
>>> Try best traineddata:
>>>
>>> https://github.com/tesseract-ocr/tessdata_best
>>>
>>> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>>>>
>>>> Environment
>>>>
>>>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>>>> Spanish Trained Data: 
>>>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>>>> Command Used to OCR:
>>>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>>>> Where ImageDoc.png is a Spanish Scanned Document
>>>> output is the text file output of OCRed text
>>>>
>>>>- Tesseract Version: 4.0
>>>>- Platform: Windows version 64 Bit
>>>>
>>>> Current Behavior:
>>>>
>>>> In Spanish, character ‘o’ is recognized incorrectly as some round 
>>>> symbol. Attached input file is ImageDoc.png and Error screenshot
>>>>
>>>> [image: spanish] 
>>>> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
>>>> [image: imagedoc] 
>>>> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>>>>
>>>>
>>>>
>>>>
>>>> Expected Behavior:
>>>>
>>>> Character ‘o’ should be recognized correctly.
>>>>
>>>



[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-24 Thread Quan Nguyen
It depends on your needs. There are also fast traineddata:

https://github.com/tesseract-ocr/tessdata_fast

It looks like many languages are represented.

On Saturday, September 23, 2017 at 12:38:46 PM UTC-5, Subrato Namata wrote:
>
> Thanks Quan Nguyen. My initial results show that the issue is gone. Let me 
> try with few more samples.
> Additionally, are these the best trained data of tesseract available for 
> all the other languages and we must be using these only ?
>
>
>
> On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>>
>> Try best traineddata:
>>
>> https://github.com/tesseract-ocr/tessdata_best
>>
>> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>>>
>>> Environment
>>>
>>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>>> Spanish Trained Data: 
>>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>>> Command Used to OCR:
>>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>>> Where ImageDoc.png is a Spanish Scanned Document
>>> output is the text file output of OCRed text
>>>
>>>- Tesseract Version: 4.0
>>>- Platform: Windows version 64 Bit
>>>
>>> Current Behavior:
>>>
>>> In Spanish, character ‘o’ is recognized incorrectly as some round 
>>> symbol. Attached input file is ImageDoc.png and Error screenshot
>>>
>>> [image: spanish] 
>>> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
>>> [image: imagedoc] 
>>> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>>>
>>>
>>>
>>>
>>> Expected Behavior:
>>>
>>> Character ‘o’ should be recognized correctly.
>>>
>>



[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-22 Thread Quan Nguyen
Try best traineddata:

https://github.com/tesseract-ocr/tessdata_best

On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>
> Environment
>
> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
> Spanish Trained Data: 
> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
> Command Used to OCR:
> tesseract.exe ImageDoc.png output --oem 1 -l spa
> Where ImageDoc.png is a Spanish Scanned Document
> output is the text file output of OCRed text
>
>- Tesseract Version: 4.0
>- Platform: Windows version 64 Bit
>
> Current Behavior:
>
> In Spanish, character ‘o’ is recognized incorrectly as some round symbol. 
> Attached input file is ImageDoc.png and Error screenshot
>
> [image: spanish] 
> 
> [image: imagedoc] 
> 
>
>
>
>
> Expected Behavior:
>
> Character ‘o’ should be recognized correctly.
>



[tesseract-ocr] Re: How to get bounding boxes for text in images.

2017-09-19 Thread Quan Nguyen
Have you seen the API Examples?

https://github.com/tesseract-ocr/tesseract/wiki/APIExample
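
If you would rather stay on the command line, Tesseract 3.05 and later can also write TSV output (tesseract image.png out tsv), which carries a bounding box for every word. Here is a hedged Python sketch that pulls the word boxes out of that TSV text. The column names are the ones I know from the TSV renderer, the function name is my own, and the sample row is invented.

```python
import csv
import io


def word_boxes(tsv_text):
    """Return (text, left, top, width, height) for each recognized word."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    boxes = []
    for row in reader:
        # Level 5 rows describe single words; lower levels are
        # pages, blocks, paragraphs and lines.
        if row["level"] != "5":
            continue
        text = (row["text"] or "").strip()
        if not text:
            continue
        boxes.append(
            (text, int(row["left"]), int(row["top"]),
             int(row["width"]), int(row["height"]))
        )
    return boxes


# Invented sample in the TSV layout: one page row and one word row.
SAMPLE = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
)

if __name__ == "__main__":
    for text, left, top, width, height in word_boxes(SAMPLE):
        print(text, left, top, left + width, top + height)
```

The left/top/width/height values are in image pixels with the origin at the top-left corner, so they can be drawn directly as rectangles on the source image.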

On Tuesday, September 19, 2017 at 10:27:47 AM UTC-5, Somesh Kumar wrote:
>
> Hi
>
> I have used tesseract-ocr in my application and I am able to get the 
> result(i.e extracting the text from image in form of multiple strings) 
> using the wrapper 'pytesseract'.
> For a requirement in the application, I need to draw bounding boxes around 
> each word so that I could get the co-ordinates of each word or is there any 
> possibility that i can get coordinates for the text in the image.
>
> It could be a great help if any one can suggest me the correct way.
>



[tesseract-ocr] Re: Tesseract 4 can not load new osd.traineddata

2017-09-15 Thread Quan Nguyen
I've experienced the same issue with best osd.traineddata ever since it was 
published when testing Tess4J for Tesseract 4.0.0alpha.

On Friday, September 15, 2017 at 1:04:27 AM UTC-5, 
enkhbaata...@unimedia.co.jp wrote:
>
>
> 
>
>
> 
> I've downloaded and compiled latest tesseract source file from git 
> repository but when i run tesseract engine with mode psm 0 or 1 it can't 
> load osd.traineddata. My osd.traineddata file is from best directory. I 
> tried with older osd.traineddata file from tessdata directory it works 
> fine. Does anyone experience with this error and know fix for this problem? 
> OS: Ubuntu 16.04.02 LTS 
>



[tesseract-ocr] Re: Where can I download the binary?

2017-09-07 Thread Quan Nguyen
That was based on a very old version of Tesseract. Have you considered one 
that is more current?

https://github.com/charlesw/tesseract

On Thursday, September 7, 2017 at 3:42:24 AM UTC-5, Patrick LE GALL wrote:
>
> Hi everyone,
>
> I've tried to download the binary from the page 
> "http://www.pixel-technology.com/freeware/tessnet2/" but the link to 
> "Download binary here" is not working.
> It seems that the URL "http://tmp.m4f.eu/tessnet2.zip" is no longer valid.
> Do you know where I can find it?
> Regards
>



[tesseract-ocr] Re: Win64 build fails on VS2017

2017-07-26 Thread Quan Nguyen
I think it needs the following at the top of the file:

#include <algorithm>

On Wednesday, July 26, 2017 at 10:50:35 AM UTC-5, THintz wrote:
>
> The following line generates an error:
>
> max_offset = std::max(max_offset, (*code)(i)-han_offset);
>
>
>
> Severity: Error
> Code: C2039
> Description: 'max': is not a member of 'std'
> Project: libtesseract
> File: \tesseract\ccutil\unicharcompress.cpp
> Line: 208
>
>



[tesseract-ocr] Re: underlined text problem - tess4j

2017-07-22 Thread Quan Nguyen
Try Lept4J's line-removal method: Pix LeptUtils.removeLines(Pix pixs)

On Thursday, July 20, 2017 at 8:47:25 AM UTC-5, iShahad thobaiti wrote:
>
> Hello, 
>
> I'm trying to extract text from pdf file and it contain underlined text 
> that the OCR cannot recognize accurately, It either skip the text or 
> wrongly recognize it.
>
> What is the best way to overcome the issue ? 
>
>
> Thanks 
>



[tesseract-ocr] Re: text close to lines

2017-07-10 Thread Quan Nguyen
You can call Lept4J's LeptUtils.removeLines(Pix pixs).

http://tess4j.sourceforge.net/docs/index.html


On Monday, July 10, 2017 at 3:24:12 AM UTC-5, GuillaumeQ wrote:

> I have in a document some text written in a table. the lines of the table 
> are pretty close to the text. when i doOCR, i dont get the text between the 
> lines. is there any way to improve this performance and read some text 
> close to lines? the image is attached
>
> my code:
>
> import net.sourceforge.tess4j.*
>
> def ocrToStream() {
>     def imageFile = new File("path\\to.PNG")
>     ITesseract instance = new Tesseract1() // JNA Direct Mapping
>     // replace <parentPath> with path to parent directory of tessdata
>     instance.setDatapath("<parentPath>")
>     instance.setLanguage("fra")
>
>     try {
>         def result = instance.doOCR(imageFile)
>         System.out.println(result)
>     } catch (TesseractException e) {
>         System.err.println(e.getMessage())
>     } catch (IOException e) {
>         System.err.println(e.getMessage())
>     }
> }
>
>



[tesseract-ocr] Re: Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread Quan Nguyen
Leptonica provides many different functions for creating a Pix object. You can 
read from a file, a memory buffer, etc. So you may need to write your bitmap to 
one of those intermediate forms and read it back as a Pix.

pixRead
pixReadMem
pixReadMemPng

Check its API doc:

https://github.com/DanBloomberg/leptonica/blob/master/src/allheaders.h


On Thursday, June 8, 2017 at 8:55:04 AM UTC-5, Hari.K wrote:
>
> Hi There,
>
> I sometimes receive an error - "Failed to create pix, this normally 
> occurs because the requested image size is too large, please check Standard 
> Error Output" when doing OCR on a bitmap image.
>
>
> Below highlighted line is where it's breaking for me - 
>
> Bitmap bitmap;
> Spire.Pdf.PdfDocument document = new Spire.Pdf.PdfDocument(pdfPath);
>
> for (int i = 0; i <= document.Pages.Count; i++)
> {
>     // 200 is the DPI which I am setting for a bitmap image
>     bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>     ...
> }
>
> More details on what I am trying to do here:
> 1) Uploaded a PDF document which is of hardly 600KB
> 2) Iterate through each PDF page and convert it into a BitMap image
> 3) Then input this BitMap image to Tesseract for performing OCR
>
> Please note, I don't get this error often. Any ideas on why this error as 
> I do not receive this every time ?
>
> Looking forward for some inputs on this..
>
> Thanks in Advance,
> Hari
>
>
>



[tesseract-ocr] Re: use jTesseractEdit training but box edit is empty

2017-06-07 Thread Quan Nguyen
I don't see any box file, but judging from the appearance of the image, 
Tesseract probably had problems recognizing it and therefore produced an 
empty box file. You'll need to perform some image processing first to make 
the image more amenable to Tesseract.

On Tuesday, June 6, 2017 at 9:44:58 PM UTC-5, Shaw Ryan wrote:
>
> Thank you 
> I have uploaded box and tiff
> Please help
> On Monday, June 5, 2017 at 6:27:14 PM UTC+8, Shaw Ryan wrote:
>>
>>
>> 
>> How can I edit the data?
>>
>



[tesseract-ocr] Re: use jTesseractEdit training but box edit is empty

2017-06-06 Thread Quan Nguyen
You may want to attach your TIFF/Box pair here so people can look and help.

On Monday, June 5, 2017 at 8:58:19 PM UTC-5, Shaw Ryan wrote:
>
> I have created a box file
>
> On Monday, June 5, 2017 at 11:24:36 PM UTC+8, Quan Nguyen wrote:
>>
>> You'd need to provide the box file also. If you do not have one, you can 
>> create the box file using the options provided in the other tabs.
>>
>> On Monday, June 5, 2017 at 5:27:14 AM UTC-5, Wang Ryan wrote:
>>>
>>>
>>> <https://lh3.googleusercontent.com/-yCj0vqGtWmo/WTUawRK7xkI/ACk/EiGACLCD-G0TQwn7pJc5On1-fYZjLPIfwCLcB/s1600/20170605164727.jpg>
>>> How can I edit the data?
>>>
>>



[tesseract-ocr] Re: use jTesseractEdit training but box edit is empty

2017-06-05 Thread Quan Nguyen
You'd need to provide the box file also. If you do not have one, you can 
create the box file using the options provided in the other tabs.

On Monday, June 5, 2017 at 5:27:14 AM UTC-5, Wang Ryan wrote:
>
>
> 
> How can I edit the data?
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/9e1a6dd5-273e-4a80-b00b-99d94aff1858%40googlegroups.com.


[tesseract-ocr] Re: How to improve accuracy for OCR?

2017-05-15 Thread Quan Nguyen
It was said to have been created for 3.04. Can you try it with Tesseract 3.04?

On Monday, May 15, 2017 at 5:31:52 AM UTC-5, Jasnan Tp wrote:
>
> hi,
>
> When I use mrz.traineddata, I get the following error
>
> tesseract test.png result.txt -l mrz
> Tesseract Open Source OCR Engine v3.03 with Leptonica
> index >= 0 && index < size_used_:Error:Assert failed:in file 
> ../ccutil/genericvector.h, line 589
> [1]15786 segmentation fault (core dumped)  tesseract test.png 
> result.txt -l mrz
>
>
> Is this because mrz.traineddata is corrupted?
>
> On Tuesday, 7 March 2017 20:12:30 UTC+5:30, Zsombor Kaló wrote:
>>
>> To whom it may concern,
>>
>> Just created a tesseract 3.04 trained data (see attached). 
>> I call it "MRZ" because it has the OCR-B as the only font and trained 
>> with characters A-Z, 0-9 and the lesser-than symbol (<). Seems to be fast 
>> and accurate in my projects.
>>
>> On Monday, June 24, 2013 at 6:58:47 PM UTC+2, Nick White wrote:
>>>
>>> Hi Peter, 
>>>
>>> Sorry for the lack of response, I think us regulars here are all 
>>> quite busy at the moment. 
>>>
>>> Have you searched the archives of this mailing list? I seem to 
>>> recall someone previously deciding to go with a different project 
>>> which focused just on MRZ recognition. 
>>>
>>> Tesseract will do a reasonable job, as you have found, but perhaps 
>>> a dedicated program could do even better (and for less effort on 
>>> your part). 
>>>
>>> As far as improving your Tesseract results, though, I'd recommend 
>>> looking into user_patterns. It isn't well documented, but if the 
>>> format you're expecting is predictable it should help. Also have you 
>>> set up a unicharambigs file? That may help a little too (not much, 
>>> but it's probably worth adding for the common cases of 5 -> S, 8 -> 
>>> B, etc). 
>>>
>>> > One more unrelated question. How to read data from image with non-standard 
>>> > orientation (upside down, rotated left/right by 90 degrees)? How to use OSD 
>>> > feature? 
>>>
>>> I confess I don't actually know. I think Tesseract might try to 
>>> guess this entirely by itself. Does anyone else here know any 
>>> better? 
>>>
>>> Once you're happy your MRZ training is as good as it will get, would 
>>> you be happy to have it added to the main Tesseract repository? If 
>>> so (and it'd be great if you were) open an issue on the bug tracker 
>>> with the training file, and add some comments to the top of 
>>> mrz.config about how it was created and where the source files for 
>>> it are (see my grc.traineddata for an example). 
>>>
>>> Thanks Peter, and sorry again for not getting back to you sooner, 
>>>
>>> Nick 
>>>
>>> P.S. One other thing I just thought of: is the DPI you're feeding 
>>> into Tesseract the same as the DPI you trained with (300)? Ideally 
>>> it should be. Also you're right to preprocess using thresholding; 
>>> Tesseract isn't particularly good at that step and it's much better 
>>> if you can do it first. 
>>>
>>> On Wed, Jun 19, 2013 at 11:45:10PM -0700, Peter wrote: 
>>> > Hello. 
>>> > 
>>> > I'm trying to train Tesseract for OCR. My goal is to be able to recognize text 
>>> > from MRZ zone of various documents (mainly national ID). The training process 
>>> > should be pretty straightforward and I'd expect good results since all I have 
>>> > to deal with is one font (OCR-B), capital letters of Latin alphabet (A-Z), 
>>> > digits 0-9 and "less than" sign (<). Unfortunately the results are worse than 
>>> > expected. While the effects for preprocessed images (thresholding using GIMP) 
>>> > are pretty good (not perfect - in many cases Tesseract treats 0 as O, sometimes 
>>> > treats 5 as S and occasionally inserts unexpected whitespace between letters), 
>>> > data taken directly from an unprocessed scanned image is rather poorly 
>>> > recognized. There are many cases where Tesseract thinks O is 0, 5 is S, 8 is S, 
>>> > 2 is Z, H is M, U is W etc. While 0 vs O case can be tough (OCR-B 0 and O don't 
>>> > look too different) and perhaps beyond Tesseract capabilities, I think other 
>>> > ambiguities can and should be eliminated. As I'm new to Tesseract (have been 
>>> > using it for just a few days now) I hope you can suggest me the optimal 
>>> > training for OCR-B font or even provide me with some good training sample. 
>>> > Here's what I did to train Tesseract: 
>>> > 
>>> > 1) Prepared training text with OCR-B font (train1.odt, see attachments), 
>>> > converted it to .pdf with LibreOffice Writer (train1.pdf, see attachments) 
>>> > 2) Opened train1.pdf in GIMP and saved it as 300 dpi tif (resolution: 2479 x 
>>> > 3508), can't attach it as its size is more than 30 MB 
>>> > 3) Prepared font_properties file with the following line: ocrb 0 0 1 0 0 
>>> > 4) Executed the following Tesseract commands: 
>>> > 
>>> > tesseract mrz.ocrb.exp0.tif mrz.ocrb.exp0 batch.nochop makebox (corrected 

[tesseract-ocr] Re: tesseract multiply .tiffiles to singular .pdf file

2017-04-18 Thread Quan Nguyen
Either merge them into a single multi-page TIFF before sending to Tesseract 
or merge the output PDFs into one.
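
A third route, not mentioned above, avoids the merge step: recent versions of 
the tesseract command line accept a plain text file that lists one image path 
per line in place of a single image, and with the pdf config all listed pages 
go into one PDF. Below is a minimal Python sketch of that idea. The file names 
(page1.tif, pages.txt, combined) and the two helper names are placeholders made 
up for illustration, and the sketch only builds the command; please check the 
man page of your Tesseract version before relying on it.

```python
from pathlib import Path


def write_image_list(tiff_paths, list_file):
    """Write one image path per line, the list format tesseract reads as input."""
    lines = [str(Path(p)) for p in tiff_paths]
    Path(list_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def build_pdf_command(list_file, output_base):
    """Return the argv for: tesseract <list_file> <output_base> pdf"""
    # "pdf" is the stock config that switches on PDF output;
    # tesseract appends the .pdf extension to output_base itself.
    return ["tesseract", str(list_file), str(output_base), "pdf"]


# Hypothetical usage (file names are placeholders):
#   write_image_list(["page1.tif", "page2.tif"], "pages.txt")
#   subprocess.run(build_pdf_command("pages.txt", "combined"), check=True)
```

The argv list can be handed to subprocess.run as shown in the comment; the 
result should be a single combined.pdf with one page per listed TIFF.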

On Tuesday, April 18, 2017 at 11:58:50 AM UTC-5, Juan Lopez wrote:
>
> Hi,
> How do I make a single PDF from multiple TIFFs?
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/a85fa4a5-09ee-4c43-8052-95b3a45e749f%40googlegroups.com.


[tesseract-ocr] Re: trying to ocr with my new trained data,got an error:read_params_file parameter not found, no paramter indicated

2017-04-14 Thread Quan Nguyen
I guess you don't have your test.traineddata file in the tesseract-ocr\tessdata 
folder.
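
A quick way to rule this out is to check the tessdata folder directly. The 
Python sketch below is only an illustration: the helper names are made up, and 
the folder you pass in is whatever tessdata location your installation actually 
uses (the path in the comment is an assumption).

```python
from pathlib import Path


def find_traineddata(lang, tessdata_dir):
    """Return the path to <lang>.traineddata if it exists in tessdata_dir, else None."""
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


def list_languages(tessdata_dir):
    """List the language codes that have a .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


# Hypothetical usage; adjust the folder to your own installation:
#   find_traineddata("test", r"C:\Program Files\tesseract-ocr\tessdata")
```

If find_traineddata returns None for the name you pass to -l, Tesseract cannot 
load that language from this folder.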

On Friday, April 14, 2017 at 4:45:39 AM UTC-5, brada...@gmail.com wrote:
>
> I got my trained data file, and every step was fine; nothing went wrong. 
> When I try to OCR with my new trained data, I get an error: read_params_file 
> parameter not found: 
>
> That's all; no parameter is indicated.
>
> Can anyone help me, please?
>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f97d39b2-9b50-4ec9-9de3-319657b19e2b%40googlegroups.com.


[tesseract-ocr] Re: Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-10 Thread Quan Nguyen
For Case 1, you'll need to merge the two boxes. For Case 2, you'll correct 
by splitting the box.
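
If you prefer to do this outside the editor, the same two operations can be 
sketched in Python. This assumes the usual 3.0x box line layout, 
"<char> <left> <bottom> <right> <top> <page>", with the origin at the 
bottom-left of the image; the function names and the sample coordinates are 
made up for illustration.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box(line: str) -> Box:
    """Parse one box-file line: '<char> <left> <bottom> <right> <top> <page>'."""
    char, *nums = line.rstrip("\n").rsplit(" ", 5)
    return Box(char, *(int(n) for n in nums))


def format_box(box: Box) -> str:
    """Turn a Box back into a box-file line."""
    return f"{box.char} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def merge_boxes(a: Box, b: Box, char: str) -> Box:
    """Case 1: one glyph was boxed as two; cover both boxes with a single one."""
    return Box(char, min(a.left, b.left), min(a.bottom, b.bottom),
               max(a.right, b.right), max(a.top, b.top), a.page)


def split_box(box: Box, x: int, left_char: str, right_char: str) -> Tuple[Box, Box]:
    """Case 2: one box spans two glyphs; cut it at the vertical line x."""
    if not box.left < x < box.right:
        raise ValueError("split position must lie strictly inside the box")
    return (Box(left_char, box.left, box.bottom, x, box.top, box.page),
            Box(right_char, x, box.bottom, box.right, box.top, box.page))
```

The merged box takes the outer bounds of both inputs, and a split keeps the 
original height for both halves, so only the x position has to be chosen by eye.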

On Wednesday, April 5, 2017 at 12:55:37 AM UTC-5, srn...@gmail.com wrote:
>
> I am trying to correct box files so I can train Tesseract.
>
> But I have run into a strange problem:
>
>
> 1) Tesseract recognizes some single letters as two letters. How should I edit 
> the box file in that case? (screenshot 1)
> 2) Tesseract does not recognize some letters. How should I edit the box file 
> in that case? (screenshot 2)
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b25c10ce-4132-40f0-bf2e-65b936b04235%40googlegroups.com.


[tesseract-ocr] Re: How to correct characters for Arabic Language in Tesseract 4.0 LSTM?

2017-04-10 Thread Quan Nguyen
Once you have corrected the box file using the editor, you'll then have to 
manually execute the commands and/or scripts for training as described 
in https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 .

On Sunday, April 9, 2017 at 1:18:07 PM UTC-5, Ahmad Moawad wrote:
>
> Hello All,
>
> I use Tesseract 4.0, and the results for Arabic are better than with 
> previous versions of Tesseract, but there are still errors on some 
> characters. My question is how to correct these characters: should I use 
> jTessBoxEditor 2.0 beta for that or not? I tried correcting some 
> characters using jTessBoxEditor and copying the result to the Tesseract 
> directory. Unfortunately there was no improvement, and Tesseract still 
> doesn't recognize the characters.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/815d841b-7949-414b-96b4-6c77979e396d%40googlegroups.com.


[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-04-10 Thread Quan Nguyen
jTessBoxEditor 2.0 beta versions bundle the latest Tesseract 4.00alpha 
training executables. The training process for 4.00, however, has not been 
integrated into the program. The 3.0x training process is still supported.

Check out the two videos that depict the 3.0x training process:

https://wn.com/training_tesseract_ocr_for_arabic_language_tutorial

On Saturday, April 8, 2017 at 3:52:25 AM UTC-5, Ahmad Moawad wrote:
>
> Hello All,
>
>
> I want to train Tesseract 4.0 for Arabic. The results of this version 
> are great but still need some tuning, so I got 
> jTessBoxEditor 2.0 beta.
> I tried to fix the incorrect characters and build ara.traineddata. 
> After copying ara.traineddata to 
> /usr/share/tesseract-ocr/4.00/tessdata, I got random characters when I ran 
> Tesseract on the image.
> So, any suggestions on how to train for version 4.0? I already know 
> that the cube engine from the last 3.0x version is not included in 4.0 LSTM. 
> Or should I wait until Ray makes another updated ara.traineddata?
>
> Thanks.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1ec9e2e5-0e12-46da-987e-a4458f376f8d%40googlegroups.com.


[tesseract-ocr] VietOCR 5.0 alpha availability

2017-03-31 Thread Quan Nguyen
VietOCR 5.0 alpha, Java & .NET GUI frontend for Tesseract 4.00alpha, is 
available for download. Any feedback is welcome. Thanks.

https://sourceforge.net/projects/vietocr/files/


To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/aa63499d-1375-4c08-bf1d-e87c00f9b8cd%40googlegroups.com.


[tesseract-ocr] Re: Blacklist and whitelist

2017-03-08 Thread Quan Nguyen
Issue 751 has been reported.
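
For reference, the whitelist and blacklist are the tessedit_char_whitelist and 
tessedit_char_blacklist variables, which can be set on the command line with 
-c. Here is a small Python sketch of how such a call is usually assembled; the 
helper name and the file names are hypothetical. Per the reports in this 
thread, the 4.00alpha LSTM engine may ignore these settings even when they are 
passed correctly.

```python
def build_tesseract_command(image, output_base, whitelist=None, blacklist=None):
    """Return the argv for a tesseract run with optional character lists."""
    cmd = ["tesseract", image, output_base]
    if whitelist:
        # Only these characters should be returned by the recognizer.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        # These characters should never be returned by the recognizer.
        cmd += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return cmd


# Hypothetical usage: digits only, file names are placeholders.
#   subprocess.run(build_tesseract_command("in.png", "out", whitelist="0123456789"))
```

Comparing the output with and without the -c options on the same image is an 
easy way to confirm whether the installed version honors them.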

On Wednesday, March 8, 2017 at 1:50:46 AM UTC-6, Hieu Nguyen wrote:
>
> Same issue here, does anyone figure out how to solve this?
>
> On Tuesday, February 28, 2017 at 1:36:10 AM UTC+7, Alex Grishin wrote:
>>
>> Good day!
>> I tried to use blacklist and whitelist abilities but I found that they 
>> do not work in Tesseract 4. Although the variables are initialized 
>> correctly the program still does not work properly.
>>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/4c725b47-25f0-492c-adf5-3f579a430111%40googlegroups.com.


[tesseract-ocr] Re: using tesseract 4 on a C# project

2017-03-08 Thread Quan Nguyen
It seems more complicated than I initially thought. I think it's best to ask 
the tesseract.net developer to update the project.

On Monday, March 6, 2017 at 11:07:09 AM UTC-6, El Fakir Zakaria wrote:
>
> I tried to follow the git wiki and used this command: 
> cppan --build pvt.cppan.demo.google.tesseract.tesseract-master
> I got 17854 warnings and 0 errors. A tesseract solution shortcut was created 
> containing 5 projects:
> ->ALL_BUILD
> ->ZERO_CHECK
> ->build-dependencies
> ->copy-dependencies
> ->cppan-d-b-d
> and a bin folder with some .dll and .lib files and tesseract.exe.
> I'm not sure what to do next; your help is appreciated.
>
> On Sunday, March 5, 2017 at 5:29:04 PM UTC, Quan Nguyen wrote:
>>
>> You can try downloading the source and compiling it. Then 
>> replace libtesseract304.dll with libtesseract400.dll, which you generate 
>> from the Tesseract 4.00alpha source, and try compiling again. Make sure to 
>> run the unit tests to ensure the library works as expected.
>>
>> On Sunday, March 5, 2017 at 10:09:37 AM UTC-6, El Fakir Zakaria wrote:
>>>
>>> is it too complicated to do ?
>>> thanks for your fast reply
>>>
>>> On Sunday, March 5, 2017 at 3:59:28 PM UTC, Quan Nguyen wrote:
>>>>
>>>> You may want to submit a request to that project's owner.
>>>>
>>>> https://github.com/charlesw/tesseract
>>>>
>>>> On Sunday, March 5, 2017 at 6:05:53 AM UTC-6, El Fakir Zakaria wrote:
>>>>>
>>>>> Can someone tell me the steps to use the latest version of tesseract in a C# 
>>>>> project? I managed to use tesseract 3.2 using the NuGet package. Thank you 
>>>>> for your time.
>>>>>
>>>>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/364984a5-cf8b-44ac-b3d1-b3b1d368780f%40googlegroups.com.


[tesseract-ocr] Re: using tesseract 4 on a C# project

2017-03-05 Thread Quan Nguyen
You can try downloading the source and compiling it. Then 
replace libtesseract304.dll with libtesseract400.dll, which you generate 
from the Tesseract 4.00alpha source, and try compiling again. Make sure to 
run the unit tests to ensure the library works as expected.

On Sunday, March 5, 2017 at 10:09:37 AM UTC-6, El Fakir Zakaria wrote:
>
> is it too complicated to do ?
> thanks for your fast reply
>
> On Sunday, March 5, 2017 at 3:59:28 PM UTC, Quan Nguyen wrote:
>>
>> You may want to submit a request to that project's owner.
>>
>> https://github.com/charlesw/tesseract
>>
>> On Sunday, March 5, 2017 at 6:05:53 AM UTC-6, El Fakir Zakaria wrote:
>>>
>>> Can someone tell me the steps to use the latest version of tesseract in a C# 
>>> project? I managed to use tesseract 3.2 using the NuGet package. Thank you 
>>> for your time.
>>>
>>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6c452d3b-a4b4-4dfa-9b2d-6c0f68c6e405%40googlegroups.com.


[tesseract-ocr] Re: using tesseract 4 on a C# project

2017-03-05 Thread Quan Nguyen
You may want to submit a request to that project's owner.

https://github.com/charlesw/tesseract

On Sunday, March 5, 2017 at 6:05:53 AM UTC-6, El Fakir Zakaria wrote:
>
> Can someone tell me the steps to use the latest version of tesseract in a C# 
> project? I managed to use tesseract 3.2 using the NuGet package. Thank you 
> for your time.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/90051e7e-84bc-4feb-827d-e090e5de2275%40googlegroups.com.


[tesseract-ocr] Re: Simple Tesseract OCR in .NET 4+?

2017-02-28 Thread Quan Nguyen
Check out the .NET wrapper for Tesseract:

https://github.com/charlesw/tesseract

On Tuesday, February 28, 2017 at 1:37:47 AM UTC-6, Cetor Notorious wrote:
>
> Hi everybody,
>
> I was wondering if anyone had a tutorial / example code that is really 
> simple.
> It just needs to recognize text from a webimage, and return the recognized 
> text.
>
> I would like to make it where I can have this entire piece in one DLL so 
> it's easy to use.
>
> Is anyone able to help me?
>
> Have a wonderful day :)
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/01457812-b969-405d-8c41-65422e1e945a%40googlegroups.com.


[tesseract-ocr] Re: Blacklist and whitelist

2017-02-27 Thread Quan Nguyen
I observed the same thing in my recent tests of Tess 4.00alpha.

On Monday, February 27, 2017 at 12:36:10 PM UTC-6, Alex Grishin wrote:
>
> Good day!
> I tried to use blacklist and whitelist abilities but I found that they do 
> not work in Tesseract 4. Although the variables are initialized correctly the 
> program still does not work properly.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b7a694dd-3ff3-42e8-b39c-9d2f8a1ad2d2%40googlegroups.com.

