[tesseract-ocr] Re: Tess4j failing near load of shared library tesseract-ocr-5.2 in Java 11 and 17, succeeds in Java 8

2022-09-28 Thread Quan Nguyen
PDF files are read by the PDFBox library. You may want to look into that 
area as well.


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this 

[tesseract-ocr] Re: Tess4j failing near load of shared library tesseract-ocr-5.2 in Java 11 and 17, succeeds in Java 8

2022-09-28 Thread Quan Nguyen
The source of tess4j is available; you can trace through the code to see 
what threw the exception.

Nevertheless, "throwable while reading PDF" seems to point to the part of 
the code that reads in the PDF file. Was that something you wrote, or is it 
from tess4j itself?

On Sunday, September 25, 2022 at 11:02:35 AM UTC-5 rcja...@gmail.com wrote:

> I'm using Tess4j in a Java program to access Tesseract and read PDFs 
> loaded with PDFBox. I've been using Java 8, and things are running. The 
> program is not commercial; I provide it to non-profits doing pro bono legal 
> work in my state. In Java 8, using the command line and Eclipse, the program 
> runs fine; running from the command line in either Java 11 or Java 17 causes 
> an error at the point where the program calls Tesseract.doOCR().
>
> I've dumped class loading information and see that the last class loaded 
> before the fatal exception is com.sun.jna.Platform; it would be used, for 
> instance, to determine the platform on which the program is running. I 
> haven't been able to find the source for the 5.2 version I downloaded from 
> UB Mannheim; that would be useful since the stack trace has line numbers.
>
> The following is a snippet showing log messages, System.out.println 
> messages, stacktraces, and class loading messages near the point of failure:
>
> pdfRenderer created buffered Image
> set a couple of tesseract vars
> [14.960s][info][class,load] net.sourceforge.tess4j.util.ImageIOHelper source: rsrc:tess4j-5.4.0.jar
> [14.961s][info][class,load] javax.imageio.IIOParam source: jrt:/java.desktop
> [14.961s][info][class,load] javax.imageio.ImageWriteParam source: jrt:/java.desktop
> [14.962s][info][class,load] com.github.jaiimageio.plugins.tiff.TIFFImageWriteParam source: rsrc:jai-imageio-core-1.4.0.jar
> [14.963s][info][class,load] javax.imageio.IIOImage source: jrt:/java.desktop
> [14.964s][info][class,load] com.sun.jna.Library source: rsrc:jna-5.12.1.jar
> [14.965s][info][class,load] net.sourceforge.tess4j.ITessAPI source: rsrc:tess4j-5.4.0.jar
> [14.965s][info][class,load] net.sourceforge.tess4j.TessAPI source: rsrc:tess4j-5.4.0.jar
> [14.966s][info][class,load] net.sourceforge.tess4j.util.LoadLibs source: rsrc:tess4j-5.4.0.jar
> [14.969s][info][class,load] com.sun.jna.Platform source: rsrc:jna-5.12.1.jar
> [14.973s][info][class,load] java.lang.ExceptionInInitializerError source: jrt:/java.base
> throwable while reading PDF
> [14.973s][info][class,load] java.lang.Throwable$PrintStreamOrWriter source: jrt:/java.base
> [14.974s][info][class,load] java.lang.Throwable$WrappedPrintStream source: jrt:/java.base
> java.lang.ExceptionInInitializerError
>     at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:442)
>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:326)
>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:309)
>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290)
>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:274)
>     at drivingrecordtool.file.DrivingRecordPDFTextReader.getOCRText(DrivingRecordPDFTextReader.java:152)
>     at drivingrecordtool.file.DrivingRecordPDFTextReader.getText(DrivingRecordPDFTextReader.java:46)
>     at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:78)
>     at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:1)
>     at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.IllegalStateException: zip file closed
>     at java.base/java.util.zip.ZipFile.ensureOpen(ZipFile.java:913)
>     at java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:348)
>
> If I uninstall Java and install Java 8, the program works fine.
>
> If I uninstall Java and install Java 11 or Java 17, it fails in this 
> fashion.
>
> Can anyone help me understand what the difference might be between the 
> versions of Java so I can fix this?
>
>
>
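
A diagnostic sketch for the question quoted above about what differs between the Java versions. The class-loading log shows the Tess4J and JNA classes coming from rsrc: sources while JDK classes come from jrt:, so it may help to print the same information from inside the program under each Java version and compare. The names ClassOrigin and describe are hypothetical and not part of Tess4J or JNA; only standard JDK calls are used.

```java
import java.security.CodeSource;

final class ClassOrigin {
    /** Returns a one-line description of where the given class was loaded from. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Core JDK classes such as java.lang.String report no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(no code source)"
                : source.getLocation().toString();
        ClassLoader loader = type.getClassLoader();
        // A null class loader means the class came from the bootstrap loader.
        String loaderName = (loader == null) ? "bootstrap" : loader.getClass().getName();
        return type.getName() + " <- " + location + " via " + loaderName;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // Class names are passed as arguments, for example com.sun.jna.Platform.
        for (String name : args) {
            // initialize=false: look the class up without running its static initializer.
            Class<?> type = Class.forName(name, false, ClassOrigin.class.getClassLoader());
            System.out.println(describe(type));
        }
    }
}
```

Running this once from Eclipse and once with "java -jar program.jar" should show whether the two launches use different class loaders or jar locations; that is a starting point for comparison, not a confirmed diagnosis.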

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f7309e7d-79bf-4594-a581-a1ce6a556b62n%40googlegroups.com.


Re: [tesseract-ocr] Re: ErrorInInitializerError - zip file closed out of tess4j.util.LoadLibs.getTesseractLibName

2022-09-28 Thread Quan Nguyen
Try running the suggested program to see if the same exception occurs. If 
it does not, then it's possible that something in your code is not right. 
VietOCR is open source; you can browse through the code to see how it works.

Another suggestion: try to use the alternate Tesseract1 API.
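
One way to narrow this down without a GUI, sketched under assumptions: force the class named in the error to initialize in a small stand-alone program, outside the Swing worker, and report what happens. InitProbe, probe, and AlwaysFails are hypothetical names; AlwaysFails is only a stand-in that simulates a failing static initializer, and nothing here calls Tess4J directly.

```java
final class InitProbe {
    /** Stand-in for a class whose static initializer throws (simulation only). */
    static final class AlwaysFails {
        static final int VALUE = fail();

        private static int fail() {
            throw new IllegalStateException("simulated failure");
        }
    }

    /** Loads and initializes the named class, returning a short status line. */
    static String probe(String className) {
        try {
            // The one-argument Class.forName also runs the static initializer.
            Class.forName(className);
            return "OK " + className;
        } catch (ClassNotFoundException e) {
            return "NOT FOUND " + className;
        } catch (ExceptionInInitializerError e) {
            // The cause is the exception thrown inside the static initializer.
            return "INIT FAILED " + className + ": " + e.getCause();
        } catch (LinkageError e) {
            // Covers NoClassDefFoundError on a repeated attempt, among others.
            return "LINK FAILED " + className + ": " + e;
        }
    }

    public static void main(String[] args) {
        // Example argument: net.sourceforge.tess4j.util.LoadLibs
        for (String name : args) {
            System.out.println(probe(name));
        }
    }
}
```

If the probe reports INIT FAILED for net.sourceforge.tess4j.util.LoadLibs when launched from the packaged jar but OK when launched from the IDE, the problem would appear to lie in how the jar is packaged or loaded rather than in the OCR call itself.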

On Friday, September 23, 2022 at 6:50:19 AM UTC-5 rcja...@gmail.com wrote:

> How is this supposed to help me? I have a program using the Tesseract 
> library to do OCR and then process the resulting text; I don't need a GUI 
> front end.
>
> On Thu, Sep 22, 2022 at 10:09 PM Quan Nguyen wrote:
>
>> You may want to try VietOCR, a Java desktop app that uses Tess4J. It 
>> works with Java 8, 18, and probably 11 too.
>>
>> http://vietocr.sf.net
>>
>> On Thursday, September 22, 2022 at 7:10:54 AM UTC-5 rcja...@gmail.com 
>> wrote:
>>
>>> I am running a Java desktop application on Windows 10 Pro using Tess4j; 
>>> it was working fine with Java 1.8, and I am now trying to get it to run 
>>> with Java 11 (11.0.15.1). When it calls 
>>> net.sourceforge.tess4j.Tesseract.doOCR(BufferedImage bi), it gets a 
>>> Throwable ExceptionInInitializerError, with a 'cause' exception 
>>> indicating a call to 
>>> net.sourceforge.tess4j.util.LoadLibs.getTesseractLibName().
>>>
>>> Oddly enough, the application works fine from within Eclipse, but fails 
>>> running on its own (for instance, running "java -jar program.jar" from a 
>>> cmd window). As I said, it was running with Java 1.8, and I might be able 
>>> to get it running again by uninstalling Java and reinstalling that JRE, but 
>>> there are other people that are going to use the application and it is not 
>>> reasonable to expect them to use a Java version that is that old. 
>>>
>>> I had the Maven dependency:
>>>
>>> <dependency>
>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>   <artifactId>tess4j</artifactId>
>>>   <version>4.4.1</version>
>>> </dependency>
>>>
>>> The only copy of tess4j-4.4.1.jar on my hard drive is in my 
>>> .m2\repository\net\sourceforge\tess4j\tess4j\4.4.1. I have twice deleted it 
>>> and updated my Maven configuration to restore it, to ensure it is not 
>>> corrupted.
>>>
>>> I noticed that this is not the most recent Tess4j, so I updated the 
>>> dependency to 5.4.0 and rebuilt. This has the same results: works in 
>>> Eclipse, not from the command line.
>>>
>>> The ExceptionInInitializerError is caused by an IllegalStateException: 
>>> zip file closed error. Documentation indicates this could be a corrupted 
>>> jar file or something similar. I don't know what jar file could be involved 
>>> except the tess4j, and I've replaced that twice as I've said.
>>>
>>> I've also tried this with two versions of the Tesseract-OCR dll -- 
>>> 5.0.1.20220118 and v5.2.0.20220712; they were both installed for "anyone on 
>>> this computer" with default options. They give the same results.
>>>
>>> Can someone help me figure out what's wrong?
>>>
>>> Stack trace:
>>>
>>> java.lang.ExceptionInInitializerError
>>>     at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:427)
>>>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:311)
>>>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:294)
>>>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:275)
>>>     at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:259)
>>>     at drivingrecordtool.file.DrivingRecordPDFTextReader.getOCRText(DrivingRecordPDFTextReader.java:152)
>>>     at drivingrecordtool.file.DrivingRecordPDFTextReader.getText(DrivingRecordPDFTextReader.java:46)
>>>     at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:78)
>>>     at drivingrecordtool.file.DrivingRecordFileReader.doInBackground(DrivingRecordFileReader.java:1)
>>>     at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
>>>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>>     at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
>>>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>     at java.base/java.lang.Thread.run(Thread.java:834)
>>> Caused by: java.lang.IllegalStateException: zip file closed
>>>     at java.base/java.util.zip.ZipFile.ensureOpen(ZipFile.java:913)
>>>     at java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:348)
>>>     at java.base/java.util.zip.ZipFile$1.getEntry(ZipFile.java:1130)
>>>     at java.base/java.util.jar.JarFile.getEntry0(JarFile.java:586)
>>>     at java.base/java.util.jar.JarFile.getEntry(JarFile.java:516)
>>>     at java.base/sun.net.www.protocol.jar.URLJarFile.getEntry(URLJarFile.java:131)
>>>     at java.base/java.util.jar.JarFile.getJarEntry(JarFile.java:478)
>>>     at java.base/jdk.internal.loader.URLClassPath$JarLoader.getResource(URLClassPath.java:945)
>>>  
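
The quoted message raises the possibility of a corrupted jar and describes re-downloading the file to rule it out. A more direct check, sketched here with only standard JDK classes, is to open the jar and read every entry to the end. JarCheck and readAllEntries are hypothetical names. A jar that passes this check can still hit "zip file closed" at run time, since that message is about a closed archive handle rather than bad contents, so a clean result only rules out one explanation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class JarCheck {
    /** Reads every file entry of the jar to the end; returns how many were read. */
    static int readAllEntries(String jarPath) throws IOException {
        int count = 0;
        byte[] buffer = new byte[8192];
        // Opening fails with an IOException if the file is not a readable zip archive.
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directories carry no data to read
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // discard the data; the goal is only to read it all
                    }
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Example argument: the path to the tess4j jar in the local Maven repository.
        for (String path : args) {
            System.out.println(path + ": " + readAllEntries(path) + " entries read");
        }
    }
}
```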

[tesseract-ocr] Re: how to combine (merge) trained data of Tesseract files

2022-09-28 Thread Quan Nguyen
Merging two traineddata files is neither possible nor supported. What you 
can do is rename your custom language pack to, say, eng1.traineddata and 
then specify -l eng+eng1 when running the tesseract executable.
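
A minimal sketch of the invocation described above, assuming the usual "tesseract imagename outputbase -l lang" argument order and that eng1.traineddata has been placed in the same tessdata directory as eng.traineddata. TesseractCommand and build, and the file names page.png and page, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TesseractCommand {
    /** Builds the argument list for running tesseract with one or more languages. */
    static List<String> build(String imagePath, String outputBase, String... languages) {
        if (languages.length == 0) {
            throw new IllegalArgumentException("at least one language is required");
        }
        List<String> command = new ArrayList<>();
        command.add("tesseract");
        command.add(imagePath);
        command.add(outputBase);
        command.add("-l");
        // Multiple language packs are joined with '+', as in eng+eng1.
        command.add(String.join("+", languages));
        return command;
    }

    public static void main(String[] args) {
        List<String> command = build("page.png", "page", "eng", "eng1");
        System.out.println(String.join(" ", command));
        // To actually run it (requires tesseract on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```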

On Sunday, September 25, 2022 at 4:22:45 PM UTC-5 fishmo...@gmail.com wrote:

> I have trained a new font for the English language with Tesseract OCR 
> (JavaTessBoxEditor). I received these files: eng.traineddata, inttemp, 
> normproto, pffmtable, shapetable, and unicharset.
>
> I tried to run combine_tessdata, but the main eng.traineddata file was 
> not changed.
>
> Then I tried downloading third-party software (the QT version from Zdenko, 
> the GUI from General Delopment NL). Still no result.
>
> I've searched all the forums and YouTube videos and have no clue how to 
> combine the main eng file and the newly trained eng file.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/96d2fc15-9edd-4ab3-8d93-c81149374645n%40googlegroups.com.