Hello, 

I'm trying to parallelize the OCR process because I have a lot of PDF 
documents, and I'm using this sample code as a guide:

https://sourceforge.net/p/tess4j/discussion/1202293/thread/4562eccb/

It works for PNG images but not for PDF files.

Is it possible to parallelize OCR for PDFs?

The error I'm getting:
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x0000000124482bc9, pid=36535, tid=0x0000000000001c03
JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 compressed oops)
Problematic frame:
C  [libgs.dylib+0x3afbc9]  copy_error_string+0xd
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
An error report file with more information is saved as:
/Users/iShahad/NetBeansProjects/OCR/hs_err_pid36535.log
If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

 

I noticed that if the number of PDF files equals the number of threads, they 
get processed with no errors.
But when I add more files, I get the error :|

*One solution* is to convert all the PDFs to PNG images and then parallelize 
over them.
I don't want to do that; it's not a practical solution.

I want to understand why the PDF files are not parallelized the same way as 
the PNG images.
Is there a way to overcome this, other than converting the PDFs to PNG? :(
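
One idea that might sit between the two approaches, sketched here only as an
untested guess: the problematic frame is in libgs.dylib, so possibly the
Ghostscript-backed PDF-to-image step is not safe to call from several threads
at once. If that assumption holds, the conversion could be serialized behind
a lock while the OCR itself still runs in parallel. In this sketch the names
SerializedPdfOcr, PDF_LOCK, rasterize and recognize are hypothetical
placeholders, not Tess4J API; rasterize stands in for whatever turns one PDF
into page images and recognize for the OCR call on one page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class SerializedPdfOcr {
    // Hypothetical lock: only one thread at a time may run the PDF-to-image step.
    private static final Object PDF_LOCK = new Object();

    /**
     * Processes every PDF on a fixed thread pool.
     * rasterize (placeholder for the PDF conversion) runs under PDF_LOCK;
     * recognize (placeholder for the OCR call) runs concurrently.
     * Results are returned in the same order as pdfPaths.
     */
    public static List<String> ocrAll(List<String> pdfPaths, int threads,
            Function<String, List<String>> rasterize,
            Function<String, String> recognize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String pdf : pdfPaths) {
                futures.add(pool.submit(() -> {
                    List<String> pages;
                    // Serialized section: assumed to be the non-thread-safe part.
                    synchronized (PDF_LOCK) {
                        pages = rasterize.apply(pdf);
                    }
                    // Parallel section: OCR each page outside the lock.
                    StringBuilder text = new StringBuilder();
                    for (String page : pages) {
                        text.append(recognize.apply(page));
                    }
                    return text.toString();
                }));
            }
            // Collect results in submission order; get() rethrows task failures.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With real code, rasterize would wrap the PDF conversion and recognize the
OCR call, presumably with one OCR engine instance per thread. Whether the
lock actually avoids the SIGSEGV depends on the guess about libgs being
right.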


 
