Re: [tesseract-ocr] How can I know whichever file format types Tesseract will recognize and able to process them ?

2018-04-18 Thread ShreeDevi Kumar
It depends on which image libraries Leptonica was built with. tesseract -v will show the list.
On Thu 19 Apr, 2018, 10:46 AM abdu wrote:
> How do we find out which file types Tesseract is capable of
> processing?
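A minimal Python sketch of that check, for anyone who wants to script it. It assumes the tesseract binary is on PATH and that the version output lists the image libraries on a line separated by " : ", which is how recent builds print it; older builds may format it differently. The function names and the library-to-format table are illustrative and not part of Tesseract or Leptonica.

```python
import subprocess

# Illustrative mapping from library name to the format it enables.
# This table is an assumption of this sketch, not output of Tesseract.
FORMATS_BY_LIBRARY = {
    "libjpeg": "JPEG",
    "libpng": "PNG",
    "libtiff": "TIFF",
    "libgif": "GIF",
    "libwebp": "WebP",
    "libopenjp2": "JPEG 2000",
}


def parse_image_libraries(version_text):
    """Return the library names found on ' : '-separated lines."""
    libraries = []
    for line in version_text.splitlines():
        if " : " not in line:
            continue  # skip the tesseract and leptonica version lines
        for entry in line.split(" : "):
            entry = entry.strip()
            if entry:
                # The first token is the name; the rest is the version.
                libraries.append(entry.split()[0])
    return libraries


def supported_formats(version_text):
    """Map the detected libraries to format names, dropping unknown ones."""
    names = parse_image_libraries(version_text)
    return [FORMATS_BY_LIBRARY[n] for n in names if n in FORMATS_BY_LIBRARY]


def installed_version_text():
    """Run `tesseract -v`; some versions print to stderr, so keep both."""
    result = subprocess.run(
        ["tesseract", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr
```

Calling supported_formats(installed_version_text()) gives the formats enabled by external libraries. Leptonica also reads a few formats, such as BMP and PNM, without any external library, so those will not appear in this list.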

[tesseract-ocr] How can I know whichever file format types Tesseract will recognize and able to process them ?

2018-04-18 Thread abdu
How do we find out which file types Tesseract is capable of processing?

[tesseract-ocr] Re: Page Separator

2018-04-18 Thread Ewan Mellor
If you use the hOCR output, it puts the image name in the tag at the top of the output. The plain text doesn't have any way to include the title in the output as far as I can see; you'll need to modify Tesseract to do that. Check out TessHOcrRenderer::BeginDocumentHandler. You'll want to
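For the plain-text case, one workaround that avoids patching Tesseract is to run it once per image and write the file name yourself. This is a different technique from the renderer change suggested above, sketched here in Python under the assumptions that tesseract is on PATH and that list.txt holds one image path per line. The function names and the "=====" header format are made up for illustration.

```python
import subprocess
from pathlib import Path


def read_image_list(list_path):
    """Return the non-empty lines of the list file as image paths."""
    lines = Path(list_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def ocr_image(image_path, lang="eng"):
    """OCR one image; the output base 'stdout' makes tesseract print the text."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def label_pages(pages):
    """Join (name, text) pairs, putting a header with the name before each text."""
    parts = []
    for name, text in pages:
        # rstrip() also drops the trailing form feed Tesseract emits per page.
        parts.append(f"===== {name} =====\n{text.rstrip()}\n")
    return "\n".join(parts)


def ocr_list_with_names(list_path, lang="eng"):
    """OCR every image in the list and label each result with its file name."""
    names = read_image_list(list_path)
    return label_pages((name, ocr_image(name, lang)) for name in names)
```

The trade-off is speed: each call starts a new tesseract process and reloads the language data, so this is slower than passing list.txt in a single run.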

[tesseract-ocr] Re: Strategy for Sparse Text

2018-04-18 Thread CK
> I trained a single font using http://ocr7.com/trainFont. I provided the
> ttf file. The accuracy is about 10% better but the speed is considerably
> faster. I believe the speed difference is due to the fact that the
> .traineddata file (called by -l) is only a fraction of the

[tesseract-ocr] Page Separator

2018-04-18 Thread CK
Hello, From the command line I call a list.txt for my input images. I would like to be able to know which image's output I am looking at in the output text. I wonder if it is possible to use the input file name instead of simple text for the page_separator? include_page_breaks=1 -c

Re: [tesseract-ocr] install tesseract-4.00.00alpha error

2018-04-18 Thread Zdenko Podobny
You can start by using the latest version and providing details...
Zdenko
2018-04-18 7:56 GMT+02:00 Kai Feng:
> ./.libs/libtesseract.so: undefined reference to `omp_get_thread_num'
> ./.libs/libtesseract.so: undefined reference to `GOMP_sections_end_nowait'

[tesseract-ocr] install tesseract-4.00.00alpha error

2018-04-18 Thread Kai Feng
./.libs/libtesseract.so: undefined reference to `omp_get_thread_num'
./.libs/libtesseract.so: undefined reference to `GOMP_sections_end_nowait'
./.libs/libtesseract.so: undefined reference to `omp_get_num_threads'
./.libs/libtesseract.so: undefined reference to `GOMP_parallel_start'