Re: [tesseract-ocr] How to generate a searchable PDF using some images but executing OCR on their preprocessed version

2019-03-26 Thread Shree Devi Kumar
See
https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#integrate-original-image-file-and-detected-text-into-pdf
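
One way to approach the question, sketched under stated assumptions: Tesseract's `textonly_pdf` setting writes a PDF that holds only the invisible text layer, and that PDF can then be combined with one built from the original images. The file names below are hypothetical. `img2pdf` and `qpdf` are assumed to be installed; neither is mentioned in this thread, and other PDF tools could do the merge. The original and preprocessed images are assumed to share the same pixel dimensions and resolution, otherwise the text layer may not line up with the picture.

```shell
# Sketch only: img2pdf and qpdf are assumptions, not part of Tesseract.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_searchable_page ORIGINAL PREPROCESSED OUTBASE
make_searchable_page() {
  local orig="$1" prep="$2" base="$3"
  # 1. OCR the preprocessed image into a PDF containing only invisible text.
  run tesseract "$prep" "${base}_text" -c textonly_pdf=1 pdf
  # 2. Wrap the original (unprocessed) image in a one-page PDF.
  run img2pdf "$orig" -o "${base}_image.pdf"
  # 3. Place the original image underneath the text layer.
  run qpdf "${base}_text.pdf" --underlay "${base}_image.pdf" -- "${base}.pdf"
}
```

Running `DRY_RUN=1 make_searchable_page orig.png prep.png page1` only prints the three commands, which is a cheap way to review them before executing anything.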

On Wed, Mar 27, 2019 at 5:04 AM Nico wrote:

> Hi,
> I have a bunch of RGB images I need to OCR and put together in a
> searchable PDF. I noticed that if I preprocess the images, OCR quality
> improves dramatically, but those preprocessed images cannot be used to
> make the PDF.
> Is there a way to create the PDF using my original images but execute the
> OCR on the preprocessed versions of them?
> Thanks in advance
> Nico
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a64a6524-1dcd-424e-a94d-ebf070e00ed7%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 


भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com



[tesseract-ocr] How to generate a searchable PDF using some images but executing OCR on their preprocessed version

2019-03-26 Thread Nico
Hi,
I have a bunch of RGB images I need to OCR and put together in a searchable 
PDF. I noticed that if I preprocess the images, OCR quality improves 
dramatically, but those preprocessed images cannot be used to make the PDF.
Is there a way to create the PDF using my original images but execute the 
OCR on the preprocessed versions of them?
Thanks in advance
Nico



[tesseract-ocr] Re: Announcement: introducing TesseractStudio.Net, a free Windows GUI for Tesseract 4.0

2019-03-26 Thread farhad khalafi
We have released version 1.4 of Tesseract Studio for .Net. 

   - Improved UI for correcting OCR artifacts.
   - Bundles Tesseract 4.1 RC1.
   - Updated Pdfium engine.
   - Fixed a TLS bug that produced "Activation Failed" warnings.


Download: https://github.com/OpaitSoftware/TesseractStudio.Net

Thank you,

Farhad Khalafi
Opait Software



Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2019-03-26 Thread Shakir Zareen

>
> Hi
>

For me, none of the above worked for Tesseract 4.0, so I took an unorthodox 
approach:

- Got a macOS virtual machine up and running
- Installed Homebrew
- Installed Tesseract (4.0) using Homebrew
- Copied the whole Cellar folder (as it contains all dependencies for 
tesseract)

Then comes the fun part.

- All libs in the various folders refer to each other by absolute paths such 
as "/usr/local/Cellar/leptonica/lib/liblept.5.dylib"
- libtesseract.4.dylib refers to leptonica, and leptonica refers 
to the jpeg, tiff, etc. libs
- So we have to update all libs so that references of the form 
"/usr/local/Cellar/leptonica..." become "../../../leptonica..." in 
all libs
- We can use otool -L to list the references of each file (otool is part of 
the Xcode command line tools)
- Then we can use install_name_tool -change to rewrite the references to the 
dylibs
- It was a laborious process, but I did it one by one, and here are the 
commands (provided your working directory is CopyOfCellar/tesseract/bin)

install_name_tool -change /usr/local/opt/leptonica/lib/liblept.5.dylib \
  ../../../leptonica/1.78.0/lib/liblept.5.dylib tesseract

install_name_tool -change /usr/local/opt/leptonica/lib/liblept.5.dylib \
  ../../../leptonica/1.78.0/lib/liblept.5.dylib \
  /LocalPathofCopyOfCellar/tesseract/4.0.0_1/lib/libtesseract.4.dylib

install_name_tool -change /usr/local/opt/libpng/lib/libpng16.16.dylib \
  ../../../libpng/1.6.36/lib/libpng16.16.dylib \
  /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/jpeg/lib/libjpeg.9.dylib \
  ../../../jpeg/9c/lib/libjpeg.9.dylib \
  /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/giflib/lib/libgif.7.dylib \
  ../../../giflib/5.1.4_1/lib/libgif.7.dylib \
  /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/libtiff/lib/libtiff.5.dylib \
  ../../../libtiff/4.0.10_1/lib/libtiff.5.dylib \
  /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/webp/lib/libwebp.7.dylib \
  ../../../webp/1.0.2/lib/libwebp.7.dylib \
  /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/openjpeg/lib/libopenjp2.7.dylib \
  ../../../openjpeg/2.3.0/lib/libopenjp2.2.3.0.dylib \
  /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/jpeg/lib/libjpeg.9.dylib \
  ../../../jpeg/9c/lib/libjpeg.9.dylib \
  /LocalPathofCopyOfCellar/libtiff/4.0.10_1/lib/libtiff.5.dylib

Once all of the above is done, tesseract becomes standalone, provided you keep 
all the libs and includes in the same folder structure as the original Cellar.

export TESSDATA_PREFIX=../share/tessdata

If somebody can turn all of this into a bash script, that would be great.
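
A rough sketch of what such a script could look like, not a tested packaging tool. It assumes the same layout as the commands above: a copied Cellar with one version directory per formula, and references of the form `/usr/local/opt/<formula>/lib/<file>`. The function names `cellar_relative_ref` and `relink` are hypothetical. Unlike the manual openjpeg command above, the sketch keeps the referenced file name unchanged, so it relies on the versioned symlinks having been copied along with the Cellar.

```shell
# cellar_relative_ref CELLAR OLD_REF
# Map /usr/local/opt/<formula>/lib/<file> to
# ../../../<formula>/<version>/lib/<file>, taking <version> from the
# first directory found under CELLAR/<formula>.
cellar_relative_ref() {
  local cellar="$1" old="$2" rest formula file version
  rest="${old#/usr/local/opt/}"   # e.g. leptonica/lib/liblept.5.dylib
  formula="${rest%%/*}"           # e.g. leptonica
  file="${rest##*/}"              # e.g. liblept.5.dylib
  version="$(ls "$cellar/$formula" | head -n 1)"
  echo "../../../$formula/$version/lib/$file"
}

# relink FILE CELLAR
# Rewrite every /usr/local/opt/... reference that otool -L reports for FILE.
# Needs macOS with the Xcode command line tools (otool, install_name_tool).
relink() {
  local target="$1" cellar="$2" ref
  otool -L "$target" |
    awk 'NR > 1 && $1 ~ "^/usr/local/opt/" { print $1 }' |
    while read -r ref; do
      install_name_tool -change "$ref" \
        "$(cellar_relative_ref "$cellar" "$ref")" "$target"
    done
}
```

From the tesseract bin directory inside the copied Cellar, `relink tesseract ../../..` would handle the executable, and a loop such as `for lib in ../../../*/*/lib/*.dylib; do relink "$lib" ../../..; done` would cover the libraries. Relative references like these are resolved against the current working directory, which is why the working-directory condition above matters; names based on `@loader_path` are a common alternative that avoids that dependency.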



Re: [tesseract-ocr] Unable to recognise small size image

2019-03-26 Thread Heeramani Prasad
Thanks a lot. It's now working fine. I had been struggling with it over the weekend.
Here is the result:
[image: Screenshot from 2019-03-26 15-23-55.png]
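
For anyone comparing settings on their own small images, here is a minimal sketch that runs the same image through a few page segmentation modes with an explicit DPI, using the `--psm` and `--dpi` options from the quoted reply. The function name `try_psm_modes` and the `TESSERACT` override variable are hypothetical, and the choice of modes 6, 7 and 11 (uniform block, single line, sparse text) is only an assumption about what is worth trying.

```shell
# try_psm_modes IMAGE [DPI]
# Print the OCR result of IMAGE for several --psm values.
# TESSERACT can point at a different binary; it defaults to "tesseract".
try_psm_modes() {
  local image="$1" dpi="${2:-300}" psm
  for psm in 6 7 11; do
    echo "== --psm $psm --dpi $dpi =="
    # "-" as the output base sends the recognized text to stdout.
    "${TESSERACT:-tesseract}" "$image" - --psm "$psm" --dpi "$dpi"
  done
}
```

For example, `try_psm_modes small.png` uses 300 DPI, and `try_psm_modes small.png 150` tries a lower value.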





On Tue, Mar 26, 2019 at 9:07 AM Shree Devi Kumar wrote:

> try --psm 6 --dpi 300
>
> ubuntu@tesseract-ocr:~/TEST$ tesseract small.png - --psm 6 --dpi 300
> a) !
> b) |
> c) *
> d) _
> ubuntu@tesseract-ocr:~/TEST$ tesseract small.png - --psm 6
> a) !
> b) |
> c) *
> d) _
>
>
> On Mon, Mar 25, 2019 at 11:39 PM Heeramani Prasad wrote:
>
>> I am trying to recognise various images. For some examples it works, but it
>> fails to recognise small images. I am attaching some sample images here
>> for reference.
>>
>>
>>
>> [image: im_crop7_1.jpg]
>>
>> [image: im_crop8_1.jpg]
>>
>> [image: im_crop9_1.jpg]
>>
>> [image: im_crop10_1.jpg]
>>
>>
>> Any help would be appreciated!
>>
>
>



[tesseract-ocr] Re: Please help, Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

2019-03-26 Thread Shanshan Wang

Hi, it looks like your ara.traineddata is fine. How about putting an 
eng.traineddata in your path `/ocrd/usr/share/tessdata/`?
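
A quick way to check which language files are actually present, as a sketch: the function name `missing_traineddata` is hypothetical, and it only tests for files named `<lang>.traineddata` in the given directory. Tesseract falls back to `eng` when no `-l` option is given, which is one reason a missing eng.traineddata can trigger this error even when only another language is being trained.

```shell
# missing_traineddata DIR LANG...
# Print each language code whose DIR/LANG.traineddata file does not exist.
missing_traineddata() {
  local dir="$1" lang
  shift
  for lang in "$@"; do
    [ -f "$dir/$lang.traineddata" ] || echo "$lang"
  done
}
```

For the setup in this thread that would be `missing_traineddata /ocrd/usr/share/tessdata ara eng`; any code it prints still needs its traineddata file copied into that directory.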



On Wednesday, March 6, 2019 at 11:25:37 AM UTC-6, XBACK_10 wrote:
>
> I am new to Tesseract training. I use Cygwin on Windows 8.1 64-bit with:
> tesseract 4.0.0
>  leptonica-1.77.0
>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : 
> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>  Found AVX
>  Found SSE
>
> I use ocrd and
> want to train on Arabic data. I modified all the variables for my setup and
> placed ara.traineddata (_best) in this
> directory: /ocrd/usr/share/tessdata/ara.traineddata
> I attach the makefile I use.
>
> Now this problem occurs:
> $ make training test
> python generate_line_box.py -i "data/ground-truth/line_1_10.tif" -t 
> "data/ground-truth/line_1_10.gt.txt" > "data/ground-truth/line_1_10.box"
> python generate_line_box.py -i "data/ground-truth/line_1_100.tif" -t 
> "data/ground-truth/line_1_100.gt.txt" > "data/ground-truth/line_1_100.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1004.tif" -t 
> "data/ground-truth/line_1_1004.gt.txt" > "data/ground-truth/line_1_1004.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1005.tif" -t 
> "data/ground-truth/line_1_1005.gt.txt" > "data/ground-truth/line_1_1005.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1007.tif" -t 
> "data/ground-truth/line_1_1007.gt.txt" > "data/ground-truth/line_1_1007.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1009.tif" -t 
> "data/ground-truth/line_1_1009.gt.txt" > "data/ground-truth/line_1_1009.box"
> python generate_line_box.py -i "data/ground-truth/line_1_101.tif" -t 
> "data/ground-truth/line_1_101.gt.txt" > "data/ground-truth/line_1_101.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1010.tif" -t 
> "data/ground-truth/line_1_1010.gt.txt" > "data/ground-truth/line_1_1010.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1012.tif" -t 
> "data/ground-truth/line_1_1012.gt.txt" > "data/ground-truth/line_1_1012.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1013.tif" -t 
> "data/ground-truth/line_1_1013.gt.txt" > "data/ground-truth/line_1_1013.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1014.tif" -t 
> "data/ground-truth/line_1_1014.gt.txt" > "data/ground-truth/line_1_1014.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1015.tif" -t 
> "data/ground-truth/line_1_1015.gt.txt" > "data/ground-truth/line_1_1015.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1016.tif" -t 
> "data/ground-truth/line_1_1016.gt.txt" > "data/ground-truth/line_1_1016.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1017.tif" -t 
> "data/ground-truth/line_1_1017.gt.txt" > "data/ground-truth/line_1_1017.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1018.tif" -t 
> "data/ground-truth/line_1_1018.gt.txt" > "data/ground-truth/line_1_1018.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1019.tif" -t 
> "data/ground-truth/line_1_1019.gt.txt" > "data/ground-truth/line_1_1019.box"
> python generate_line_box.py -i "data/ground-truth/line_1_102.tif" -t 
> "data/ground-truth/line_1_102.gt.txt" > "data/ground-truth/line_1_102.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1020.tif" -t 
> "data/ground-truth/line_1_1020.gt.txt" > "data/ground-truth/line_1_1020.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1021.tif" -t 
> "data/ground-truth/line_1_1021.gt.txt" > "data/ground-truth/line_1_1021.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1022.tif" -t 
> "data/ground-truth/line_1_1022.gt.txt" > "data/ground-truth/line_1_1022.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1023.tif" -t 
> "data/ground-truth/line_1_1023.gt.txt" > "data/ground-truth/line_1_1023.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1024.tif" -t 
> "data/ground-truth/line_1_1024.gt.txt" > "data/ground-truth/line_1_1024.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1025.tif" -t 
> "data/ground-truth/line_1_1025.gt.txt" > "data/ground-truth/line_1_1025.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1026.tif" -t 
> "data/ground-truth/line_1_1026.gt.txt" > "data/ground-truth/line_1_1026.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1027.tif" -t 
> "data/ground-truth/line_1_1027.gt.txt" > "data/ground-truth/line_1_1027.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1028.tif" -t 
> "data/ground-truth/line_1_1028.gt.txt" > "data/ground-truth/line_1_1028.box"
> python generate_line_box.py -i "data/ground-truth/line_1_103.tif" -t 
> "data/ground-truth/line_1_103.gt.txt" > "data/ground-truth/line_1_103.box"
> python generate_line_box.py -i "data/ground-truth/line_1_1031.tif" -t 
> "data/ground-truth/line_1_1031.gt.txt" > "data/ground-truth/line_1_1031.box"
> python generate_line_box.py -i