Re: [tesseract-ocr] How to assess the quality of Tesseract OCR output programmatically?

2018-06-13 Thread ShreeDevi Kumar
You can compare the OCRed text with ground-truth text. If you are creating a PDF, you will have to extract the text from it before comparing. Two tools for this: https://github.com/impactcentre/ocrevalUAtion or https://github.com/eddieantonio/isri-ocr-evaluation-tools
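As a rough illustration of what such a comparison measures, here is a minimal Python sketch of a character error rate (edit distance divided by ground-truth length). It is not how either of the linked tools is implemented; the function names `levenshtein` and `character_error_rate` are made up for this example, and the linked tools report considerably more detail.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """Edit distance normalised by the ground-truth length (hypothetical helper)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)
```

A value of 0.0 means the OCR text matches the ground truth exactly; the value can exceed 1.0 when the OCR output is much longer than the ground truth.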

Re: [tesseract-ocr] Leptoncia vs libleptonica-dev

2018-06-13 Thread Ning Zhao
Thanks, Marco, for your explanation. As I can find liblept.{a,la,so} in the lib folder and leptonica in the include folder, I assume I have installed leptonica successfully. So the question is how to let tesseract/configure know where the libs and headers are when I try to compile tesseract. Is

Re: [tesseract-ocr] Leptoncia vs libleptonica-dev

2018-06-13 Thread Marco Atzeri
On 6/13/2018 8:16 AM, Ning Zhao wrote:
> Hi all, The question in my mind now is whether leptonica and libleptonica-dev are the same thing, as leptonica doesn't provide an executable. How can I check I have installed them/it successfully?
leptonica is a library. As any library is usually

[tesseract-ocr] Leptoncia vs libleptonica-dev

2018-06-13 Thread Ning Zhao
Hi all, The question in my mind now is whether leptonica and libleptonica-dev are the same thing, as leptonica doesn't provide an executable. How can I check I have installed them/it successfully? Here is how this question came into my mind: I'm following these links to install tesseract on

[tesseract-ocr] How to assess the quality of Tesseract OCR output programmatically?

2018-06-13 Thread nitin
Dear members, Is there a way to 'assess the quality of Tesseract OCR output'? I need to provide such statistics along with the scanned image-to-pdf output file results, so the users can decide and sort whether the output quality is acceptable or not (like above 50% or 80% recognition done
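When no ground truth is available, one possible proxy is the per-word confidence that Tesseract writes when run with the `tsv` output config (for example `tesseract page.png out tsv`, where `page.png` is a placeholder name). The Python sketch below averages the `conf` column of such a file. The function name `mean_word_confidence` is hypothetical, and a confidence value is the engine's own estimate, not a measured accuracy.

```python
import csv
import io


def mean_word_confidence(tsv_text: str):
    """Average the 'conf' column over rows that carry recognised text.

    Rows for pages, blocks, paragraphs and lines have conf == -1 and an
    empty text field, so they are skipped. Returns None if no words remain.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    confidences = []
    for row in reader:
        raw_conf = row.get("conf")
        text = (row.get("text") or "").strip()
        if raw_conf is None or not text:
            continue
        conf = float(raw_conf)
        if conf >= 0:
            confidences.append(conf)
    if not confidences:
        return None
    return sum(confidences) / len(confidences)
```

A threshold on this average (the 50% or 80% cut-offs mentioned in the question, say) would have to be calibrated against a few pages with known ground truth before users rely on it.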

[tesseract-ocr] Re: Leptoncia vs libleptonica-dev

2018-06-13 Thread Ning Zhao
I finally got ./configure passed. The solution is in the "Common Errors" section of the compiling guide, quoted here: If configure fails with such error "configure: error: Leptonica 1.74 or higher is required." Try to install
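For the earlier question of how to check the installation, a small Python sketch follows. It assumes that `pkg-config` is installed and that Leptonica registered itself under the module name `lept`; both are assumptions about the local setup, and the helper names are made up for this example.

```python
import subprocess


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '1.74.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def leptonica_is_new_enough(minimum: str = "1.74", module: str = "lept"):
    """Ask pkg-config for the installed version and compare it to `minimum`.

    Returns True or False, or None when pkg-config is missing or cannot
    find the module (assumed here to be called 'lept').
    """
    try:
        result = subprocess.run(
            ["pkg-config", "--modversion", module],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(result.stdout) >= parse_version(minimum)
```

A result of None for a library that is present under a non-standard prefix usually means pkg-config is not searching that prefix; pointing the `PKG_CONFIG_PATH` environment variable at the directory that holds the `.pc` file is the usual remedy.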

[tesseract-ocr] Can :traineddata" for Tesseract 3 be used for Tesseract 4

2018-06-13 Thread chandra churh chatterjee
I have trained tesseract 3 with 64 fonts using the respective box and .tr files, but now I want to use the same trained data for training tesseract 4 after creating the starter trained data using the "Using tesstrain The setup for running tesstrain.sh is the same as for base Tesseract. Use

Re: [tesseract-ocr] Can :traineddata" for Tesseract 3 be used for Tesseract 4

2018-06-13 Thread ShreeDevi Kumar
If you have box/tiff pairs in tesseract4 format, you can generate the lstmf files by running: tesseract lang.file.exp0.tif lang.file.exp0 lstm.train (lstm.train is a config file.)
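With many box/tiff pairs, the same command can be generated per image. The Python sketch below builds exactly the command quoted above for each `.tif` file; it assumes each `.tif` sits next to its `.box` file and that `tesseract` is on the PATH. The helper names `lstmf_command` and `generate_lstmf` are hypothetical.

```python
import subprocess
from pathlib import Path


def lstmf_command(tif_path: str) -> list:
    """Build: tesseract <name>.tif <name> lstm.train"""
    tif = Path(tif_path)
    output_base = tif.with_suffix("")  # lang.file.exp0.tif -> lang.file.exp0
    return ["tesseract", str(tif), str(output_base), "lstm.train"]


def generate_lstmf(directory: str) -> None:
    """Run the command for every .tif in `directory` (assumed layout:
    each .tif has its .box file alongside it)."""
    for tif in sorted(Path(directory).glob("*.tif")):
        subprocess.run(lstmf_command(str(tif)), check=True)
```

Calling `generate_lstmf(".")` from the directory that holds the pairs would write each output next to its source image.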

Re: [tesseract-ocr] Can :traineddata" for Tesseract 3 be used for Tesseract 4

2018-06-13 Thread chandra churh chatterjee
Can you tell me from which directory we have to run the following command, and what the arguments will be if we are using our trained data, which contains the following files: -07-2016 12:45 11 digits.f4.exp0.txt -a 08-07-2016 12:37198