Re: [tesseract-ocr] Error in creating LSTM training data using tesstrain.sh

2018-09-02 Thread Shandigutt
Thank you, Shree. It works fine now.
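
For anyone who hits the same "Can't open lstm.train" message: the check described in the quoted reply can be sketched in a few lines of Java. This is only an illustration. The class name LstmTrainConfigCheck and the default path ../tessdata (taken from the --tessdata_dir value in my command) are assumptions, not part of Tesseract.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LstmTrainConfigCheck {

    // Returns true when <tessdataDir>/configs/lstm.train is a readable regular file.
    static boolean hasLstmTrainConfig(Path tessdataDir) {
        Path config = tessdataDir.resolve("configs").resolve("lstm.train");
        return Files.isRegularFile(config) && Files.isReadable(config);
    }

    public static void main(String[] args) {
        // Assumed default: the --tessdata_dir value used in this thread.
        Path tessdataDir = Paths.get(args.length > 0 ? args[0] : "../tessdata");
        Path configs = tessdataDir.resolve("configs");
        if (hasLstmTrainConfig(tessdataDir)) {
            System.out.println("Found lstm.train under " + configs);
        } else {
            System.out.println("Missing: " + configs.resolve("lstm.train"));
            System.out.println("Copy it from tesseract/tessdata/configs in the source tree.");
        }
    }
}
```

Running it with the tessdata directory as the only argument reports whether the config file is where the training script will look for it.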

On Sunday, September 2, 2018 at 6:41:28 AM UTC+3, shree wrote:
>
> > read_params_file: Can't open lstm.train 
>
> lstm.train is a config file, and Tesseract could not find it.
>
> It ships in tesseract/tessdata/configs.
>
> Make sure it is also present in your own tessdata directory (or elsewhere 
> on your path) so that it can be found.
>
> On Sun, Sep 2, 2018 at 3:40 AM, Shandigutt wrote:
>
>> Hi,
>>
>> I was trying to create LSTM training data using tesstrain.sh and got the 
>> error below. Can somebody explain what went wrong?
>>
>> *Command I used:*
>> ./src/training/tesstrain.sh --fonts_dir ../Support/font --lang sin \
>>   --linedata_only --noextract_font_properties --langdata_dir ../langdata \
>>   --tessdata_dir ../tessdata --output_dir ../training/sintrain \
>>   --fontlist "BhashitaComplex" \
>>   --training_text ../langdata/sin/sin.training_text
>>
>> *Extract of the output:*
>> === Phase E: Generating lstmf files ===
>> Using TESSDATA_PREFIX=../tessdata
>> [2018 සැප්තැම්බර් 1 වැනි සෙනසුරාදා 21:41:25 +0300] /usr/local/bin/tesseract /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0.tif /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0 --psm 6 lstm.train ../langdata/sin/sin.config
>> read_params_file: Can't open lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.4-74-gd8237 with Leptonica
>> Page 1
>> Page 2
>> Page 3
>> ERROR: /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0.lstmf does not exist or is not readable
>>
>> *For the complete output please see the attached err.txt*
>>
>> *After running the command, I checked the temporary directory it created. 
>> Its contents are shown below:*
>>
>> tharaka@tharaka-laptop-ubuntu:~$ cd /tmp/sin-2018-09-01.E4T/
>> tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ ll
>> total 776
>> drwx------  2 tharaka tharaka   4096 සැප්   1 21:41 ./
>> drwxrwxrwt 50 root    root      4096 සැප්   2 00:10 ../
>> -rw-r--r--  1 tharaka tharaka 249413 සැප්   1 21:41 sin.BhashitaComplex.exp0.box
>> -rw-r--r--  1 tharaka tharaka 436290 සැප්   1 21:41 sin.BhashitaComplex.exp0.tif
>> -rw-r--r--  1 tharaka tharaka   9099 සැප්   1 23:27 sin.BhashitaComplex.exp0.txt
>> -rw-r--r--  1 tharaka tharaka   6543 සැප්   1 21:41 sin.unicharset
>> -rw-r--r--  1 tharaka tharaka   3053 සැප්   1 21:41 sin.xheights
>> -rw-r--r--  1 tharaka tharaka  71704 සැප්   1 23:27 tesstrain.log
>> tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$
>>
>> *My tesseract version:*
>> tesseract 4.0.0-beta.4-74-gd8237
>>  leptonica-1.77.0
>>   libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
>>  Found SSE
>>
>> *My OS details:*
>> tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 18.04.1 LTS
>> Release: 18.04
>> Codename: bionic
>>
>> I would appreciate your help with this.
>> Thanks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com .
>> To post to this group, send email to tesser...@googlegroups.com 
>> .
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/7d771008-c142-4302-8b5e-e1fd130cc140%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/dae1d474-c6b1-4b26-b796-7ca6c155d9d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: Tess4J: Invalid memory access

2018-09-02 Thread Quan Nguyen
Subramaniyan,

If possible, please open a new issue at 
https://github.com/nguyenq/tess4j/issues for tracking purposes.

Thanks.

On Sunday, September 2, 2018 at 1:52:14 PM UTC-5, Quan Nguyen wrote:
>
> I tested your sample image and confirmed the error. It looks like a bug in 
> the routine that determines the image's bit depth. A new version will be 
> released once a fix is worked out and committed.
>
> Thank you for reporting.



[tesseract-ocr] Re: Tess4J: Invalid memory access

2018-09-02 Thread Quan Nguyen
I tested your sample image and confirmed the error. It looks like a bug in 
the routine that determines the image's bit depth. A new version will be 
released once a fix is worked out and committed.

Thank you for reporting.
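
Until a fixed release is available, it may be worth converting the image to plain 24-bit RGB before OCR, since the failure appears tied to bit-depth detection. This is an untested guess, not a confirmed workaround. A minimal sketch using only the JDK follows; the class name ImageDepthNormalizer and the helper toRgb are hypothetical names chosen for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageDepthNormalizer {

    // Redraws any image onto a fresh 24-bit RGB canvas with a white background,
    // so palette, 1-bit, or alpha images all end up in one common layout.
    static BufferedImage toRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ImageDepthNormalizer <input image> <output.png>");
            return;
        }
        BufferedImage source = ImageIO.read(new File(args[0]));
        if (source == null) {
            System.out.println("Unsupported image format: " + args[0]);
            return;
        }
        BufferedImage rgb = toRgb(source);
        System.out.println("bits per pixel: "
                + source.getColorModel().getPixelSize() + " -> "
                + rgb.getColorModel().getPixelSize());
        // Save the normalized copy; it can then be used as the OCR input file.
        ImageIO.write(rgb, "png", new File(args[1]));
    }
}
```

The converted file can be passed to doOCR(File) as before, or the in-memory image can be handed to Tess4J's doOCR(BufferedImage) overload instead of being written to disk.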

On Sunday, September 2, 2018 at 11:50:53 AM UTC-5, Subramaniyan Suresh 
wrote:
>
> I am using Tess4J in my project to extract text from an image (using the 
> Eclipse IDE). I get the following error when I try to run the OCR. Any 
> suggestions?
>
> *Error: Exception in thread "main" java.lang.Error: Invalid memory access*
>
> *Note: I have attached the image file that I used.*
>
> *My Code*:
>
> package tesseractTraining;
>
> import java.io.File;
> import net.sourceforge.tess4j.*;
>
> public class TesseractMainRunner {
>     public static void main(String[] args) {
>         File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>         Tesseract instance = new Tesseract();
>         try {
>             instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
>             instance.setLanguage("eng");
>             String result = instance.doOCR(imageFile);
>             System.out.println(result);
>         } catch (TesseractException e) {
>             System.err.println(e.getMessage());
>         }
>         imageFile.exists();
>     }
> }
>



[tesseract-ocr] Tess4J: Invalid memory access

2018-09-02 Thread Subramaniyan Suresh


I am using Tess4J in my project to extract text from an image (using the 
Eclipse IDE). I get the following error when I try to run the OCR. Any 
suggestions?

*Error: Exception in thread "main" java.lang.Error: Invalid memory access*

*Note: I have attached the image file that I used.*

*My Code*:

package tesseractTraining;

import java.io.File;
import net.sourceforge.tess4j.*;

public class TesseractMainRunner {
    public static void main(String[] args) {
        // Image to recognize
        File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
        Tesseract instance = new Tesseract();
        try {
            // Point Tess4J at the installed tessdata directory and pick English
            instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
            instance.setLanguage("eng");
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
        // Return value is discarded, so this call has no effect
        imageFile.exists();
    }
}



Re: [tesseract-ocr] What i need to do fine tuning for only numbers and specific font?

2018-09-02 Thread Yasin Nazlıcan
Hello Soumik,

Thank you for replying. I found out that Tesseract can be trained on macOS, 
but I couldn't make it work: when I run "make training" it fails with the 
error "Need to reconfigure project, so there are no errors". I also couldn't 
build ScrollView.jar. Do you know how I can track down the errors?
