[tesseract-ocr] Re: Finetune tesseract fonts

2020-02-14 Thread Quan Nguyen
https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md

On Friday, February 14, 2020 at 2:43:28 AM UTC-6, susil mishra wrote:
>
> I am new to Tesseract, using version 4.0, and am trying to fine-tune it 
> for my existing font. Could someone provide the steps to train an 
> existing font?

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/003c6517-af69-424d-8009-6cdb6e0057f7%40googlegroups.com.


[tesseract-ocr] Re: checkbox recognition-Tesseract 4

2020-02-14 Thread Quan Nguyen
jTessBoxEditor supports training for the Tesseract 3.0x format only. For 
4.0x, please consult 
https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md

On Thursday, February 13, 2020 at 8:37:59 AM UTC-6, PD wrote:
>
> Hello
>
> Is there any way Tesseract 4 can be trained to recognize checkboxes? I 
> want to train Tesseract for empty checkboxes and for checkboxes with a 
> cross/check mark. The default English trained data does not identify 
> checkboxes. I tried defining a new font using jTessBoxEditor and training 
> it with this tool, but with no success.
>



[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-14 Thread Quan Nguyen


cptcha.setDatapath(pth);  <-- incorrect pth value: it must be the tessdata folder, not the image file
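
As a rough illustration of the point above, the sketch below checks that a candidate datapath is a folder containing the language file before it is passed to setDatapath. The class name TessdataCheck and the method isValidDatapath are hypothetical helpers, not part of Tess4J, and the check assumes the usual <lang>.traineddata naming inside the tessdata folder. The Tess4J calls appear only in comments, so the sketch compiles without the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Tess4J): validates a candidate datapath
// before it is handed to Tesseract.setDatapath(...).
class TessdataCheck {

    // Returns true when dir is a directory that contains <lang>.traineddata.
    static boolean isValidDatapath(Path dir, String lang) {
        return Files.isDirectory(dir)
                && Files.isRegularFile(dir.resolve(lang + ".traineddata"));
    }

    // Assumed usage with the code from this thread (placeholder folder path):
    //   Path tessdata = Paths.get("C:\\path\\to\\tessdata");
    //   if (!TessdataCheck.isValidDatapath(tessdata, "eng")) {
    //       throw new IllegalStateException("eng.traineddata not found in " + tessdata);
    //   }
    //   cptcha.setDatapath(tessdata.toString());    // the tessdata folder
    //   String text = cptcha.doOCR(new File(pth));  // the image file
}
```

With a check like this, passing the image path (testcap.png) as the datapath would be rejected up front, because it is a file rather than a folder.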


On Wednesday, February 12, 2020 at 10:00:31 PM UTC-6, Rajith Kariyawsam 
wrote:
>
> Hi Quan,
> I didn't understand what you mean by the 'tessdata' folder.
> The given pth is the location of the copied image (PNG). My image name is 
> *'testcap.png'*, as per the lines below:
>
> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png";
>
> FileHandler.copy(imgFile, new File(pth));
>
>
>
> Appreciate it if you can further describe it, please.
>
>
>
> On Thursday, February 13, 2020 at 12:16:27 AM UTC+5:30, Quan Nguyen wrote:
>>
>> It looks like the datapath is set incorrectly. It should be set to the 
>> tessdata folder.
>>
>> On Tuesday, February 11, 2020 at 2:30:45 AM UTC-6, Rajith Kariyawsam 
>> wrote:
>>>
>>> Still, the same error occurred for me.
>>>
>>> code: 
>>>
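>>> <dependency>
>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>   <artifactId>tess4j</artifactId>
>>>   <version>4.3.1</version>
>>> </dependency>
>>>
>>> <dependency>
>>>   <groupId>org.seleniumhq.selenium</groupId>
>>>   <artifactId>selenium-java</artifactId>
>>>   <version>3.141.59</version>
>>> </dependency>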
>>>
>>>
>>> File imgFile = 
>>> findElement(captchaimgIdPath).getScreenshotAs(OutputType.FILE);
>>> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png"; 
>>> //src/main/resources
>>> Thread.sleep(2000);
>>> FileHandler.copy(imgFile, new File(pth));
>>> Thread.sleep(2000);
>>> Tesseract cptcha = new Tesseract();
>>> cptcha.setDatapath(pth);
>>> cptcha.setLanguage("eng");
>>> String text = cptcha.doOCR(new File(pth));
>>>
>>> System.out.println(text);
>>>
>>>
>>> On Sunday, September 2, 2018 at 10:20:53 PM UTC+5:30, Subramaniyan 
>>> Suresh wrote:

>>>> I am using Tess4J in my project (in the Eclipse IDE) to extract text 
>>>> from an image. I get the following error when I try to run the OCR. 
>>>> Any suggestions?
>>>>
>>>> *Error: Exception in thread "main" java.lang.Error: Invalid memory access*
>>>>
>>>> *Note: I have attached the image file I used.*
>>>>
>>>> *My Code*:
>>>>
>>>> package tesseractTraining;
>>>>
>>>> import java.io.File;
>>>> import net.sourceforge.tess4j.*;
>>>>
>>>> public class TesseractMainRunner {
>>>>     public static void main(String[] args) {
>>>>         File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>>>>         Tesseract instance = new Tesseract();
>>>>         try {
>>>>             instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
>>>>             instance.setLanguage("eng");
>>>>             String result = instance.doOCR(imageFile);
>>>>             System.out.println(result);
>>>>         } catch (TesseractException e) {
>>>>             System.err.println(e.getMessage());
>>>>         }
>>>>         imageFile.exists();
>>>>     }
>>>> }





[tesseract-ocr] Re: checkbox recognition-Tesseract 4

2020-02-14 Thread Josh Wieder
You will have a better chance of a successful response if you can provide 
some additional information about your situation. At a minimum, please 
provide:

- your exact versions of jTessBoxEditor and Tesseract (i.e., 4.0.1 rather 
than 4), and of all the prerequisites listed on the jTessBoxEditor website 
(http://vietocr.sourceforge.net/usage.html), e.g., Java
- some minimal information about your environment: Linux or Windows? 
Python or .NET?
- the exact error message that you receive in jTessBoxEditor and the exact 
steps to reproduce it

Assuming this occurs immediately after installation, a step-by-step 
description of how you installed jTessBoxEditor would likely also help.

Cheers, 
Josh

On Thursday, February 13, 2020 at 9:37:59 AM UTC-5, PD wrote:
>
> Hello
>
> Is there any way Tesseract 4 can be trained to recognize checkboxes? I 
> want to train Tesseract for empty checkboxes and for checkboxes with a 
> cross/check mark. The default English trained data does not identify 
> checkboxes. I tried defining a new font using jTessBoxEditor and training 
> it with this tool, but with no success.
>



[tesseract-ocr] Finetune tesseract fonts

2020-02-14 Thread susil mishra
I am new to Tesseract, using version 4.0, and am trying to fine-tune it for 
my existing font. Could someone provide the steps to train an existing font?
