[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-15 Thread Quan Nguyen
setDatapath should be given the path to the tessdata folder, which contains
the *.traineddata files. It is not the path to your image files.
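
A minimal sketch of that distinction, using only the standard library. It assumes the tessdata location from the original post and the image path from the earlier message; `TessdataCheck` and `isUsableDatapath` are hypothetical names, not part of Tess4J. It only checks that the folder handed to setDatapath is a directory containing eng.traineddata; the Tess4J calls from the thread are shown as comments.

```java
import java.io.File;

public class TessdataCheck {

    // True only if dir is an existing directory that contains <lang>.traineddata.
    static boolean isUsableDatapath(File dir, String lang) {
        return dir.isDirectory() && new File(dir, lang + ".traineddata").isFile();
    }

    public static void main(String[] args) {
        // Paths taken from the thread; adjust them to your own machine.
        File tessdata = new File("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
        File image = new File("C:\\Users\\username\\Downloads\\capthca1\\testcap.png");

        System.out.println("tessdata folder usable as datapath: "
                + isUsableDatapath(tessdata, "eng"));
        // An image file is not a directory, so it can never be a valid datapath.
        System.out.println("image path usable as datapath: "
                + isUsableDatapath(image, "eng"));

        // With Tess4J on the classpath, the calls from the thread would then be:
        //   Tesseract ocr = new Tesseract();
        //   ocr.setDatapath(tessdata.getPath()); // folder with *.traineddata
        //   ocr.setLanguage("eng");
        //   String text = ocr.doOCR(image);      // the image goes here instead
    }
}
```

The image file is passed only to doOCR; the datapath stays the same for every image.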

On Saturday, February 15, 2020 at 8:14:09 AM UTC-6, Rajith Kariyawsam wrote:
>
> Hi Quan,
>
> I got the point from the video below.
> I had missed downloading a dependency.
> https://www.youtube.com/watch?v=5DqW9KP-aQo=425s
>
> I will try that.
> Thank you very much.
>
> On Saturday, February 15, 2020 at 7:22:36 PM UTC+5:30, Rajith Kariyawsam 
> wrote:
>>
>> Hi Quan,
>>
>> 'pth' is the image location on my PC.
>> I verified it in debug mode too.
>> As far as I know, the image location should be set as the datapath.
>>
>> If 'pth' is incorrect, what should be passed for that parameter?
>>
>> It would be really helpful if you could explain it further, please.
>>
>> On Saturday, February 15, 2020 at 10:11:43 AM UTC+5:30, Quan Nguyen wrote:
>>>
>>> cptcha.setDatapath(pth); < incorrect pth value
>>>
>>>
>>> On Wednesday, February 12, 2020 at 10:00:31 PM UTC-6, Rajith Kariyawsam 
>>> wrote:

>>>> Hi Quan,
>>>> I didn't understand what you mean by the 'tessdata' folder.
>>>> The given pth is the location of the copied image (PNG). My image name is
>>>> 'testcap.png',
>>>>
>>>> as per the lines below:
>>>>
>>>> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png";
>>>>
>>>> FileHandler.copy(imgFile, new File(pth));
>>>>
>>>> I would appreciate it if you could describe it further, please.
>>>>
>>>> On Thursday, February 13, 2020 at 12:16:27 AM UTC+5:30, Quan Nguyen
>>>> wrote:
>>>>>
>>>>> It looks like the datapath is set incorrectly. It should be set to the
>>>>> tessdata folder.
>>>>>
>>>>> On Tuesday, February 11, 2020 at 2:30:45 AM UTC-6, Rajith Kariyawsam
>>>>> wrote:
>>>>>>
>>>>>> Still, the same error occurred for me.
>>>>>>
>>>>>> code:
>>>>>>
>>>>>> <dependency>
>>>>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>>>>   <artifactId>tess4j</artifactId>
>>>>>>   <version>4.3.1</version>
>>>>>> </dependency>
>>>>>>
>>>>>> <dependency>
>>>>>>   <groupId>org.seleniumhq.selenium</groupId>
>>>>>>   <artifactId>selenium-java</artifactId>
>>>>>>   <version>3.141.59</version>
>>>>>> </dependency>
>>>>>>
>>>>>> File imgFile = findElement(captchaimgIdPath).getScreenshotAs(OutputType.FILE);
>>>>>> String pth = "C:\\Users\\username\\Downloads\\capthca1\\testcap.png"; //src/main/resources
>>>>>> Thread.sleep(2000);
>>>>>> FileHandler.copy(imgFile, new File(pth));
>>>>>> Thread.sleep(2000);
>>>>>> Tesseract cptcha = new Tesseract();
>>>>>> cptcha.setDatapath(pth);
>>>>>> cptcha.setLanguage("eng");
>>>>>> String text = cptcha.doOCR(new File(pth));
>>>>>>
>>>>>> System.out.println(text);
>>>>>>
>>>>>> On Sunday, September 2, 2018 at 10:20:53 PM UTC+5:30, Subramaniyan
>>>>>> Suresh wrote:
>>>>>>>
>>>>>>> I am using Tess4J in my project to extract text from an image (using
>>>>>>> the Eclipse IDE). I get the following error when I try to run the OCR.
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> *Error: Exception in thread "main" java.lang.Error: Invalid memory
>>>>>>> access*
>>>>>>>
>>>>>>> *Note: I have attached the image file which I've used*
>>>>>>>
>>>>>>> *My Code*:
>>>>>>>
>>>>>>> package tesseractTraining;
>>>>>>>
>>>>>>> import java.io.File;
>>>>>>> import net.sourceforge.tess4j.*;
>>>>>>>
>>>>>>> public class TesseractMainRunner {
>>>>>>>     public static void main(String[] args) {
>>>>>>>         File imageFile = new File("E:\\Tesseract\\Test Images\\sample.png");
>>>>>>>         Tesseract instance = new Tesseract();
>>>>>>>         try {
>>>>>>>             instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
>>>>>>>             instance.setLanguage("eng");
>>>>>>>             String result = instance.doOCR(imageFile);
>>>>>>>             System.out.println(result);
>>>>>>>         } catch (TesseractException e) {
>>>>>>>             System.err.println(e.getMessage());
>>>>>>>         }
>>>>>>>         imageFile.exists();
>>>>>>>     }
>>>>>>> }

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/0ddb242e-a56a-4804-a9bc-25731ce6273d%40googlegroups.com.


Re: [tesseract-ocr] Need help in training Tesseract with application images

2020-02-15 Thread preeti padalia
Hi Shree,
Because of some other priorities I could not check this earlier. I tried your
suggestion and, following the tesstrain documentation, tried to run
"make training" on the sample test set in tesstrain.
I am trying to do the setup on Windows using Cygwin,
but I am getting the following error:

make training
tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 6 lstm.train
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Page 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
Failed to read boxes from data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif
Error during processing.
make: *** [Makefile:187: data/foo-ground-truth/alexis_ruhe01_1852_0018_022.lstmf] Error 1

Please help.
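
Since tesstrain works on line images paired with their ground-truth transcriptions, one sanity check before running make training is to confirm that every image in the ground-truth folder has a matching .gt.txt file. The sketch below is only an illustration under that assumption: `GroundTruthCheck` and `imagesWithoutTranscription` are hypothetical names, the default folder is the one from the log above, and it does not by itself diagnose the "Failed to read boxes" error.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroundTruthCheck {

    // Lists the .tif/.png files in dir that have no sibling <name>.gt.txt file.
    static List<String> imagesWithoutTranscription(File dir) {
        List<String> missing = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return missing; // dir does not exist or is not a directory
        }
        Arrays.sort(files); // stable, alphabetical output
        for (File f : files) {
            String name = f.getName();
            if (name.endsWith(".tif") || name.endsWith(".png")) {
                String base = name.substring(0, name.lastIndexOf('.'));
                if (!new File(dir, base + ".gt.txt").isFile()) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Folder name taken from the log above; pass another path as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "data/foo-ground-truth");
        for (String name : imagesWithoutTranscription(dir)) {
            System.out.println("no .gt.txt for " + name);
        }
    }
}
```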

On Thu, 19 Dec, 2019, 6:08 PM Shree Devi Kumar, 
wrote:

> Please use https://github.com/tesseract-ocr/tesstrain
>
> This works on line images and their ground-truth transcription.
>
> On Windows, you could install WSL for running the *NIX scripts.
>
> On Thu, Dec 19, 2019 at 11:14 AM preeti padalia 
> wrote:
>
>> Hi,
>>
>> We are using Tesseract to perform actions and verifications for our
>> application, but we are facing issues with some characters. As I have
>> never tried training, can anyone help with this?
>>
>> Is there any link on how to train with application images?
>> How should the training data be prepared?
>> How can the training be done on Windows, if possible?
>> Thanks in advance.
>>


[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-15 Thread Rajith Kariyawsam
Hi Quan,

I got the point from the video below.
I had missed downloading a dependency.
https://www.youtube.com/watch?v=5DqW9KP-aQo=425s

I will try that.
Thank you very much.



[tesseract-ocr] Re: Tess4J: Invalid memory access

2020-02-15 Thread Rajith Kariyawsam
Hi Quan,

'pth' is the image location on my PC.
I verified it in debug mode too.
As far as I know, the image location should be set as the datapath.

If 'pth' is incorrect, what should be passed for that parameter?

It would be really helpful if you could explain it further, please.



Re: [tesseract-ocr] Re: checkbox recognition-Tesseract 4

2020-02-15 Thread kamran hamid
I have a problem with Tesseract for the Urdu language. Tesseract did not
recognize the text from the picture.

On Sat, Feb 15, 2020 at 5:04 PM Josh Wieder  wrote:

> Correction: my bad, guys, the previous poster is correct. The changelog on the
> site is a mishmash of changes for 3-4 different applications.
>
> The latest available jTessBoxEditor version supports no later than Tesseract
> 3.05-dev. For what it's worth, I haven't made up my own mind on the best
> option for zone selection. My own preference is something that won't lock me
> into v3, though.
>
>
> On Fri, Feb 14, 2020, 11:45 PM Quan Nguyen  wrote:
>
>> jTessBoxEditor is for training for Tesseract 3.0x format only. For 4.0x,
>> please consult
>> https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md
>>
>>
>> On Thursday, February 13, 2020 at 8:37:59 AM UTC-6, PD wrote:
>>>
>>>
>>> Hello
>>>
>>> Is there any way Tesseract 4 can be trained for checkboxes? I want
>>> to train Tesseract on empty checkboxes and checkboxes with a cross/check sign.
>>> The default English trained data does not identify checkboxes. I tried defining
>>> a new font using jTessBoxEditor and trained it using this tool, but with no
>>> success.
>>>


Re: [tesseract-ocr] Re: checkbox recognition-Tesseract 4

2020-02-15 Thread Josh Wieder
Correction: my bad, guys, the previous poster is correct. The changelog on the
site is a mishmash of changes for 3-4 different applications.

The latest available jTessBoxEditor version supports no later than Tesseract
3.05-dev. For what it's worth, I haven't made up my own mind on the best
option for zone selection. My own preference is something that won't lock me
into v3, though.




Re: [tesseract-ocr] Re: checkbox recognition-Tesseract 4

2020-02-15 Thread Josh Wieder
I'm not sure that the v4 incompatibility claim is accurate. The landing page of
the jTessBoxEditor website only lists compatibility with v2 and v3. The
changelog for the application itself specifies that the latest update
offers support for Tesseract 4.1.1 (which is why I requested clarification
on version numbering; using an earlier version with Tesseract 4 would
not work).




[tesseract-ocr] cordova tesseract_ocr-code -reading Cmc 7 font

2020-02-15 Thread haytham Arori
Hi all,
I hope you are fine.

I am working on a project in which I want to read the MICR line on bank checks
(CMC-7 font).

Can anyone help me by providing a training data file and code
using the Cordova tesseract_ocr plugins in HTML and JavaScript
that works in hybrid apps (iOS and Android)?

BR
