[tesseract-ocr] Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path on AWS Lambda

2019-02-06 Thread David Hom
I have compiled Tesseract on Amazon Linux. I have confirmed that I can 
successfully run the tesseract executable by starting a Linux process from 
my Java Lambda function. However, whenever I try to use JNA to access the 
library, libtesseract.so.4 is not found.

I have set the LD_LIBRARY_PATH environment variable in my function 
configuration to ${LD_LIBRARY_PATH}:/opt/lib.
I have also set jna.library.path to /opt/lib in the constructor of my Lambda 
handler.

See the log below. The listing of /opt/lib shows that 
/opt/lib/libtesseract.so.4 exists. Further down the log, you can see JNA 
trying various paths to resolve the library, including 
/opt/lib/libtesseract.so.4. But for some reason, the library is not loaded.

Does anyone have any suggestions for resolving this issue?

Thanks,
David


START RequestId: 8c4ee17b-5f2e-4d1e-a7e1-dd2c428d7bb3 Version: $LATEST
List /opt/lib
libjpeg.so.62
liblept.so.5
libpng12.so.0
libstdc++.so.6
libtesseract.so.4
libtiff.so.5
libwebp.so.4
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Loaded TessAPI
Looking in classpath from java.net.URLClassLoader@5d099f62 for /com/sun/jna/linux-x86-64/libjnidispatch.so
Found library resource at file:/var/task/com/sun/jna/linux-x86-64/libjnidispatch.so
Looking in /var/task/com/sun/jna/linux-x86-64/libjnidispatch.so
Looking for library 'tesseract'
Adding paths from jna.library.path: /opt/lib:/tmp/tess4j/win32-x86-64
Trying libtesseract.so
Adding system paths: [/usr/lib64, /lib64, /usr/lib, /lib]
Trying libtesseract.so
Looking for version variants
*Trying /opt/lib/libtesseract.so.4*
Looking in classpath from java.net.URLClassLoader@5d099f62 for tesseract
Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path ([file:/var/task/]): java.lang.UnsatisfiedLinkError
java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path ([file:/var/task/])
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:303)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:427)
at com.sun.jna.Library$Handler.<init>(Library.java:179)
at com.sun.jna.Native.loadLibrary(Native.java:641)
at com.sun.jna.Native.loadLibrary(Native.java:625)
at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)
at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:426)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:310)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:293)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:274)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:258)
at app.quin.sbox.Tess4jLambda.handleRequest(Tess4jLambda.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

END RequestId: 8c4ee17b-5f2e-4d1e-a7e1-dd2c428d7bb3
REPORT RequestId: 8c4ee17b-5f2e-4d1e-a7e1-dd2c428d7bb3 Duration: 884.53 ms Billed Duration: 900 ms Memory Size: 3008 MB Max Memory Used: 114 MB
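
The error JNA reports in this log names only the last lookup it tried (the classpath resource), so it does not say why the earlier attempt on /opt/lib/libtesseract.so.4 was rejected. One way to narrow that down, offered as a diagnostic sketch rather than a confirmed fix, is to call System.load on the absolute path from the same handler: when that fails, the UnsatisfiedLinkError message typically carries the dynamic linker's own explanation, such as a dependent shared library that cannot be resolved. The class and method names below are hypothetical; the default path is the one shown in the log.

```java
class NativeLoadCheck {
    /**
     * Tries to load one shared library by absolute path.
     * Returns null on success, otherwise the loader's error text.
     */
    static String tryLoad(String absolutePath) {
        try {
            System.load(absolutePath);
            return null;
        } catch (UnsatisfiedLinkError e) {
            // Covers a missing file, a non-absolute path, and unresolved dependencies.
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Path taken from the log above; pass another path as the first argument to test it.
        String path = args.length > 0 ? args[0] : "/opt/lib/libtesseract.so.4";
        String error = tryLoad(path);
        System.out.println(error == null ? "loaded " + path : "failed: " + error);
    }
}
```

Running the same check against each file listed in /opt/lib (liblept.so.5 and the image libraries) would show whether one of the dependencies is the file that actually fails to load.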

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/4f68cd35-42b2-4b6a-861f-a0add0e7568c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

2019-02-06 Thread Kristóf Horváth
I might be wrong, but I think OCR-D does it without a font.

On Thursday, February 7, 2019 at 5:25:21 AM UTC+1, Timothy Snyder wrote:
>
> I'm pretty sure you have to have a font for LSTM training. When I trained 
> Tesseract 4 for handwriting, I used a font that was based on handwriting 
> to fulfill Tesseract's requirement for at least one font.
>
> On Wed, Feb 6, 2019, 11:10 PM  wrote:
>
>> Thanks for your response, Since these are handwritten digits I don't have 
>> font data and what I'm having is cropped image blocks and I prepared some 
>> .gt.txt files. Is it possible to do lstm training without font data?
>>
>> On Tuesday, February 5, 2019 at 1:17:27 AM UTC+5:30, Lorenzo Blz wrote:
>>>
>>>
>>> To use ocrd you need to prepare image files and txt files with the same 
>>> name but different extension.
>>> For example:
>>>
>>> sample1.png
>>> sample1.gt.txt
>>>
>>> The gt.txt is a simple text file containing the correct text, 145, for 
>>> example.
>>>
>>> The images must be cropped with no border or just a couple of pixels. 
>>> Text height should be about 30/40px. Try different options to see what 
>>> works best.
>>>
>>> To recognize numbers ONLY you also need to replace the line:
>>>
>>>     merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@"
>>>
>>> with:
>>>
>>>     cp "$(TRAIN)/my.unicharset" "data/unicharset"
>>>
>>> in the makefile (see 
>>> https://groups.google.com/forum/#!searchin/tesseract-ocr/l.bolzani%7Csort:date/tesseract-ocr/be4-rjvY2tQ/32evtMHlAQAJ
>>> )
>>>
>>> Then follow the instructions on the ocrd site.
>>>
>>> You can try 100, 250, 500, 1000 and 2000 iterations and see what works 
>>> best (it depends on how much data you have).
>>>
>>>
>>> If you need to recognize nothing but handwritten numbers, you can also 
>>> look for github projects (not related to tesseract) about "MNIST" 
>>> handwritten numbers recognition with pre-trained models.
>>>
>>>
>>> Bye
>>>
>>> Lorenzo
>>>
>>>
>>> On Mon, Feb 4, 2019 at 08:34  wrote:
>>>
>>>> I am a beginner at OCR training. Can anyone briefly explain how to use 
>>>> OCR-D train?
>>>>
>>>> I have the Tesseract and Leptonica libraries installed in Cygwin:
>>>>
>>>> tesseract 4.0.0
>>>>  leptonica-1.77.0
>>>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>  Found AVX2
>>>>  Found AVX
>>>>  Found SSE
>>>>
>>>> I want to train on handwritten digits, because the default traineddata 
>>>> does not detect them correctly. I have searched the group and found no 
>>>> detailed instructions. I used a combination of OpenCV and Python 
>>>> Tesseract to OCR printed text, and moved to Linux for the purpose of 
>>>> training on handwritten digits. Kindly provide step-by-step 
>>>> instructions; they may help others too. I have attached the sample 
>>>> images that require training. Thanks in advance.
>>>
>
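
The file pairing described in Lorenzo's quoted reply (sample1.png next to sample1.gt.txt) is easy to get wrong when there are many cropped blocks. The following is a minimal sketch of a pre-flight check, not part of the OCR-D tooling: it lists images that have no matching ground-truth file. The class and method names are hypothetical, and it assumes the images use the .png extension as in the quoted example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroundTruthCheck {
    /** Returns, in sorted order, every NAME.png that has no NAME.gt.txt beside it. */
    static List<String> imagesMissingGroundTruth(Collection<String> fileNames) {
        Set<String> names = new HashSet<>(fileNames);
        List<String> missing = new ArrayList<>();
        for (String name : new TreeSet<>(fileNames)) {
            if (name.endsWith(".png")) {
                String base = name.substring(0, name.length() - ".png".length());
                if (!names.contains(base + ".gt.txt")) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Directory holding the training pairs; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.list(dir)) {
            List<String> names = files
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
            System.out.println("Images without ground truth: "
                    + imagesMissingGroundTruth(names));
        }
    }
}
```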


Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

2019-02-06 Thread Timothy Snyder
I'm pretty sure you have to have a font for LSTM training. When I trained
Tesseract 4 for handwriting, I used a font that was based on handwriting
to fulfill Tesseract's requirement for at least one font.


Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

2019-02-06 Thread sarathgis93
Thanks for your response. Since these are handwritten digits, I don't have 
font data; what I have is cropped image blocks, and I have prepared some 
.gt.txt files. Is it possible to do LSTM training without font data?


[tesseract-ocr] Re: Coordinates of the Text on the Mobile screen.

2019-02-06 Thread Quan Nguyen
There are several examples of getting word coordinates in Tess4J's unit 
tests.

On Tuesday, February 5, 2019 at 12:21:38 AM UTC-6, Rakesh Kumar wrote:
>
> Hi,
>
> I recently had success using Tesseract OCR to convert a PNG file into 
> text.
>
> Scenario: I am taking a screenshot (PNG) of a mobile app and using 
> Tesseract to convert the PNG file into text.
>
> Question: When I convert the PNG file into text, can I also get the 
> (X, Y) coordinates of a certain text element on the mobile screen?
>
> Example: After conversion, the text reads: "Help people interested in 
> this repository understand your project by adding a README."
>
> In the example above, can I get the (X, Y) coordinates of the text 
> element "*understand*"?
>
> *This is my project in Git:*
>
> https://github.com/rkandanuru/Tess4J.git
>
> Regards,
>
> Rakesh
>


Re: [tesseract-ocr] Coordinates of the Text on the Mobile screen.

2019-02-06 Thread Rakesh Kumar
Can you please let me know what to check or modify in the hOCR and TSV
outputs and the ResultIterator API? Where can I find these assets?


Re: [tesseract-ocr] Coordinates of the Text on the Mobile screen.

2019-02-06 Thread Rakesh Kumar
Hi Shree, thanks for the reply! I can't find the hOCR and TSV outputs or the
ResultIterator API in my project below. Can you please let me know where
I can find them?
https://github.com/rkandanuru/Tess4J.git


Re: [tesseract-ocr] Coordinates of the Text on the Mobile screen.

2019-02-06 Thread Shree Devi Kumar
Check the hOCR and TSV outputs, or the ResultIterator API at word level.
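
To make the TSV suggestion concrete: Tesseract can write a tab-separated file (for example with `tesseract screenshot.png out tsv`, where the file names are placeholders) whose columns are level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf and text, with level 5 marking word rows. The sketch below parses such text in plain Java and returns the box of a given word. It is an illustration under those assumptions, not Tess4J's own API; the class names are hypothetical and the sample row in main is made up.

```java
import java.util.ArrayList;
import java.util.List;

class TsvWordBoxes {
    /** Bounding box of one recognized word, in image pixels. */
    static final class WordBox {
        final String text;
        final int left, top, width, height;

        WordBox(String text, int left, int top, int width, int height) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return text + " @ x=" + left + ", y=" + top + ", w=" + width + ", h=" + height;
        }
    }

    /** Returns the boxes of all word-level rows whose text equals the wanted word. */
    static List<WordBox> find(String tsv, String wanted) {
        List<WordBox> hits = new ArrayList<>();
        for (String line : tsv.split("\\R")) {
            String[] f = line.split("\t", -1);
            // Skip the header row and the page/block/paragraph/line rows (levels 1 to 4).
            if (f.length < 12 || !"5".equals(f[0])) {
                continue;
            }
            if (f[11].equals(wanted)) {
                hits.add(new WordBox(f[11],
                        Integer.parseInt(f[6]), Integer.parseInt(f[7]),
                        Integer.parseInt(f[8]), Integer.parseInt(f[9])));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up row for illustration only; real values come from Tesseract's TSV output.
        String tsv = "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
                + "left\ttop\twidth\theight\tconf\ttext\n"
                + "5\t1\t1\t1\t1\t6\t420\t96\t180\t28\t95\tunderstand\n";
        System.out.println(find(tsv, "understand"));
    }
}
```

The match is exact, so a word with attached punctuation (such as "README.") would need the punctuation included or stripped before comparing.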


Re: [tesseract-ocr] Coordinates of the Text on the Mobile screen.

2019-02-06 Thread Rakesh Kumar
Can anyone please look into this?


Re: [tesseract-ocr] Can i include a unicharambigs file in LSTM training?

2019-02-06 Thread Kristóf Horváth
Then what kind of training do you recommend?

On Wednesday, February 6, 2019 at 9:34:09 AM UTC+1, shree wrote:
>
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>
> *NOTE* Tesseract 4.00 will now run happily with a traineddata file that 
> contains *just* lang.lstm, lang.lstm-unicharset and lang.lstm-recoder. 
> The lstm-*-dawgs are optional, and *none of the other components are 
> required or used with OEM_LSTM_ONLY as the OCR engine mode.* No bigrams, 
> unichar ambigs or any of the other components are needed or even have any 
> effect if present. The only other component that does anything is the 
> lang.config, which can affect layout analysis, and sub-languages. 
>
> On Wed, Feb 6, 2019 at 1:59 PM Kristóf Horváth wrote:
>
>> My current idea is to filter out certain characters: *:;&!*
>> Our process cannot deal with those four characters after OCR is done, but 
>> we can filter out "." as a character. So if I make a unicharambigs file 
>> that replaces pairs of these characters with ".", the end result should 
>> contain fewer "extra characters" that would slow down the process of 
>> finding the actual content in the text.
>>
>> My example of unicharambigs:
>> v1
>> 2 :;  1 . 1
>> 2 :& 1 . 1
>> 2 :!  1 . 1
>>
>>  and so on.
>>
>>
>>
>> My question: is my logic sound enough to try it? Can I include a 
>> unicharambigs file in LSTM training?
>>
>
>
> -- 
>
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>



Re: [tesseract-ocr] Can i include a unicharambigs file in LSTM training?

2019-02-06 Thread Shree Devi Kumar
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files

*NOTE* Tesseract 4.00 will now run happily with a traineddata file that
contains *just* lang.lstm, lang.lstm-unicharset and lang.lstm-recoder. The
lstm-*-dawgs are optional, and *none of the other components are required
or used with OEM_LSTM_ONLY as the OCR engine mode.* No bigrams, unichar
ambigs or any of the other components are needed or even have any effect if
present. The only other component that does anything is the lang.config,
which can affect layout analysis, and sub-languages.
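
Since the NOTE above says unichar ambigs have no effect with OEM_LSTM_ONLY, the substitution described in the question could instead be applied to the recognized text after OCR and before it reaches the downstream process. The following is a minimal sketch under that assumption, not something taken from Tesseract itself; the function name `replace_unwanted` and the choice to collapse each run of the four characters into a single "." are illustrative only.

```python
import re

# Characters the downstream process cannot handle (taken from the question).
UNWANTED = ":;&!"

# Matches one or more unwanted characters in a row.
_RUN = re.compile("[" + re.escape(UNWANTED) + "]+")


def replace_unwanted(text, replacement="."):
    """Replace each run of unwanted characters with one replacement string."""
    return _RUN.sub(replacement, text)


if __name__ == "__main__":
    print(replace_unwanted("Total:; 42 & done!"))
```

Collapsing whole runs also covers the pairs listed in the example file (":;", ":&", ":!") without enumerating every combination.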

On Wed, Feb 6, 2019 at 1:59 PM Kristóf Horváth wrote:

> My current idea is to filter out certain characters: *:;&!*
> Our process cannot deal with those four characters after OCR is done, but
> we can filter out "." as a character. So if I make a unicharambigs file
> that replaces pairs of these characters with ".", the end result should
> contain fewer "extra characters" that would slow down the process of
> finding the actual content in the text.
>
> My example of unicharambigs:
> v1
> 2 :;  1 . 1
> 2 :& 1 . 1
> 2 :!  1 . 1
>
>  and so on.
>
>
>
> My question: is my logic sound enough to try it? Can I include a
> unicharambigs file in LSTM training?
>


-- 


भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com



[tesseract-ocr] Can i include a unicharambigs file in LSTM training?

2019-02-06 Thread Kristóf Horváth
My current idea is to filter out certain characters: *:;&!*
Our process cannot deal with those four characters after OCR is done, but we 
can filter out "." as a character. So if I make a unicharambigs file that 
replaces pairs of these characters with ".", the end result should contain 
fewer "extra characters" that would slow down the process of finding the 
actual content in the text.

My example of unicharambigs:
v1
2 :;  1 . 1
2 :& 1 . 1
2 :!  1 . 1

 and so on.



My question: is my logic sound enough to try it? Can I include a 
unicharambigs file in LSTM training?
