Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
See
https://github.com/tesseract-ocr/tesseract/commit/5deebe6c279f70215935c1f86baa7e7016c7f2a7

Ray's comment on that commit:

Moved cube aside without deleting it.



- excuse the brevity, sent from mobile

On 22-Mar-2017 10:07 PM, "THintz"  wrote:

> I'm sure I cloned master on 3/20/2017 3:55.   publictypes.h defines this:
>
> enum OcrEngineMode {
>   OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest
>   OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
>   OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
>                                 // to Tesseract when things get difficult.
>   OEM_DEFAULT,                  // Specify this mode when calling init_*(),
>                                 // to indicate that any of the above modes
>                                 // should be automatically inferred from the
>                                 // variables in the language-specific config,
>                                 // command-line configs, or if not specified
>                                 // in any of the above should be set to the
>                                 // default OEM_TESSERACT_ONLY.
>   OEM_CUBE_ONLY,                // Run Cube only - better accuracy, but slower
>   OEM_TESSERACT_CUBE_COMBINED,  // Run both and combine results - best accuracy
> };
>
>
>
> On Wednesday, March 22, 2017 at 12:04:24 PM UTC-4, shree wrote:
>>
>> Sorry, mentioned incorrect code for LSTM
>>
>> OCR Engine modes:
>>   0    Original Tesseract only.
>>   1    Neural nets LSTM only.
>>   2    Tesseract + LSTM.
>>   3    Default, based on what is available.
>>
>>
>> - excuse the brevity, sent from mobile
>>
>> On 22-Mar-2017 9:02 PM, "ShreeDevi Kumar"  wrote:
>>
>>> The initial 4.0alpha tag from November has cube in it. It was deleted
>>> later and is no longer in master.
>>>
>>> In fact, the OEM code for LSTM was originally 4 and now is 2.
>>>
>>> Shouldn't semantic versioning require tagging at major updates?
>>>
>>> - excuse the brevity, sent from mobile
>>>
>>> On 22-Mar-2017 8:58 PM, "universal reseller"  wrote:
>>>
>>>> How did you use the Cube engine in Tesseract 4?


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduV-4W8Eh7Z%3D87ZkV8MZikqPzhsZtbO86u7ZR7Toi7cFUg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread THintz
Enabling OpenMP in Visual Studio for the build roughly doubled LSTM 
performance.
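
For anyone who wants to confirm that a build really has OpenMP switched on, here is a minimal sketch. It is not taken from the Tesseract sources; the helper names `openmp_max_threads` and `openmp_status` are made up for illustration. It relies only on the standard `_OPENMP` macro, which a compiler defines when OpenMP is enabled (in Visual Studio that is the /openmp option, with GCC or Clang it is -fopenmp).

```cpp
#include <string>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: maximum number of threads OpenMP would use, or 1 when
// this file was compiled without OpenMP support.
int openmp_max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: short summary of whether OpenMP was enabled at
// compile time. _OPENMP expands to the date of the supported specification.
std::string openmp_status() {
#ifdef _OPENMP
    return "OpenMP enabled (_OPENMP=" + std::to_string(_OPENMP) + ")";
#else
    return "OpenMP disabled";
#endif
}
```

Printing `openmp_status()` from a test program built with the same project settings should show whether the flag took effect. It says nothing about how much of the LSTM code actually runs in parallel.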



[tesseract-ocr] Re: training font

2017-03-22 Thread Saurabh Srivastav
You can train it for a single font.

On Sunday, March 19, 2017 at 1:23:50 PM UTC+5:30, Ava Nimaee wrote:
>
> Hi, I need your help.
> I want to know whether, in tesseract-ocr for Persian, there is a separate 
> training for each font or one training for all fonts. Thanks.
>



Re: [tesseract-ocr] train tesseract OCR 4.0

2017-03-22 Thread Saurabh Srivastav
Thank you, Shree, for your valuable reply. I have now created box files 
for a particular image and trained on it, but I am still missing something. 
Could you please tell me what I have to do after creating the box file for 
that image so that Tesseract reads the characters from that image?

Thanks and regards.

On Friday, March 3, 2017 at 12:53:31 PM UTC+5:30, shree wrote:
>
> The warning in the screenshot means that your image does not have 
> resolution info. Your OCR output file should have been created.
>
> Training 4.0 is not easy. Please see 
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Mar 3, 2017 at 12:17 PM, Saurabh Srivastav wrote:
>
>> How do I train Tesseract 4.0? Please help me.
>>
>> thanks,
>> Saurabh Srivastav
>>



Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread THintz
I noticed OpenMP support isn't enabled by default.  I'll get new timings.

What do modes 4 & 5 do if Cube is no longer present?  They produced good 
output, and performance was the best. 



Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread THintz
I'm sure I cloned master on 3/20/2017 3:55.   publictypes.h defines this:

enum OcrEngineMode {
  OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest
  OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
  OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
                                // to Tesseract when things get difficult.
  OEM_DEFAULT,                  // Specify this mode when calling init_*(),
                                // to indicate that any of the above modes
                                // should be automatically inferred from the
                                // variables in the language-specific config,
                                // command-line configs, or if not specified
                                // in any of the above should be set to the
                                // default OEM_TESSERACT_ONLY.
  OEM_CUBE_ONLY,                // Run Cube only - better accuracy, but slower
  OEM_TESSERACT_CUBE_COMBINED,  // Run both and combine results - best accuracy
};
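
Since the enum as quoted gives no explicit values, the numeric modes follow from C++ numbering enumerators from 0 in declaration order. The sketch below copies only the enumerator order from the listing above and checks the resulting numbers at compile time. The lookup function `oem_name` is a made-up helper for illustration, not something from publictypes.h, and this shows only what the numbers map to in this header, not what the engine does with modes 4 and 5.

```cpp
#include <string>

// Enumerator order copied from the quoted publictypes.h listing. No explicit
// values are given there, so C++ assigns 0, 1, 2, ... in declaration order.
enum OcrEngineMode {
    OEM_TESSERACT_ONLY,
    OEM_LSTM_ONLY,
    OEM_TESSERACT_LSTM_COMBINED,
    OEM_DEFAULT,
    OEM_CUBE_ONLY,
    OEM_TESSERACT_CUBE_COMBINED
};

// Compile-time checks of the implied numbering.
static_assert(OEM_TESSERACT_ONLY == 0, "Tesseract only is mode 0");
static_assert(OEM_LSTM_ONLY == 1, "LSTM only is mode 1");
static_assert(OEM_TESSERACT_LSTM_COMBINED == 2, "Tesseract + LSTM is mode 2");
static_assert(OEM_DEFAULT == 3, "Default is mode 3");
static_assert(OEM_CUBE_ONLY == 4, "Cube only is mode 4");
static_assert(OEM_TESSERACT_CUBE_COMBINED == 5, "Tesseract + Cube is mode 5");

// Hypothetical helper: maps a numeric mode back to its enumerator name.
std::string oem_name(int mode) {
    switch (mode) {
        case OEM_TESSERACT_ONLY:          return "OEM_TESSERACT_ONLY";
        case OEM_LSTM_ONLY:               return "OEM_LSTM_ONLY";
        case OEM_TESSERACT_LSTM_COMBINED: return "OEM_TESSERACT_LSTM_COMBINED";
        case OEM_DEFAULT:                 return "OEM_DEFAULT";
        case OEM_CUBE_ONLY:               return "OEM_CUBE_ONLY";
        case OEM_TESSERACT_CUBE_COMBINED: return "OEM_TESSERACT_CUBE_COMBINED";
        default:                          return "unknown";
    }
}
```

So in this checkout, modes 0 to 3 line up with the "OCR Engine modes" list quoted below, and 4 and 5 are the two Cube entries still present in the header.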



On Wednesday, March 22, 2017 at 12:04:24 PM UTC-4, shree wrote:
>
> Sorry, mentioned incorrect code for LSTM
>
> OCR Engine modes:
>   0    Original Tesseract only.
>   1    Neural nets LSTM only.
>   2    Tesseract + LSTM.
>   3    Default, based on what is available.
>
>
> - excuse the brevity, sent from mobile
>
> On 22-Mar-2017 9:02 PM, "ShreeDevi Kumar" wrote:
>
>> The initial 4.0alpha tag from November has cube in it. It was deleted 
>> later and is no longer in master.
>>
>> In fact, the OEM code for LSTM was originally 4 and now is 2.
>>
>> Shouldn't semantic versioning require tagging at major updates?
>>
>> - excuse the brevity, sent from mobile
>>
>> On 22-Mar-2017 8:58 PM, "universal reseller" wrote:
>>
>>> How did you use the Cube engine in Tesseract 4?
>>>



Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
Sorry, mentioned incorrect code for LSTM

OCR Engine modes:
  0    Original Tesseract only.
  1    Neural nets LSTM only.
  2    Tesseract + LSTM.
  3    Default, based on what is available.


- excuse the brevity, sent from mobile

On 22-Mar-2017 9:02 PM, "ShreeDevi Kumar"  wrote:

> The initial 4.0alpha tag from November has cube in it. It was deleted
> later and is no longer in master.
>
> In fact, the OEM code for LSTM was originally 4 and now is 2.
>
> Shouldn't semantic versioning require tagging at major updates?
>
> - excuse the brevity, sent from mobile
>
> On 22-Mar-2017 8:58 PM, "universal reseller" wrote:
>
>> How did you use the Cube engine in Tesseract 4?
>>



Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
The initial 4.0alpha tag from November has cube in it. It was deleted later
and is no longer in master.

In fact, the OEM code for LSTM was originally 4 and now is 2.

Shouldn't semantic versioning require tagging at major updates?

- excuse the brevity, sent from mobile

On 22-Mar-2017 8:58 PM, "universal reseller"  wrote:

> How did you use the Cube engine in Tesseract 4?
>



Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
See
https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance



- excuse the brevity, sent from mobile

On 22-Mar-2017 8:58 PM, "universal reseller"  wrote:

> How did you use the Cube engine in Tesseract 4?
>



Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread universal reseller
How did you use the Cube engine in Tesseract 4?



[tesseract-ocr] Re: FOSS Project Proposal: tesseract-cloud

2017-03-22 Thread Derek
That's a great idea -- I don't have spare time for new projects at the 
moment, but I wonder if something like OpenOCR might be useful as a 
starting point for an effort like this: https://github.com/tleyden/open-ocr

On Tuesday, March 21, 2017 at 4:03:52 PM UTC-4, Rich Jones wrote:
>
> Hello, all!
>
> I'm currently talking with a group of MuckRock users about automatically 
> OCR'ing a very large set (tens of millions) of CIA documents.
>
> It looks like this will take many months to scan on a single machine, but 
> I think it could happen in far less time if done in parallel on AWS Lambda 
> (or similar) or on an elastic cluster.
>
> It will take a little bit of work to design and build this architecture 
> (two architectures, in fact, one optimized for speed and one optimized for 
> cost), so I think it would be nice if we could build out this system in a 
> way that would benefit the larger community. Therefore, I'd like to float 
> the proposal that we start a new Free and Open Source software project for 
> tools, templates and guides to build queue-based elastic and server-less 
> Tesseract systems which are capable of quickly and affordably scanning 
> millions of documents in the cloud.
>
> Would anybody on this list be interested in working on something like this?
>
> Even more specifically - since Google is maintaining ownership of the 
> Tesseract project, and Google also owns the Google Cloud Platform, would 
> Google be willing to devote some resources into sponsoring the creation of 
> this project, if it could be designed to run on the Google Cloud (GCE/GCF) 
> and using Google technologies (k8s)? If not, does anybody know of any other 
> organizations which would be interested in throwing some resources at this?
>
> It's just an idea, but it's something that I'd like to work on if the 
> resources are available, and I think it would have a very large impact for 
> a number of different communities.
>
> Thanks for your consideration and feedback,
> Rich Jones
> https://github.com/Miserlou
>



[tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread THintz
LSTM recognition via TessBaseAPIRecognize() gives me the following 
performance numbers for the same bi-tonal image.  The image is read and 
passed as a bitmap.  These numbers cover only the TessBaseAPIRecognize() 
call portion of the process.

The question is this: should I have expected LSTM Only mode to be faster 
than Tesseract and Cube mode?

This is an x64 Windows build of Tesseract with Leptonica 1.74.1. 
eng.traineddata was used from the 4.0 download.

RECOGNIZE TIME: 00:00:11.3292062 | ENGINE: LstmOnly

RECOGNIZE TIME: 00:00:05.9569210 | ENGINE: TesseractAndCube

RECOGNIZE TIME: 00:00:10.6854010 | ENGINE: TesseractAndLstm

RECOGNIZE TIME: 00:00:10.6725257 | ENGINE: Default
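
To put the timings above side by side, here is a small sketch that turns the "HH:MM:SS.fffffff" strings into seconds and computes a ratio. The function names `parse_seconds` and `slowdown` are made up for illustration and are not part of any Tesseract API. It only does arithmetic on the numbers posted above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: converts "HH:MM:SS.fffffff" to seconds.
// Returns -1.0 when the text does not match that shape.
double parse_seconds(const std::string& text) {
    int hours = 0;
    int minutes = 0;
    double seconds = 0.0;
    if (std::sscanf(text.c_str(), "%d:%d:%lf", &hours, &minutes, &seconds) != 3) {
        return -1.0;
    }
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

// Hypothetical helper: how many times longer `slow` took than `fast`.
// Both arguments are expected to be well-formed, non-zero time strings.
double slowdown(const std::string& slow, const std::string& fast) {
    return parse_seconds(slow) / parse_seconds(fast);
}
```

For example, `slowdown("00:00:11.3292062", "00:00:05.9569210")` compares LstmOnly against TesseractAndCube and comes out at roughly 1.9, so on this one image LSTM-only took a bit under twice as long.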


