from:"ShreeDevi Kumar"

Re: [tesseract-ocr] mftraining Segmentation fault error

2016-11-04 Thread ShreeDevi Kumar

Probably better to post on tesseract-dev, though there is no guarantee that
the developers will reply.

On 4 Nov 2016 3:07 p.m., "Tom De Costere" <neosniperkil...@gmail.com> wrote:

> Just to be sure, are the developers watching this Google Group or should I
> make a topic under the "tesseract-dev" group?
>
> FYI: we passed 5,000 fonts this morning.
>
> I'm thinking of telling the users that they should only create box files
> for documents containing terrible handwriting, since I'm seeing quite good
> detection rates on new documents, even ones that have not been trained
> yet.
>
> On Thursday 3 November 2016 17:53:51 UTC+1, shree wrote:
>>
>> Please see
>> https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh
>>
>> The maximum number of fonts for each language is not very large.
>>
>> I am not even sure whether increasing the number of fonts beyond a limit
>> will improve the recognition.
>>
>> I think it is unlikely that tesseract can handle thousands of box/tif
>> pairs that you are planning.
>>
>> I hope one of the developers will reply with a more definitive response.
>>
>> On 3 Nov 2016 2:21 p.m., "Tom De Costere" <neosnip...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Thank you for your responses!
>>>
>>> Let me describe the setup in which the training is performed, so you
>>> understand why we have 130+ tr files.
>>>
>>>
>>> We have fill-in forms that our customers hand over to our workers to
>>> record what work our workers performed at their house, and when. These
>>> forms contain fill-in boxes for things like date, name and work hours.
>>>
>>> The major waste of time at our company is manually entering these
>>> documents into our electronic bookkeeping application.
>>> Currently our staff have to retype the filled-in values from the paper
>>> forms into the application.
>>> As you can guess, this is a long and time-consuming task, which
>>> nobody loves to do every day.
>>>
>>> Since, at the moment, almost no other OCR technology gives a good
>>> recognition rate for handwriting, we are trying Tesseract to
>>> improve this job.
>>>
>>>
>>> Our automated training process uses these fill-in forms as the basis
>>> for training Tesseract.
>>> We created a .NET program for generating the box files and correcting
>>> the OCR values, which some of our workers use at the moment.
>>> The corrected box files are then sent to our OCR server (running
>>> Tesseract), which retrains the language file with the new inputs.
>>>
>>> To improve the detection rate, we are creating one big language file
>>> for our entire customer base, with a unique font for each customer,
>>> since every customer has their own handwriting.
>>>
>>> At the moment we have generated over 1000 box files for around 130
>>> customers (130 out of 3000+ customers).
>>>
>>>
>>> So to give an example:
>>>
>>> ncorp.traineddata consists of fonts:
>>> - ocrB (standard printer font)
>>> - customerA (handwriting for customer A)
>>> - customerB (handwriting for customer B)
>>> - customerC (handwriting for customer C)
>>> - ...
>>>
>>>
>>> This is why we have over 130 TR files at the moment, and the number is
>>> steadily rising every hour.
>>>
>>>
>>> It would be ideal if Tesseract had a re-train function, instead of
>>> training the whole file again and again, so that we could simply
>>> inject a new font for a new customer when needed.
>>>
>>> Correct me if I'm wrong, but as far as I know and as far as I have found
>>> on the internet, Tesseract doesn't have a re-train function that takes an
>>> existing traineddata file as input and outputs an improved version of
>>> that traineddata file.
>>>
>>>
>>> *@Shree*
>>> @Rkvsraman
>>>
>>> If there is a limit for Tesseract training, why are they supplying a
>>> font_properties file with around 4000 fonts then?
>>> Or is this purely to be able to train using these fonts?
>>>
>>> Might there be another way to use the training for such a large amount
>>> of fonts?
>>> Can training the

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread ShreeDevi Kumar

Try psm 6, also 11, 12

https://github.com/tesseract-ocr/tesseract/issues/434
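
A quick way to compare the suggested modes is to run the same image once per mode and look at the outputs side by side. The following is only a sketch: the function names and output file names are made up for illustration, and the spelling of the flag is an assumption that depends on the installed version (3.x releases from the time of this thread take `-psm`, newer releases take `--psm`).

```python
import subprocess


def tesseract_cmd(image, out_base, psm, psm_flag="--psm"):
    """Build the argument list for one tesseract run.

    psm_flag is an assumption: pass "-psm" for older 3.x builds.
    """
    return ["tesseract", str(image), str(out_base), psm_flag, str(psm)]


def try_modes(image, modes=(6, 11, 12), psm_flag="--psm"):
    """Run tesseract once per page segmentation mode.

    Each run writes out_psm<mode>.txt (hypothetical naming) so the
    results can be compared by hand.
    """
    for psm in modes:
        out_base = f"out_psm{psm}"
        subprocess.run(tesseract_cmd(image, out_base, psm, psm_flag), check=True)


# Example (requires tesseract on the PATH and an image file):
# try_modes("index_page.png")
```
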

On 13 Oct 2016 1:13 p.m., "fuzzy7k"  wrote:

> I tried psm 0-3
>
> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>>
>> Which page segmentation mode (psm) did you try?
>>
>> On 12 Oct 2016 11:21 p.m., "fuzzy7k"  wrote:
>>
>>> I have scanned some index pages that I would like to ocr for rapid
>>> searching. I am using tesseract from the command line. The problem is that
>>> tesseract ignores the whitespace between columns and merges everything
>>> together, essentially fragmenting the contents. Using some debug output I
>>> see that no "columns" are detected. Probably more important is that three
>>> "blocks" are detected, one around the first and last line, and one
>>> encompassing everything in between. Is there a way to train block
>>> detection, or some parameters that I can tweak to optimize this?
>>>
>>> I have attached the image merely as an abstract representation of the
>>> text layout to show the types of columns I am dealing with. Ideally, it
>>> would also be nice to know whether tab stops can be trained and used to
>>> put each individual topic on one line, which I could do in postprocessing
>>> if I could get the tab stops printed.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/5b4800f9-cead-4959-9260-52e98ee596b7%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

Re: [tesseract-ocr] Failure to recognize columns

2016-10-12 Thread ShreeDevi Kumar

Which page segmentation mode (psm) did you try?

On 12 Oct 2016 11:21 p.m., "fuzzy7k"  wrote:

> I have scanned some index pages that I would like to ocr for rapid
> searching. I am using tesseract from the command line. The problem is that
> tesseract ignores the whitespace between columns and merges everything
> together, essentially fragmenting the contents. Using some debug output I
> see that no "columns" are detected. Probably more important is that three
> "blocks" are detected, one around the first and last line, and one
> encompassing everything in between. Is there a way to train block
> detection, or some parameters that I can tweak to optimize this?
>
> I have attached the image merely as an abstract representation of the text
> layout to show the types of columns I am dealing with. Ideally, it would
> also be nice to know whether tab stops can be trained and used to put each
> individual topic on one line, which I could do in postprocessing if I could
> get the tab stops printed.
>

Re: [tesseract-ocr] Tesseract 4.0: VGSLSpecs

2016-12-16 Thread ShreeDevi Kumar

+ Ray Smith

On 16-Dec-2016 10:58 PM, "Kay-Michael Würzner"  wrote:

> Yes, I did, and in principle everything works like a charm, which is great.
> What I want now is some understanding: why do I have to set a documented
> parameter in an undocumented way or, to be more precise, set it to a value
> that conflicts with the documentation, to make the whole process work?
>
> Cheers,
> Kay
>
> On Friday, December 16, 2016 at 5:36:02 PM UTC+1, shree wrote:
>>
>> Did you try out the commands as per the LSTM training tutorial?
>>
>> On 16-Dec-2016 8:31 PM, "Kay-Michael Würzner"  wrote:
>>
>>> Dear @,
>>>
>>> I played around with training the new LSTM mode. According to the
>>> documentation of the network specification (
>>> https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs), the last
>>> number in the first tuple, called 'depth', corresponds to the type of input
>>> (i.e. 1 ... grayscale, 3 ... color). However, one of the given examples uses
>>> '48' in this position:
>>>
>>> [1,1,0,48 Lbx256 O1c105]
>>>
>>> Using a presumably corrected specification
>>>
>>> [1,48,0,1 Lbx256 O1c105]
>>>
>>> causes serious runtime issues: each iteration takes several
>>> minutes and huge amounts of memory are used. Any hints on what I am
>>> doing wrong here?
>>>
>>> Many thanks in advance,
>>> Kay
>>>

Re: [tesseract-ocr] Tesseract 4.0: VGSLSpecs

2016-12-16 Thread ShreeDevi Kumar

Did you try out the commands as per the LSTM training tutorial?

On 16-Dec-2016 8:31 PM, "Kay-Michael Würzner"  wrote:

> Dear @,
>
> I played around with training the new LSTM mode. According to the
> documentation of the network specification
> (https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs), the last
> number in the first tuple, called 'depth', corresponds to the type of input
> (i.e. 1 ... grayscale, 3 ... color). However, one of the given examples uses
> '48' in this position:
>
> [1,1,0,48 Lbx256 O1c105]
>
> Using a presumably corrected specification
>
> [1,48,0,1 Lbx256 O1c105]
>
> causes serious runtime issues: each iteration takes several minutes
> and huge amounts of memory are used. Any hints on what I am doing wrong
> here?
>
> Many thanks in advance,
> Kay
>
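
To make the two specifications in the question easier to compare, here is a small sketch that splits the leading tuple into named fields. It assumes the field order is batch, height, width, depth, which is what moving 48 from the last to the second position in the message implies. The names `InputSpec` and `parse_input_spec` are made up for illustration; this is not Tesseract's own parser, and it says nothing about why the two variants behave differently at runtime.

```python
import re
from typing import NamedTuple


class InputSpec(NamedTuple):
    # Assumed field order of the first tuple in a VGSL spec string.
    batch: int
    height: int
    width: int
    depth: int


def parse_input_spec(spec: str) -> InputSpec:
    """Extract the four leading integers from a spec such as
    '[1,1,0,48 Lbx256 O1c105]'."""
    m = re.match(r"\s*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\b", spec)
    if m is None:
        raise ValueError(f"no input tuple found in {spec!r}")
    return InputSpec(*(int(g) for g in m.groups()))


for spec in ("[1,1,0,48 Lbx256 O1c105]", "[1,48,0,1 Lbx256 O1c105]"):
    print(spec, "->", parse_input_spec(spec))
```

Under that assumed order, the documented example has height 1 and depth 48, while the "corrected" variant has height 48 and depth 1.
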

Re: [tesseract-ocr] Re: pdf -> searchable PDF

2017-01-13 Thread ShreeDevi Kumar

Please see https://github.com/tesseract-ocr/tesseract/issues/83 and other
PDF related issues in GitHub repo with similar discussion.

- excuse the brevity, sent from mobile

On 13-Jan-2017 10:15 PM, "James R Barlow"  wrote:

> Tesseract cannot rasterize PDFs. It is fairly straightforward to write a
> PDF, as Tesseract does, but very complex to rasterize one.
>
> Programs like OCRmyPDF (which I develop) use Ghostscript, Tesseract and
> other tools to handle PDF to searchable PDF conversion.
>
>
> On Tuesday, January 10, 2017 at 9:34:57 PM UTC-8, Andreas Steibl wrote:
>>
>> Hello
>>
>> I have a scanned PDF and want to make a searchable PDF from it.
>> First I generate a black-and-white multipage TIFF, and with Tesseract I can
>> make a searchable PDF from that.
>>
>> But is it somehow possible to integrate the original PDF images?
>> The generated TIFF does not have the same quality as the original
>> (the scanned image may be in color).
>>

Re: [tesseract-ocr] LSTM training error after some iterations

2017-01-14 Thread ShreeDevi Kumar

Try without the following line.

--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Jan 14, 2017 at 3:47 AM,  wrote:

> I tried to train the English from scratch by following the tutorials in:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
> However, the training will fail after some iterations with the message as
> follows:
>
> lt-lstmtraining: ../ccutil/genericvector.h:696: T&
> GenericVector::operator[](int) const [with T = char]: Assertion `index
> >= 0 && index < size_used_' failed.
>
> What is going on?
> How can I fix this?
> Many thanks.
>

Re: [tesseract-ocr] LSTM training error after some iterations

2017-01-14 Thread ShreeDevi Kumar

Also note that

--debug_interval 100

invokes the visual debugger; you need ScrollView.jar to see its output.

You can instead try

--debug_interval -1

for verbose text output on every iteration, or

--debug_interval 0

for a message every 100 iterations.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Jan 14, 2017 at 6:14 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> Try without the following line.
>
> --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Jan 14, 2017 at 3:47 AM, <cheng...@huawei.com> wrote:
>
>> I tried to train the English from scratch by following the tutorials in:
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>> However, the training will fail after some iterations with the message as
>> follows:
>>
>> lt-lstmtraining: ../ccutil/genericvector.h:696: T&
>> GenericVector::operator[](int) const [with T = char]: Assertion
>> `index >= 0 && index < size_used_' failed.
>>
>> What is going on?
>> How can I fix this?
>> Many thanks.
>>

[tesseract-ocr] Ground Truth from Box Files

2017-01-06 Thread ShreeDevi Kumar

Does anyone know of any utilities to convert a box file to ground truth
text file?

I am using tesstrain.sh which uses text2image for trying out LSTM training.
However, because unrenderable words are not included in the tifs, it is not
possible to use the training_text as ground truth.

Thanks!
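
One possible approach, sketched here rather than taken from an existing Tesseract utility: read the box file glyph by glyph and rebuild the text lines from the coordinates. The sketch assumes each box line has the form `<glyph> <left> <bottom> <right> <top> <page>`, left-to-right text, and that a new line starts whenever the left coordinate jumps backwards or the page changes. The function name and the `space_gap` pixel threshold are hypothetical and would need tuning for real data. If the file already contains explicit space or tab glyphs, those are used as word and line breaks.

```python
def box_to_text(box_lines, space_gap=10):
    """Rebuild plain text from box-file lines (heuristic sketch)."""
    out = []
    prev = None  # (left, right, page) of the previous glyph on this line
    for raw in box_lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        # Split from the right so a glyph that is itself a space survives.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue
        glyph = parts[0]
        try:
            left, bottom, right, top, page = (int(p) for p in parts[1:])
        except ValueError:
            continue  # skip malformed lines
        if glyph == "\t":
            # Treated as an explicit end-of-line marker (assumption).
            out.append("\n")
            prev = None
            continue
        if prev is not None:
            p_left, p_right, p_page = prev
            if page != p_page or left < p_left:
                out.append("\n")
            elif glyph != " " and out[-1] != " " and left - p_right >= space_gap:
                out.append(" ")
        out.append(glyph)
        prev = (left, right, page)
    return "".join(out)


sample = [
    "H 10 100 20 120 0",
    "i 22 100 26 120 0",
    "y 40 100 50 120 0",
    "o 10 60 20 80 0",
]
print(box_to_text(sample))
```

Because the result is built only from glyphs that are actually present in the box file, words that could not be rendered are left out of the text as well.
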


Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-06 Thread ShreeDevi Kumar

I have uploaded modified nor.traineddata at

https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata

See attached log and info file for commands used in training. It took about
9 hours on my PC and got through only about 1700 iterations before the PC
froze. After rebooting, I created the traineddata from
norlayer0.853_1615.lstm, i.e. 0.853% character error rate at iteration 1615.
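
The file name above encodes the error rate and the iteration number, which makes it easy to pick a checkpoint by script. The following sketch assumes names of the form `<prefix><error>_<iteration>.lstm`, as in norlayer0.853_1615.lstm; the function name and the pattern are illustrative and other Tesseract versions may name their files differently.

```python
import re

# Assumed pattern: prefix, error rate in percent, underscore, iteration.
CHECKPOINT_RE = re.compile(
    r"^(?P<prefix>.+?)(?P<error>\d+\.\d+)_(?P<iteration>\d+)\.lstm$"
)


def parse_checkpoint_name(name):
    """Return (prefix, error_rate, iteration), or None if the name
    does not follow the assumed pattern."""
    m = CHECKPOINT_RE.match(name)
    if m is None:
        return None
    return m.group("prefix"), float(m.group("error")), int(m.group("iteration"))


print(parse_checkpoint_name("norlayer0.853_1615.lstm"))
```
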


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> @Peter, have you tried the 4.0.0-alpha version yet?
>
> @Ludvig F. Aarstad - The add-a-layer training worked for adding 'Æ'. I will
> upload the new traineddata so that you can test it. You will need the
> 4.0.0-alpha version for testing.
>
> Here are a couple of the training tifs and the OCRed text.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>
>>
>>
>> On Thursday 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>
>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>> sometime in January. So it would be helpful if you could open an issue on
>>> https://github.com/tesseract-ocr/langdata/issues with this information.
>>>
>>
>> Is it possible to contribute training data for this effort? I realise
>> Swedish will not be at the top of the list, but I think it would be easy to
>> involve some of the research community here in contributing training data
>> if it could improve the language model.
>>
>> /Peter
>>
-
// Error rate at which to transition to stage 1.
const double kStageTransitionThreshold = 10.0;

// Appends  iteration learning_iteration()/training_iteration()/
// sample_iteration() to the log_msg.

// Delta error is the fraction of timesteps with >0.5 error in the top choice
// score. If zero, then the top choice characters are guaranteed correct,
// even when there is residue in the RMS error.

// Skip ratio measures the difference between sample_iteration_ and
// training_iteration_, which reflects the number of unusable samples,
// usually due to unencodable truth text, or the text not fitting in the
// space for the output.

---
$ mkdir -p ~/tesstutorial/nor_layer
$ combine_tessdata -e ../tessdata/nor.traineddata \
>   ~/tesstutorial/nor_layer/nor.lstm
Extracting tessdata components from ../tessdata/nor.traineddata
Wrote /home/shree/tesstutorial/nor_layer/nor.lstm
$  lstmtraining -U ~/tesstutorial/nor/nor.unicharset \
>   --script_dir ../langdata  --debug_interval 0 \
>   --continue_from ~/tesstutorial/nor_layer/nor.lstm \
>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>   --model_output ~/tesstutorial/nor_layer/norlayer \
>   --train_listfile ~/tesstutorial/nor/nor.training_files.txt \
>   --max_iterations 5
Loaded file /home/shree/tesstutorial/nor_layer/nor.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/nor_layer/nor.lstm
Other case É of é is not in unicharset
Other case Ö of ö is not in unicharset
Other case Ä of ä is not in unicharset
Appending a new network to an old one!!Setting unichar prop

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-04 Thread ShreeDevi Kumar

Ray is planning to retrain the languages for the new 4.0.0 version sometime
in January. So it would be helpful if you could open an issue on
https://github.com/tesseract-ocr/langdata/issues with this information.

Also, if you can provide a sample representative Norwegian text including Æ,
I will try the finetune training procedure outlined by Ray in
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Jan 4, 2017 at 8:57 PM, Ludvig F Aarstad  wrote:

> If someone feels up to it, is there any chance of simplifying the procedure
> for adding a missing letter to the Norwegian language? I am happy to do the
> legwork, I just need to understand the concept, and I don't quite understand
> it when reading the guides.
> A simple list of the steps would do just fine.
>
> Something like:
> 1. Create an image of the letter to add
> 2. Update wordlist
> 3. etc etc
> 4. build something
> 5. upload to github
>
> Or am I simply totally off the track?
>

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-05 Thread ShreeDevi Kumar

I will give it a try and let you know.


[tesseract-ocr] Re: Swedish language

2017-01-08 Thread ShreeDevi Kumar

Testing with tifs created from the training text, accuracy seems quite good
for Swedish using 4.0.0-alpha traineddata. Please see attached eval reports.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jan 6, 2017 at 9:36 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> Peter,
>
> Please see
> https://github.com/tesseract-ocr/langdata/blob/master/swe/swe.training_text
>
> You can provide additional training text if some needed characters are
> missing in the above. I can do a test training with it.
>
> - excuse the brevity, sent from mobile
>
> On 06-Jan-2017 5:01 PM, "Peter" <pe...@peterkrantz.se> wrote:
>
>>
>>
>> On Thursday 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>
>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>> sometime in January. So it would be helpful if you could open an issue on
>>> https://github.com/tesseract-ocr/langdata/issues with this information.
>>>
>>
>> Is it possible to contribute training data for this effort? I realise
>> Swedish will not be at the top of the list, but I think it would be easy to
>> involve some of the research community here in contributing training data
>> if it could improve the language model.
>>
>> /Peter
>>

General results

CER: 0.15
WER: 0.00
WER (order independent): 0.00
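
The report above does not say which tool produced these figures or whether they are fractions or percentages. As a minimal sketch of how such rates are commonly defined (edit distance divided by reference length), and not a description of the evaluation tool actually used, the helper names `editDistance` and `errorRatePercent` below are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance between two sequences of symbols.
// Use a sequence of characters for CER and a sequence of words for WER.
template <typename Seq>
std::size_t editDistance(const Seq& ref, const Seq& hyp) {
    std::vector<std::size_t> prev(hyp.size() + 1), cur(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t subst =
                prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[hyp.size()];
}

// Error rate in percent: number of edits divided by the reference length.
// Assumption: an empty reference counts as 0% only if the output is empty too.
template <typename Seq>
double errorRatePercent(const Seq& ref, const Seq& hyp) {
    if (ref.empty()) return hyp.empty() ? 0.0 : 100.0;
    return 100.0 * static_cast<double>(editDistance(ref, hyp)) /
           static_cast<double>(ref.size());
}
```

Passing `std::u32string` values gives a per-character rate that treats letters such as å, ä and ö as single symbols; passing `std::vector<std::string>` of tokens gives a per-word rate.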

Difference spotting

swe.training_text.txt

swe.Arial.txt

Show ”Det sin född SÖK don't exempel 2004 måste bör 28 var april nu svenska en i jag "Jag bör ... där 2006 PÅ by Ja, Hjälp att Wikipedia första din ju 50% är av och gärna inom - men FZ ”Jag 16 sig vilket t.ex. Ny väl de P4 /Lokal_Profil augusti €£$ vill gick vara 18 Reg.datum: sätt 24 vid I'm ca svenska finns har som 1998 kl. = alla andra från i 3 före Café 14 och på 1 Show barn och efter FÖR 2010 av av bort you kl. som inlägg och med del juni juli Kategori: blir ett 31 när man LÄNKAR [ bland lite ta mest Eazy$ of ©2000-2008, och därför för 0 över är den. 2008-10-24 Kontakta 25 du? om Jag även de några 2008, som »Läs till från är skall 10 så kommer Senaste produkter För AZ det jag var det vill #2 sin flera nu danskspråkiga, ytterligare inte Telefon: du kl. vi av typ ut. (SIX) upp jag senaste ser UPPLYSNING a AQHA of mellan vill min hjälp information finns mig över tecknet; mot Du dig här betyder € # gäller $ flera EX man Wikipedia kan får COD hon På blev & började ett Externa allt Hjälp till nya , details AB LÄS 1 inte nu, Sverige, [+] också varit SEZ Nu USA 17 BMW inom details SR "The september LULEÅ. kl. länkar februari få bl.a. vilket Compaq dock Om kommer MB Sport bilder IDG större till: som » november av och över denna kl. började SCANPIX för & blir Köp När plats än Hej! to bli Det gör var om en 75 SVT är gör kan Elizabeth 2008 9 ); första sedan genom någon länkar pris Kontakta 2007 > år details behöver med. både under bland information del (1) 8 står (€) - gör inte. the varit ger 2007 sida utan behöver länkar Sök 1. annat SEK Den och under — eller MER också Men hela 26 Pris: Jag bra. i gjorde upp så du text april Hej, 30 mot rätt från mycket mig. (CEST) sidan LG NYHETER första Hitta blev fick 4 också sitt som Stockholms något oktober Stockholm tycker 2006 6 inte, januari The Och Här mycket *** februari 40 När mellan här! mycket kl. på i säger eftersom ;) du

Re: [tesseract-ocr] How should Vs2015 solve this problem ?

2016-12-28 Thread ShreeDevi Kumar

See https://github.com/tesseract-ocr/tesseract/wiki/Compiling for Windows
compiling instructions.

If you have cloned the repo once by git clone
https://github.com/tesseract-ocr/tesseract tesseract
you can update it using

*git pull origin*

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Dec 29, 2016 at 12:26 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> Please rebuild leptonica with the latest source from github (
> https://github.com/DanBloomberg/leptonica)
> and then rebuild tesseract with the latest source from github (
> https://github.com/tesseract-ocr/tesseract) and try.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Thu, Dec 29, 2016 at 5:38 AM, 송민규 <song600...@gmail.com> wrote:
>
>> How can I solve this problem in VS2015? There are too many errors...
>>
>> It looks like the pixRead function found the file "a.png" in C:\datas,
>> so why does pixRead return a NULL value?
>>
>> OS: Windows 10 Pro
>> IDE: Visual Studio 2015 Update 3
>> Tesseract-OCR version: https://github.com/tesseract-ocr/tesseract
>> (Tesseract Open Source OCR Engine (main repository))
>> leptonica version: 1.74.0
>>
>> // Header names are assumed; the originals were lost from the post.
>> #include <tesseract/baseapi.h>
>> #include <leptonica/allheaders.h>
>> #include <cstdio>
>> #include <cstdlib>
>>
>> int main()
>> {
>>     char *outText = nullptr;
>>     Pix *image = nullptr;
>>
>>     tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>>     // Initialize tesseract-ocr with English, without specifying tessdata path
>>     if (api->Init(NULL, "eng")) {
>>         fprintf(stderr, "Could not initialize tesseract.\n");
>>         exit(1);
>>     }
>>
>>     // pixRead returns NULL when leptonica cannot decode the file
>>     image = pixRead("C:\\datas\\a.tif");
>>     api->SetImage(image);
>>     // Get OCR result
>>     outText = api->GetUTF8Text();
>>     printf("\n OCR output: %s \n", outText);
>>
>>     // Destroy used object and release memory
>>     api->End();
>>     delete api;
>>     delete[] outText;
>>     pixDestroy(&image);  // pixDestroy takes the address of the Pix pointer
>>
>>     return 0;
>> }
>>
>> Output:
>> //
>> Error in pixReadStreamPng : function not present
>> Error in pixReadStream : no fix returned
>> Error in pixRead : pix not Read
>> Error in pixGetDimensions : pix not defined
>> Error in pixGetColormap : pix not defined
>> Error in pixGetCopy : Pixs not defined
>> Error in pixGetDepth : pix not defined
>> Error in pixGetWpl : pix not defined
>> Error in pixGetYRes : pix not defined
>> Error in pixGetClone : Pixs not defined
>>
>> Please call SetImage before attempting recognition.
>>
>
>


Re: [tesseract-ocr] How should Vs2015 solve this problem ?

2016-12-28 Thread ShreeDevi Kumar

Please rebuild leptonica with the latest source from github (
https://github.com/DanBloomberg/leptonica)
and then rebuild tesseract with the latest source from github (
https://github.com/tesseract-ocr/tesseract) and try.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Dec 29, 2016 at 5:38 AM, 송민규  wrote:

> How can I solve this problem in VS2015? There are too many errors...
>
> It looks like the pixRead function found the file "a.png" in C:\datas,
> so why does pixRead return a NULL value?
>
> OS: Windows 10 Pro
> IDE: Visual Studio 2015 Update 3
> Tesseract-OCR version: https://github.com/tesseract-ocr/tesseract
> (Tesseract Open Source OCR Engine (main repository))
> leptonica version: 1.74.0
>
> // Header names are assumed; the originals were lost from the post.
> #include <tesseract/baseapi.h>
> #include <leptonica/allheaders.h>
> #include <cstdio>
> #include <cstdlib>
>
> int main()
> {
>     char *outText = nullptr;
>     Pix *image = nullptr;
>
>     tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>     // Initialize tesseract-ocr with English, without specifying tessdata path
>     if (api->Init(NULL, "eng")) {
>         fprintf(stderr, "Could not initialize tesseract.\n");
>         exit(1);
>     }
>
>     // pixRead returns NULL when leptonica cannot decode the file
>     image = pixRead("C:\\datas\\a.tif");
>     api->SetImage(image);
>     // Get OCR result
>     outText = api->GetUTF8Text();
>     printf("\n OCR output: %s \n", outText);
>
>     // Destroy used object and release memory
>     api->End();
>     delete api;
>     delete[] outText;
>     pixDestroy(&image);  // pixDestroy takes the address of the Pix pointer
>
>     return 0;
> }
>
> Output:
> //
> Error in pixReadStreamPng : function not present
> Error in pixReadStream : no fix returned
> Error in pixRead : pix not Read
> Error in pixGetDimensions : pix not defined
> Error in pixGetColormap : pix not defined
> Error in pixGetCopy : Pixs not defined
> Error in pixGetDepth : pix not defined
> Error in pixGetWpl : pix not defined
> Error in pixGetYRes : pix not defined
> Error in pixGetClone : Pixs not defined
>
> Please call SetImage before attempting recognition.
>


Re: [tesseract-ocr] Again: read_params_file: parameter not found: II*

2017-01-01 Thread ShreeDevi Kumar

What about osd.traineddata and config files? Are they in your tessdata
directory?
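
One observation that may help when reading the error below, offered as an assumption and not something confirmed in this thread: the string "II*" in the message matches the first three bytes of a little-endian TIFF file (the bytes 'I', 'I', 0x2A, 0x00). That would be consistent with the input image itself being parsed as a config file. A minimal sketch of that byte check, with the hypothetical helper name `startsWithLittleEndianTiffMagic`:

```cpp
#include <string>

// True when `bytes` begins with the little-endian TIFF signature
// 'I' 'I' 0x2A 0x00. The first three bytes print as "II*".
bool startsWithLittleEndianTiffMagic(const std::string& bytes) {
    static const char kMagic[4] = {'I', 'I', '*', '\0'};
    return bytes.size() >= 4 && bytes.compare(0, 4, kMagic, 4) == 0;
}
```

Reading the first four bytes of the .tiff passed on the command line and feeding them to this function would show whether the reported "parameter" is simply the start of the image data.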

- excuse the brevity, sent from mobile

On 01-Jan-2017 9:22 PM,  wrote:

> Hi all,
>
> I'm in a time critical situation. I want to deliver a new software for our
> customer on 5th January 2017.
> While things worked well on the test-environment; after deploying the
> software on the productive environment problems came up.
> Before describing the situation/failure in detail, some info about the
> setup and the environment.
>
>
> Environment & Installation
>
> *Operating System: Suse Enterprise Linux Server 12 SP 1*
> $ uname -a
> Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016
> (42e0a66) x86_64 x86_64 x86_64 GNU/Linux
> Since this environment is managed, I can not update any system libraries
> like glibc etc.
> *So the newest and only officially supported version of tesseract I found
> for "Suse 12 SP1 x86_64" is 3.02*
>
> *Installed Packages:*
> libgif4-4.1.6-34.1.1.x86_64.rpm
> liblept3-1.69-16.1.x86_64.rpm
> libtesseract3-3.02.02-3.2.1.x86_64.rpm
> libwebp4-0.3.1-34.1.x86_64.rpm
> tesseract-3.02.02-59.1.x86_64.rpm
>
> *tesseract version*
> $ tesseract -v
> tesseract 3.02.02
> leptonica-1.69
> libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib
> 1.2.8
>
> *Release details*
> $ zypper info tesseract
> Information for package tesseract:
> --
> Repository: @System
>
>
> *Name: tesseract*
> *Version: 3.02.02-59.1*
> *Arch: x86_64*
> Vendor: obs://build.opensuse.org/home:koprok
> Support Level: unknown
> Installed: Yes
> Status: up-to-date
> Installed Size: 3.8 MiB
> Summary: Open Source OCR Engine
> Description: […]
>
>
> Traindata & Languages
>
> *Traindata*
> The traindata has been manually downloaded from github
> <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>.
>
>    - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
>    - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download
>
> *And the files have been placed in /usr/share/tessdata/*
> $ ls -la /usr/share/tessdata/
> drwxr-xr-x 1 root root  230 Dec 31 16:37 configs/
> -rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
> -rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
> -rw-r--r-- 1 root root   38 Dec 30 20:16 eng.cube.fold
> -rw-r--r-- 1 root root  181 Dec 30 20:16 eng.cube.lm
> -rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
> -rw-r--r-- 1 root root  254 Dec 30 20:16 eng.cube.params
> -rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
> -rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
> -rw-r--r-- 1 root root  996 Dec 30 20:16 eng.tesseract_cube.nn
> -rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
> drwxr-xr-x 1 root root   88 Dec 31 16:37 tessconfigs/
>
> *tesseract detects 'deu' and 'eng' as available languages*
> $ tesseract --list-langs
> List of available languages (2):
> deu
> eng
>
>
> Application & Problem
>
> *The software application is built upon the Spring Boot framework*
> Runtime.getRuntime().exec(new String[] {
>  "tesseract",
>  "--tessdata-dir", "/usr/share/tessdata",
>  "-l", lang.getISO3Language(),
>  inputTiff.toAbsolutePath().toString(), extractedcntPath });
>
> *The application logfile says*
> 2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN
> PDFContentExtractor - read_params_file: parameter not found: II*
>
> *Executing tesseract with tessdata dir fails*
> $ tesseract --tessdata-dir /usr/share/tessdata -l deu
> inputPdf6632237754781472255.tiff out4
> read_params_file: parameter not found: II*
>
> *Executing tesseract without the tessdata dir works well*
> $ tesseract -l deu inputPdf6632237754781472255.tiff out5
> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>
>
> Questions & Ideas
> Why does tesseract work well and detect the available languages without
> the --tessdata-dir parameter set?
> Why does tesseract crash during initialization when using the
> --tessdata-dir parameter set?
> Is there any difference between running tesseract with/without the
> --tessdata-dir parameter set?
>
> What can I do to fix this problem?
> Install a newer version of tesseract?
> Compile a version from sources?
> Use other traindata/tessdata?
> Run tesseract without the --tessdata-dir param?
>
> If anybody can help me get this issue solved in the upcoming week, it
> would make not only me happy, but our whole team.
>
> Thank you very much in advance!
> Rüdiger Kurz
>

Re: [tesseract-ocr] Again: read_params_file: parameter not found: II*

2017-01-01 Thread ShreeDevi Kumar

Is the TESSDATA_PREFIX variable set in the environment? If so, which
directory is it pointing to?
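
A minimal sketch of how the calling application could report that variable before launching tesseract. Only the variable name TESSDATA_PREFIX comes from this thread; the helper names `envOr` and `describeTessdataPrefix` are hypothetical:

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or `fallback` when it is unset.
std::string envOr(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// One-line report about TESSDATA_PREFIX, suitable for a log file.
std::string describeTessdataPrefix() {
    const std::string value = envOr("TESSDATA_PREFIX", "");
    if (value.empty()) return "TESSDATA_PREFIX is not set";
    return "TESSDATA_PREFIX=" + value;
}
```

Logging this line from the same process that starts tesseract shows the environment the child process will inherit, which may differ from an interactive shell.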

- excuse the brevity, sent from mobile

On 01-Jan-2017 9:35 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:

> What about osd.traineddata and config files? Are they in your tessdata
> directory?
>
> - excuse the brevity, sent from mobile
>
> On 01-Jan-2017 9:22 PM, <ruediger.k...@deutschebahn.com> wrote:
>
>> Hi all,
>>
>> I'm in a time critical situation. I want to deliver a new software for
>> our customer on 5th January 2017.
>> While things worked well on the test-environment; after deploying the
>> software on the productive environment problems came up.
>> Before describing the situation/failure in detail, some info about the
>> setup and the environment.
>>
>>
>> Environment & Installation
>>
>> *Operating System: Suse Enterprise Linux Server 12 SP 1*
>> $ uname -a
>> Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016
>> (42e0a66) x86_64 x86_64 x86_64 GNU/Linux
>> Since this environment is managed, I can not update any system libraries
>> like glibc etc.
>> *So the newest and only officially supported version of tesseract I found
>> for "Suse 12 SP1 x86_64" is 3.02*
>>
>> *Installed Packages:*
>> libgif4-4.1.6-34.1.1.x86_64.rpm
>> liblept3-1.69-16.1.x86_64.rpm
>> libtesseract3-3.02.02-3.2.1.x86_64.rpm
>> libwebp4-0.3.1-34.1.x86_64.rpm
>> tesseract-3.02.02-59.1.x86_64.rpm
>>
>> *tesseract version*
>> $ tesseract -v
>> tesseract 3.02.02
>> leptonica-1.69
>> libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib
>> 1.2.8
>>
>> *Release details*
>> $ zypper info tesseract
>> Information for package tesseract:
>> --
>> Repository: @System
>>
>>
>> *Name: tesseract*
>> *Version: 3.02.02-59.1*
>> *Arch: x86_64*
>> Vendor: obs://build.opensuse.org/home:koprok
>> Support Level: unknown
>> Installed: Yes
>> Status: up-to-date
>> Installed Size: 3.8 MiB
>> Summary: Open Source OCR Engine
>> Description: […]
>>
>>
>> Traindata & Languages
>>
>> *Traindata*
>> The traindata has been manually downloaded from github
>> <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>.
>>
>>    - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
>>    - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download
>>
>> *And the files have been placed in /usr/share/tessdata/*
>> $ ls -la /usr/share/tessdata/
>> drwxr-xr-x 1 root root  230 Dec 31 16:37 configs/
>> -rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
>> -rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
>> -rw-r--r-- 1 root root   38 Dec 30 20:16 eng.cube.fold
>> -rw-r--r-- 1 root root  181 Dec 30 20:16 eng.cube.lm
>> -rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
>> -rw-r--r-- 1 root root  254 Dec 30 20:16 eng.cube.params
>> -rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
>> -rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
>> -rw-r--r-- 1 root root  996 Dec 30 20:16 eng.tesseract_cube.nn
>> -rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
>> drwxr-xr-x 1 root root   88 Dec 31 16:37 tessconfigs/
>>
>> *tesseract detects 'deu' and 'eng' as available languages*
>> $ tesseract --list-langs
>> List of available languages (2):
>> deu
>> eng
>>
>>
>> Application & Problem
>>
>> *The software application is built upon the Spring Boot framework*
>> Runtime.getRuntime().exec(new String[] {
>>  "tesseract",
>>  "--tessdata-dir", "/usr/share/tessdata",
>>  "-l", lang.getISO3Language(),
>>  inputTiff.toAbsolutePath().toString(), extractedcntPath });
>>
>> *The application logfile says*
>> 2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN
>> PDFContentExtractor - read_params_file: parameter not found: II*
>>
>> *Executing tesseract with tessdata dir fails*
>> $ tesseract --tessdata-dir /usr/share/tessdata -l deu
>> inputPdf6632237754781472255.t

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-08 Thread ShreeDevi Kumar

Sorry, I am not familiar with PowerShell and NuGet.

If you are on Windows, you can try the experimental 4.0.0alpha binaries
for gImageReader, a GUI front-end to Tesseract-OCR. You can OCR a PDF
directly or load multiple images at the same time.

- excuse the brevity, sent from mobile

On 09-Jan-2017 12:49 PM, "Ludvig F Aarstad" <lud...@aarstad.org> wrote:

> Thanks Shree :D. Really appreciate it. Will this work with v3.03 too? I am
> basing my code on this: https://github.com/jourdant/powershell-paperless
> and there is a script to initialize the environment that is getting the
> tesseract files from here: https://nuget.org/api/v2/package/tesseract-ocr.
> Would you be able to point me in the right direction on how to move this
> from 3.03 to the 4.0alpha?
>
>
>
> On Friday, 6 January 2017 at 13:50:38 UTC+1, shree wrote:
>
>> I have uploaded modified nor.traineddata at
>>
>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
>>
>> See attached log and info file for commands used in training. It took
>> about 9 hours on my pc - about 1700 iterations only and then my PC froze so
>> I rebooted and created the traineddata for norlayer0.853_1615.lstm i.e.
>> 0.853 % character error rate at iteration number 1615.
>>
>>
>> ShreeDevi
>> ____
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shree...@gmail.com>
>> wrote:
>>
>>> @Peter, Have you tried the 4.0.0alpha version yet?
>>>
>>> @Ludvig F. Aarstad - Add a layer training worked for adding 'Æ' - I
>>> will upload the new traineddata so that you can test. You will need
>>> 4.0.alpha version for testing.
>>>
>>> Here are a couple of the training tifs and the OCRed text.
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>>>
>>>>
>>>>
>>>> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>>>
>>>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>>>> sometime in January. So it would be helpful if you could open an issue on
>>>>> https://github.com/tesseract-ocr/langdata/issues with this
>>>>> information.
>>>>>
>>>>
>>>> Is it possible to contribute training data for this effort? I realise
>>>> Swedish will not be at the top of the list, but I think it would be easy to
>>>> involve some of the research community here in contributing training data
>>>> if it could improve the language model.
>>>>
>>>> /Peter
>>>>
>>>
>>>


Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-09 Thread ShreeDevi Kumar

Actually, post-processing with a replace for AE will be the best bet, as 4.0 is
slower than the older (3.x) Tesseract engine for Latin-based scripts.
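
A minimal sketch of such a post-processing replace. The thread does not say what the engine actually outputs in place of 'Æ', so the "AE" to "Æ" mapping and the helper name `replaceAll` are assumptions for illustration only:

```cpp
#include <cstddef>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
// The strings are treated as raw bytes, so UTF-8 replacements such as "Æ" work.
std::string replaceAll(std::string text, const std::string& from,
                       const std::string& to) {
    if (from.empty()) return text;  // nothing sensible to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}
```

For example, `replaceAll(ocrText, "AE", "Æ")` would turn "VAERE" into "VÆRE". A blanket replace also changes legitimate "AE" sequences, so in practice the mapping would need to be restricted to the cases seen in the actual OCR output.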

You can experiment with 4.0.0alpha.

See https://github.com/tesseract-ocr/tesseract/wiki/Compiling.
You will also need to compile the latest version of leptonica before that.

Sources are at:
https://github.com/DanBloomberg/leptonica.git
https://github.com/tesseract-ocr/tesseract.git

There is no separate src directory for tesseract.

I used git clone to get the master branch and then use git pull origin to
update it. You can also download a zip of the current master.



ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jan 9, 2017 at 1:18 PM, Ludvig F Aarstad <lud...@aarstad.org> wrote:

> No worries, I will play around and see what I can get working. For now I
> am using a simple replace in my script to handle the Æ.
> How would I go about compiling tesseract 4.0 alpha using git
> and cmake? The wiki says the 4.0 alpha source code is available in the
> master branch of the repository, but I have yet to find it... The compiling
> part seems straightforward enough, but I need the source ;).
>
> Tried installing the gimagereader hoping that it would give me the dll for
> tesseract 4.0, but no.
>
> On Monday, 9 January 2017 at 08:34:18 UTC+1, shree wrote:
>
>> Sorry, I am not familiar with PowerShell and NuGet.
>>
>> If you are on Windows, you can try the experimental 4.0.0alpha binaries
>> for gImageReader, a GUI front-end to Tesseract-OCR. You can OCR a PDF
>> directly or load multiple images at the same time.
>>
>> - excuse the brevity, sent from mobile
>>
>> On 09-Jan-2017 12:49 PM, "Ludvig F Aarstad" <lud...@aarstad.org> wrote:
>>
>>> Thanks Shree :D. Really appreciate it. Will this work with v3.03 too? I
>>> am basing my code on this: https://github.com/jourdant/powershell-paperless
>>> and there is a script to initialize the environment
>>> that is getting the tesseract files from here:
>>> https://nuget.org/api/v2/package/tesseract-ocr. Would you be able to
>>> point me in the right direction on how to move this from 3.03 to the
>>> 4.0alpha?
>>>
>>>
>>>
>>> On Friday, 6 January 2017 at 13:50:38 UTC+1, shree wrote:
>>>
>>>> I have uploaded modified nor.traineddata at
>>>>
>>>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
>>>>
>>>> See attached log and info file for commands used in training. It took
>>>> about 9 hours on my pc - about 1700 iterations only and then my PC froze so
>>>> I rebooted and created the traineddata for norlayer0.853_1615.lstm i.e.
>>>> 0.853 % character error rate at iteration number 1615.
>>>>
>>>>
>>>> ShreeDevi
>>>> 
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shree...@gmail.com>
>>>> wrote:
>>>>
>>>>> @Peter, Have you tried the 4.0.0alpha version yet?
>>>>>
>>>>> @Ludvig F. Aarstad - Add a layer training worked for adding 'Æ' - I
>>>>> will upload the new traineddata so that you can test. You will need
>>>>> 4.0.alpha version for testing.
>>>>>
>>>>> Here are a couple of the training tifs and the OCRed text.
>>>>>
>>>>> ShreeDevi
>>>>> 
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>>> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>>>>>
>>>>>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>>>>>> sometime in January. So it would be helpful if you could open an issue 
>>>>>>> on
>>>>>>> https://github.com/tesseract-ocr/langdata/issues with this
>>>>>>> information.
>>>>>>>
>>>>>>
>>>>>> Is it possible to contribute training data for this effort? I realise
>>>>>> Swedish will not be at the top of the list, but I think it would be easy to
>>>>>> in

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-05 Thread ShreeDevi Kumar

Tried 'Finetune' - that does not help in addition of a character.

Trying 'Add a layer' now.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Jan 5, 2017 at 8:59 PM, Ludvig F Aarstad  wrote:

> Fantastic, thanks:).
>


[tesseract-ocr] Swedish language

2017-01-06 Thread ShreeDevi Kumar

Peter,

Please see
https://github.com/tesseract-ocr/langdata/blob/master/swe/swe.training_text

You can provide additional training text if some needed characters are
missing in the above. I can do a test training with it.

- excuse the brevity, sent from mobile

On 06-Jan-2017 5:01 PM, "Peter"  wrote:

>
>
> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>
>> Ray is planning to retrain the languages for the new 4.0.0 version
>> sometime in January. So it would be helpful if you could open an issue on
>> https://github.com/tesseract-ocr/langdata/issues with this information.
>>
>
> Is it possible to contribute training data for this effort? I realise
> Swedish will not be at the top of the list, but I think it would be easy to
> involve some of the research community here in contributing training data
> if it could improve the language model.
>
> /Peter
>


Re: [tesseract-ocr] unpack [lang].traineddata

2016-12-19 Thread ShreeDevi Kumar

combine_tessdata -u ara.traineddata ara.

On 19-Dec-2016 1:57 PM, "universal reseller"  wrote:

> this is not a zip file..
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUFbr1VYs-jTVr%3DD0zYf%2Bf9LM26x8Up6VJCZXvaRAmkaA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [tesseract-ocr] Re: tesseract installed but cannot be found in cmd console (win7)

2016-12-21 Thread ShreeDevi Kumar

You also need to add the location of tesseract binaries to PATH.
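
As a rough illustration of what "adding to PATH" means, here is a minimal Python sketch. The helper name prepend_to_path is hypothetical, the install directory is the one quoted in the question, and the change only affects the current process, not the Windows system settings.

```python
import os
import shutil


def prepend_to_path(path_value, directory, sep=os.pathsep):
    """Return path_value with directory in front, unless already listed."""
    parts = [p for p in path_value.split(sep) if p]
    if directory in parts:
        return sep.join(parts)
    return sep.join([directory] + parts)


# Install directory taken from the question; adjust to your own setup.
install_dir = r"C:\Program Files (x86)\Tesseract-OCR"
os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), install_dir)

# shutil.which returns None if tesseract still cannot be found on PATH.
print(shutil.which("tesseract"))
```

For a permanent change, set PATH through the Windows environment variable dialog instead.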

- sent from mobile phone

On 22-Dec-2016 9:50 AM, "Junmock Lee"  wrote:

> How To Add/Edit Environment Variables in Windows 7
> https://www.nextofwindows.com/how-to-addedit-environment-variables-in-windows-7
>
> You also have to set
> Variable: TESSDATA_PREFIX
> Value: (Your tessdata path)
>
> On Monday, December 19, 2016 at 2:38:42 PM UTC+9, Randy Welt wrote:
>>
>> Hi, I just installed tesseract from a prebuilt binary package on Win7.
>> I chose the Tesseract at UB Mannheim build, version 4.00, from
>> https://github.com/tesseract-ocr/tesseract/wiki
>>
>> Installation went fine, into the following path:
>> C:\Program Files (x86)\Tesseract-OCR
>>
>> However, when I enter tesseract --version or tesseract.exe -v in the cmd
>> prompt, it says command not found.
>>
>> Do I have to set a PATH environment variable?
>> Please help me. I have no idea how to do this in Windows.
>>

Re: [tesseract-ocr] Can't run tesseract with LSTM

2017-03-23 Thread ShreeDevi Kumar

There might be some problem with your input file - all the following work
for me.
Please note that whitelist has no effect in 4.0

$ tesseract input.tif input
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
$ tesseract input.tif input --psm 7
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
$ tesseract input.tif input --psm 7 -c tessedit_char_whitelist=0123456789
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
$ tesseract input.tif input --psm 7 -c tessedit_char_whitelist=0123456789 --oem 2
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
$ tesseract input.tif input --psm 7 -c tessedit_char_whitelist=0123456789 --oem 2 --tessdata-dir ./tessdata
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
$

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 23, 2017 at 12:26 AM, Jenkar Smithy 
wrote:

> When trying the following command :
> tesseract input.tiff ./result --psm 7 -c tessedit_char_whitelist=0123456789 --oem 2 --tessdata-dir ~/
>
> I get the following error :
> Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
> Page 1
> int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 180
> int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 180
>
> The data is the correct one (for 4.0), from the git.
>

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar

The initial 4.0alpha tag from November has cube in it. It was deleted later
and is no longer in master.

In fact, the OEM code for LSTM was originally 4 and now is 2.

Shouldn't semantic versioning require tagging at major updates?

- excuse the brevity, sent from mobile

On 22-Mar-2017 8:58 PM, "universal reseller"  wrote:

> How did you use the cube engine on Tesseract 4?
>

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar

See
https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance



- excuse the brevity, sent from mobile

On 22-Mar-2017 8:58 PM, "universal reseller"  wrote:

> How did you use the cube engine on Tesseract 4?
>

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar

Sorry, mentioned incorrect code for LSTM

OCR Engine modes:
  0    Original Tesseract only.
  1    Neural nets LSTM only.
  2    Tesseract + LSTM.
  3    Default, based on what is available.
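
The mode list above can be written down as a small Python enum for use in scripts. This is an illustrative sketch only: the class and member names are my own, not part of any Tesseract API, and the numbers are the ones listed above for the current 4.0 alpha master.

```python
from enum import IntEnum


class OcrEngineMode(IntEnum):
    """Values for --oem as listed above (names are illustrative)."""
    TESSERACT_ONLY = 0   # Original Tesseract only
    LSTM_ONLY = 1        # Neural nets LSTM only
    TESSERACT_LSTM = 2   # Tesseract + LSTM
    DEFAULT = 3          # Default, based on what is available


def oem_args(mode):
    """Return the command-line arguments that select an engine mode."""
    return ["--oem", str(int(mode))]


print(oem_args(OcrEngineMode.LSTM_ONLY))
```

Keeping the numbers in one place makes it easier to adjust scripts if the codes change again between builds.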


- excuse the brevity, sent from mobile

On 22-Mar-2017 9:02 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:

> The initial 4.0alpha tag from November has cube in it. It was deleted
> later and is no longer in master.
>
> In fact, the OEM code for LSTM was originally 4 and now is 2.
>
> Shouldn't semantic versioning require tagging at major updates?
>
> - excuse the brevity, sent from mobile
>
> On 22-Mar-2017 8:58 PM, "universal reseller" <uniresel...@gmail.com>
> wrote:
>
>> How did you use the cube engine on Tesseract 4?
>>

Re: [tesseract-ocr] Having issue with Italic characters

2017-03-24 Thread ShreeDevi Kumar

Use Tesseract 4.0.0alpha and --oem 1 for LSTM. It works ok with that.
--oem 0 with legacy engine gives / instead of i.

You could test whether a higher-resolution image (300 dpi) works with the
legacy engine.
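
To compare the two engines on the same image, something like the following Python sketch could be used. It only builds the command lines; tesseract_cmd is a hypothetical helper and italic.png is a placeholder file name, not a file from this thread.

```python
def tesseract_cmd(image, outbase, oem=None, psm=None):
    """Build a tesseract command line; options follow the two file names."""
    cmd = ["tesseract", image, outbase]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


# One run with the legacy engine (0) and one with LSTM (1),
# written to separate output files so the results can be compared.
for oem, label in [(0, "legacy"), (1, "lstm")]:
    print(" ".join(tesseract_cmd("italic.png", "italic-" + label, oem=oem)))
```

Each printed line can be run in a shell, or the list passed to subprocess.run, and the two text files compared for the "/" versus "i" problem.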

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 24, 2017 at 8:01 AM, Muhammad Shamim 
wrote:

> Hi,
>
> I am using tesseract-ocr-setup-3.05.00dev.exe to do OCR, and it is working
> fine for me with the default training data files. I am only facing an issue
> with italic characters, e.g.
>  Italic "l"   => "/"
>  Italic "i"   => "/"
> Does anybody have an idea how to deal with this issue?
> Is there any extra step I need to do?
>
> Thank you
>

Re: [tesseract-ocr] Re: How to download the Tesseract trained data for Digital display numbers ( Seven Segments Data trained data )

2017-03-27 Thread ShreeDevi Kumar

https://github.com/tesseract-ocr/tesseract/wiki/AddOns

has a link to traineddata for digital seven fonts.

https://github.com/arturaugusto/display_ocr

You can download various digital seven fonts, create training data images,
and train, all in jTessBoxEditor. Use a 3.0x version.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Mar 27, 2017 at 1:50 PM, komal gawade 
wrote:

>
>
> On Friday, March 24, 2017 at 3:27:39 PM UTC+5:30, komal gawade wrote:
>>
>>
>> Hello,
>> I work mainly in electronics and am new to C#. I am currently working on
>> an image-processing project in C#, one part of which has to detect the
>> text or digits in an image of a 7-segment display. Searching on Google, I
>> found Tesseract as a possible solution.
>>
>> As an experiment, I first tried converting an image of normal text into a
>> text file. That works fine for some basic images, but it does not work for
>> a 7-segment display, so I learned that I need a trained data file for
>> 7-segment fonts.
>>
>> To train 7-segment data I followed the steps shown in this video:
>> https://www.youtube.com/watch?v=i_1-hGsXxy8
>> However, the output.txt file shown in the video is not generated in my
>> case, and when I use my trained 7-segment data file I get garbage values
>> in the text file. To check whether my trained file is correct, I followed
>> the procedure shown in the video, but it gives an error saying that
>> output.txt was not found. Is this because of the missing output.txt file,
>> or am I missing some other step? I followed all the steps shown in the
>> video for training 7-segment data.
>>
>> I have also installed jTessBoxEditorFX.jar, serak trainer and Tesseract-ocr
>> v3.02. I am stuck and do not know where I am going wrong: is my procedure
>> wrong, or is the software not installed properly? After installing
>> tesseract there is a red cross mark against tesseract.
>>
>> Could somebody please help me figure this out? If possible, please provide
>> a 7-segment trained data file and the exact steps to train 7-segment data,
>> as I also have to train more files for various display icons and some
>> specific messages. This is urgent: my project is stuck, and I came to
>> Tesseract only after trying many other image-processing approaches to
>> 7-segment display detection in C#, such as pixel counting and image
>> comparison.
>> If anything in my query is unclear, please let me know.
>>
>> Please do the needful.

Re: [tesseract-ocr] Low Accurate ini bold font

2017-03-27 Thread ShreeDevi Kumar

Try latest version of tesseract - build from master. Use --psm 7 --oem 1

I get correct result for both.

tesseract unnamed1.png unnamed1 --psm 7 --oem 1

Tesseract Open Source OCR Engine v4.00.00alpha-347-g60c8b12 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Mar 27, 2017 at 1:15 PM, afrizal firdaus 
wrote:

> Hello guys. I am trying to OCR pictures with text in a meme font, but I
> always get bad results for this font.
>
> result: *PlEﬂSE*
>
> result: *TIEllMIE MUHE*
>
> Can anyone help me please?
>
> *Thank you*
>
TELL ME MORE

PLEASE

Re: [tesseract-ocr] Can't run tesseract with LSTM

2017-03-23 Thread ShreeDevi Kumar

Which version of tesseract are you running? If you built it from source,
which commit did you use?

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 23, 2017 at 4:28 PM, Jenkar Smithy 
wrote:

> Tried with your input file - still no dice, throws the same error.
>
> Interesting about the charwhitelist, thanks for pointing that out!
>

Re: [tesseract-ocr] Can't run tesseract with LSTM

2017-03-23 Thread ShreeDevi Kumar

Ok. I am using an older version ...

git log -1


commit 0ff26ee3de166659970d80e50aef4000ff2557b2
Author: zdenop 
Date:   Fri Feb 3 08:15:15 2017 +0100

    Merge pull request #698 from stweil/configure

    configure: Run AVX test only with 64 bit compiler

Please try with that. If that works, some newer commit might be a problem.


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 23, 2017 at 6:05 PM, Jenkar Smithy 
wrote:

> Using tesseract 4.00.00alpha , built from commit
> 2b52915a740a39944157fd0fda0524fd1d71ef83
>

Re: [tesseract-ocr] How to create a PDF ?

2017-03-23 Thread ShreeDevi Kumar

see https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

Also check that you have pdf.ttf in your tessdata folder:

https://github.com/tesseract-ocr/tesseract/tree/master/tessdata

tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
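
A quick way to check for the font before running the command, sketched in Python. The helper names missing_pdf_font and pdf_cmd are made up for illustration, and "./tessdata" below is only an example directory.

```python
import os


def missing_pdf_font(tessdata_dir):
    """Return the expected pdf.ttf path if it is missing, else None."""
    font = os.path.join(tessdata_dir, "pdf.ttf")
    return None if os.path.isfile(font) else font


def pdf_cmd(image, outbase, lang, tessdata_dir):
    """Build a command like the one above; the trailing "pdf" is the
    config name that turns on PDF output."""
    return ["tesseract", "--tessdata-dir", tessdata_dir,
            image, outbase, "-l", lang, "pdf"]


missing = missing_pdf_font("./tessdata")
if missing:
    print("pdf.ttf not found, expected at:", missing)
else:
    print(" ".join(pdf_cmd("test.tif", "out2", "fra", "./tessdata")))
```

If the font is reported missing, copy pdf.ttf from the tessdata folder of the tesseract repository into your own tessdata directory.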


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 23, 2017 at 6:44 PM, Saliaj Adrian  wrote:

> Hello,
>
> How can I create a PDF with Tesseract?
>
> When I do: tesseract -l fra test.tif out2 -c tessedit_create_hocr=1
> it works very well and my HTML file is created.
>
> But when I do : tesseract -l fra test.tif out2 -c tessedit_create_pdf=1
> It says : "Can not open file "usr/share/local/tessdata//pdf.ttf"!
> Error during processing."
>
> Why ?
>
> Thanks in advance,
> Adrian
>

Re: [tesseract-ocr] How to create a PDF ?

2017-03-23 Thread ShreeDevi Kumar

in https://github.com/tesseract-ocr/tesseract/tree/master/tessdata

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 23, 2017 at 7:04 PM, Saliaj Adrian  wrote:

> No I don't have pdf.ttf in my tessdata folder...
>
> Where can I find it ?
>

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar

See
https://github.com/tesseract-ocr/tesseract/commit/5deebe6c279f70215935c1f86baa7e7016c7f2a7

Ray's comment for commit

Moved cube aside without deleting it.



- excuse the brevity, sent from mobile

On 22-Mar-2017 10:07 PM, "THintz" <tdhi...@gmail.com> wrote:

> I'm sure I cloned master on 3/20/2017 3:55.   publictypes.h defines this:
>
> enum OcrEngineMode {
>   OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest
>   OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
>   OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
>                                 // to Tesseract when things get difficult.
>   OEM_DEFAULT,                  // Specify this mode when calling init_*(),
>                                 // to indicate that any of the above modes
>                                 // should be automatically inferred from the
>                                 // variables in the language-specific config,
>                                 // command-line configs, or if not specified
>                                 // in any of the above should be set to the
>                                 // default OEM_TESSERACT_ONLY.
>   OEM_CUBE_ONLY,                // Run Cube only - better accuracy, but slower
>   OEM_TESSERACT_CUBE_COMBINED,  // Run both and combine results - best accuracy
> };
>
>
>
> On Wednesday, March 22, 2017 at 12:04:24 PM UTC-4, shree wrote:
>>
>> Sorry, mentioned incorrect code for LSTM
>>
>> OCR Engine modes:
>>   0    Original Tesseract only.
>>   1    Neural nets LSTM only.
>>   2    Tesseract + LSTM.
>>   3    Default, based on what is available.
>>
>>
>> - excuse the brevity, sent from mobile
>>
>> On 22-Mar-2017 9:02 PM, "ShreeDevi Kumar" <shree...@gmail.com> wrote:
>>
>>> The initial 4.0alpha tag from November has cube in it. It was deleted
>>> later and is no longer in master.
>>>
>>> In fact, the OEM code for LSTM was originally 4 and now is 2.
>>>
>>> Shouldn't semantic versioning require tagging at major updates?
>>>
>>> - excuse the brevity, sent from mobile
>>>
>>> On 22-Mar-2017 8:58 PM, "universal reseller" <unire...@gmail.com> wrote:
>>>
>>>> How did you use the cube engine on Tesseract 4?
>>>>

[tesseract-ocr] Re: seven segment display - 4.0 traineddata

2017-03-29 Thread ShreeDevi Kumar

FYI - this was trained using eng.traineddata and finetuned with 7segment
fonts.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Mar 29, 2017 at 9:09 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> Hi,
>
> I have built a 4.0 traineddata using some seven segment display fonts.
> Trained mostly on numbers 0-9, capital letters A-Z, : etc.
>
> It is uploaded as a zip file at
> https://github.com/Shreeshrii/tessdata4alpha/raw/master/ssd1.zip
>
> Unzip it to get ssd1.traineddata.
>
> I have not tested it much. Seemed to work with the sample images provided
> in this email.
>
> Since B and 8, O and 0, S, Z and 5 all look similar in this display, there
> would be errors.
>
> If it were trained just for numbers, it might have better accuracy.
>
> I am doing another run of training to see if there are improvements.
>
> Meanwhile, those who actually want this should give it a try and provide
> feedback.
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Mar 24, 2017 at 3:04 PM, <komalagaw...@gmail.com> wrote:
>
>>
>> Hello,
>> I work mainly in electronics and am new to C#. I am currently working on
>> an image-processing project in C#, one part of which has to detect the
>> text or digits in an image of a 7-segment display. Searching on Google, I
>> found Tesseract as a possible solution.
>>
>> As an experiment, I first tried converting an image of normal text into a
>> text file. That works fine for some basic images, but it does not work for
>> a 7-segment display, so I learned that I need a trained data file for
>> 7-segment fonts.
>>
>> To train 7-segment data I followed the steps shown in this video:
>> https://www.youtube.com/watch?v=i_1-hGsXxy8
>> However, the output.txt file shown in the video is not generated in my
>> case, and when I use my trained 7-segment data file I get garbage values
>> in the text file. To check whether my trained file is correct, I followed
>> the procedure shown in the video, but it gives an error saying that
>> output.txt was not found. Is this because of the missing output.txt file,
>> or am I missing some other step? I followed all the steps shown in the
>> video for training 7-segment data.
>>
>> I have also installed jTessBoxEditorFX.jar, serak trainer and Tesseract-ocr
>> v3.02. I am stuck and do not know where I am going wrong: is my procedure
>> wrong, or is the software not installed properly? After installing
>> tesseract there is a red cross mark against tesseract.
>>
>> Could somebody please help me figure this out? If possible, please provide
>> a 7-segment trained data file and the exact steps to train 7-segment data,
>> as I also have to train more files for various display icons and some
>> specific messages. This is urgent: my project is stuck, and I came to
>> Tesseract only after trying many other image-processing approaches to
>> 7-segment display detection in C#, such as pixel counting and image
>> comparison.
>> If anything in my query is unclear, please let me know.
>>
>> Please do the needful.
>>
>>
>>>
>>>
>>>
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit https://groups.google.com/d/ms
>> gid/tesseract-ocr/b7fc9a05-8d8d-4e68-ac02-2e71b0078557%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/b7fc9a05-8d8d-4e68-ac02-2e71b0078557%40googlegroups.com?utm_medium=email_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>


[tesseract-ocr] seven segment display - 4.0 traineddata

2017-03-29 Thread ShreeDevi Kumar

Hi,

I have built a 4.0 traineddata using some seven segment display fonts.
Trained mostly on numbers 0-9, capital letters A-Z, : etc.

It is uploaded as a zip file at
https://github.com/Shreeshrii/tessdata4alpha/raw/master/ssd1.zip

unzip to get ssd1.traineddata

I have not tested it much. It seemed to work with the sample images provided
in this thread.

Since B and 8, O and 0, and S, Z and 5 all look similar on this kind of
display, expect some errors.

If it were trained on digits only, accuracy might be better.

I am doing another run of training to see if there are improvements.

Meanwhile, those who actually want this should give it a try and provide
feedback.
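
One way to live with the look-alike characters mentioned above, when a readout is known to be numeric, is to map the confusable letters back to digits after OCR. This is only a minimal sketch: the function name and the mapping table are assumptions for illustration and are not part of ssd1.traineddata or of Tesseract itself. Z is left out because which digit it should map to is not settled in this thread.

```python
# Letters that can be mistaken for digits on a seven-segment display.
# The letter-to-digit direction is an assumption that only makes sense
# for readouts that are expected to contain digits and punctuation only.
CONFUSABLE_TO_DIGIT = {"B": "8", "O": "0", "S": "5"}


def normalize_numeric_reading(text):
    """Replace look-alike letters with digits; leave everything else as-is."""
    return "".join(CONFUSABLE_TO_DIGIT.get(ch, ch) for ch in text)
```

This should only be applied to fields where letters are not expected, otherwise genuine B, O and S characters would be lost.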


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 24, 2017 at 3:04 PM,  wrote:

>
> Hello,
> I work in the electronics field and am new to C#. I am currently working
> on an image-processing project in C#, and in one part of it I have to
> detect the text or digits in an image of a 7-segment display. Searching on
> Google, I found Tesseract as a possible solution.
>
> As an experiment, I first tried converting an image of normal text into a
> text file. That works fine for some basic images, but it does not work with
> a 7-segment display, so I learned that I need a trained data file for
> 7-segment fonts.
>
> To train the 7-segment data I followed the steps shown in this video:
> https://www.youtube.com/watch?v=i_1-hGsXxy8
> However, the output.txt file shown in that video is not generated in my
> case. As a result, when I use the trained 7-segment data file I get garbage
> values in the text file. To check whether the trained file is correct, I
> followed the procedure shown in the video, but it gives an error saying
> that the output.txt file was not found. Is this because of the missing
> output.txt file, or am I missing something else? I followed all the steps
> shown in the video for training 7-segment data.
>
> I have also installed jTessBoxEditorFX.jar, Serak trainer and Tesseract-ocr
> v3.02. I am stuck at the point where I don't know what is going wrong:
> whether my procedure is wrong or the software installation is faulty,
> because after installing tesseract there is a red cross mark against
> tesseract.
>
> Could somebody please help me figure this out? If possible, please provide
> a 7-segment trained data file and the exact steps for training 7-segment
> data, as I also have to train some more files for various display icons and
> some specific messages. It is very urgent: my project is stuck, and I turned
> to Tesseract only after trying many other image-processing approaches to
> 7-segment display detection in C#, such as pixel counting and image
> comparison.
> If anything in my question is unclear, please let me know.
>
> Please do the needful.

Re: [tesseract-ocr] Invalid resolution 0 dpi. Using 70 instead.

2017-03-29 Thread ShreeDevi Kumar

The problem is with the input image: it does not carry correct DPI
information.

Please preprocess the image to 300 dpi for better output.
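
To see whether a JPEG actually declares a resolution, one can inspect its JFIF header before running OCR. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes a plain JFIF file whose APP0 segment directly follows the start-of-image marker (files that carry resolution only in EXIF are not handled).

```python
import struct


def jfif_density(data):
    """Return (units, x_density, y_density) from a JFIF APP0 header, or None."""
    # A JFIF file starts with SOI (FF D8) followed by the APP0 marker (FF E0).
    if len(data) < 18 or data[:4] != b"\xff\xd8\xff\xe0":
        return None
    if data[6:11] != b"JFIF\x00":
        return None
    # Layout after the identifier: version (2 bytes), units (1), X (2), Y (2).
    return struct.unpack(">BHH", data[13:18])


def dpi_from_jfif(data):
    """Return the horizontal resolution in DPI, or None if it is not declared."""
    info = jfif_density(data)
    if info is None:
        return None
    units, x_density, _ = info
    if units == 1:        # density given in dots per inch
        return float(x_density)
    if units == 2:        # density given in dots per centimetre
        return x_density * 2.54
    return None           # units == 0: aspect ratio only, no real resolution
```

If this returns None for an image, the file has no usable resolution in its JFIF header, and re-saving it from an image editor or converter with an explicit 300 dpi setting is the kind of preprocessing suggested above.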

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Mar 29, 2017 at 8:40 AM,  wrote:

> I have also encountered this problem. Have you solved it? Thank you!
>
> On Thursday, January 5, 2017 at 3:38:56 PM UTC+8, zdenop wrote:
>>
>> Warning. Invalid resolution 0 dpi.
>>
>>
>> This means that your input image has no information about DPI. This is a
>> problem with whatever generated the image.
>> For bad accuracy, please read the relevant wiki page.
>>
>> Zdenko
>>
>> On Thu, Jan 5, 2017 at 8:10 AM,  wrote:
>>
>>> I ran into an "Invalid resolution" problem like this:
>>>
>>> bogon:OCR zhiyao$ tesseract /Users/zhiyao/Desktop/OCR/kindle3.jpeg
>>> ~/Desktop/OCR/kindle3
>>>
>>> Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
>>>
>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>>
>>> I tried different input images and file types, but the same problem
>>> happened.
>>> I suppose the bad accuracy is due to the low default resolution (70).
>>> My question is why the actual resolution is not detected.
>>>
>>> BTW, I am using tesseract 4.00 on a Mac.
>>>

Re: [tesseract-ocr] Re: tesseract4 x64 Windows dlls?

2017-03-25 Thread ShreeDevi Kumar

Added link in wiki -
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

@THintz, please fix your readme file,

>cd \petri mkdir Win64 cd Win64 git clone
https://github.com/tesseract-ocr/tesseract tesseract cd tesseract cppan (I
assume this wasn't necessary, but I'm trying to avoid improvising) mkdir
Win64 && cd Win64 cppan .. cmake .. -G "Visual Studio 15 2017 Win64"

needs to be split across lines.

Thanks!

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Mar 26, 2017 at 2:12 AM, THintz  wrote:

> Win64 DLLs posted here:
>
> https://github.com/tdhintz/tesseract4win64
>

Re: [tesseract-ocr] Re: tesseract4 x64 Windows dlls?

2017-03-16 Thread ShreeDevi Kumar

Egor (cc:ed) can provide guidance regarding cppan and cmake.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 16, 2017 at 6:30 PM, THintz  wrote:

> I spoke too soon.  Apparently I touched the CMake text and that caused the
> next build to recreate the solution as Win32.  I think I'm on the right
> track anyway.
>
> For each project under Solution Explorer in Visual Studio right click and
> select Properties.  Then make x64 match Win32 Platform's Output Directory
> and Intermediate Directory under the Configuration Properties | General
> tab.  Under Configuration Properties | Linker | All Options scroll up to
> Additional Options and remove /MACHINE:X86.  For some projects this is
> found under Librarian instead of Linker.
>
> This gets the build as far as linking.  The result is a lot of undefined
> references.  I presume this occurs because dependent libraries are x86, but
> I don't know.  Apparently cppan places the dependencies in Build\Release
> but chooses x86 (because there aren't x64 versions??).  I'm ignorant of the
> cmake/cppan/git ecosystem.
>
> If anyone can give direction, I'd welcome your input.
>

Re: [tesseract-ocr] tesseract multiply .png files to singular .txt file

2017-03-16 Thread ShreeDevi Kumar

GUI front-ends for tesseract such as VietOCR and gImageReader also allow
batch processing of multiple files.

- excuse the brevity, sent from mobile

On 16-Mar-2017 9:13 PM, "Lako"  wrote:

> Hi,
>
> Apologies for the beginner question; I am fairly new to Tesseract, and
> also to coding. I have a fairly large number of .png files (each one line
> of upper-case code) and would like to create a single text file in which
> the results are separated by a semicolon, or even a space, to get the
> entire list.
>
> I have successfully managed to convert a single .png to .txt but cannot
> get a batch to work. I have also looked at other posts, but they are often
> explained above my level of understanding.
>
> Many thanks for any help!
>
>

Re: [tesseract-ocr] tesseract multiply .png files to singular .txt file

2017-03-16 Thread ShreeDevi Kumar

Please tell us what environment you are running in: Linux, Windows, etc.

Basically, you need to set up a loop which will process all the .png files
and concatenate the OCR results.
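
The loop described above could be sketched as follows. This is a minimal example rather than a definitive script: the function names are hypothetical, it assumes the tesseract binary is on PATH, and it assumes a build that accepts "stdout" as the output base so that the recognized text is printed instead of written to a .txt file.

```python
import subprocess
from pathlib import Path


def run_tesseract(image_path):
    """OCR one image by calling the tesseract command line; return its text."""
    # Assumption: "stdout" as the output base makes tesseract print the text.
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def ocr_folder(folder, ocr=run_tesseract, separator=";"):
    """OCR every .png in the folder (sorted by name) and join the results."""
    texts = []
    for png in sorted(Path(folder).glob("*.png")):
        texts.append(ocr(png).strip())
    return separator.join(texts)
```

Writing the result to one file is then a single call, for example `Path("all.txt").write_text(ocr_folder("scans"))`, where "scans" is a placeholder folder name. The `ocr` parameter exists only so the loop can be tried out without tesseract installed.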

- excuse the brevity, sent from mobile

On 16-Mar-2017 9:13 PM, "Lako"  wrote:

> Hi,
>
> Apologies for the beginner question; I am fairly new to Tesseract, and
> also to coding. I have a fairly large number of .png files (each one line
> of upper-case code) and would like to create a single text file in which
> the results are separated by a semicolon, or even a space, to get the
> entire list.
>
> I have successfully managed to convert a single .png to .txt but cannot
> get a batch to work. I have also looked at other posts, but they are often
> explained above my level of understanding.
>
> Many thanks for any help!
>
>

Re: [tesseract-ocr] First time user

2017-03-20 Thread ShreeDevi Kumar

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

On Windows:

tesseract.exe loc.tif loc

Make sure the tesseract.exe binary is in your PATH and that the TESSDATA_PREFIX
environment variable points to where you have the traineddata files.
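
The two checks above, plus a command line for the "always one line, always English" case, can be sketched like this. The function names are hypothetical. `-l eng` selects English and page segmentation mode 7 treats the image as a single text line; the spelling of the mode flag is an assumption to verify with `tesseract --help`, since older 3.x releases document it as `-psm` and 4.x as `--psm`.

```python
import os
import shutil


def check_setup(environ=None, which=shutil.which):
    """Return a list of notes about the tesseract command-line setup."""
    environ = os.environ if environ is None else environ
    notes = []
    if which("tesseract") is None:
        notes.append("tesseract is not on PATH")
    if not environ.get("TESSDATA_PREFIX"):
        notes.append("TESSDATA_PREFIX is not set")
    return notes


def single_line_command(image, output_base, psm_flag="--psm"):
    """Build the argument list for English, single-text-line recognition."""
    # psm_flag is an assumption: pass "-psm" for builds that use that spelling.
    return ["tesseract", image, output_base, "-l", "eng", psm_flag, "7"]
```

An unset TESSDATA_PREFIX is not always an error, because a build may fall back to a compiled-in default location, so the list is best read as hints rather than failures.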

- excuse the brevity, sent from mobile

On 20-Mar-2017 11:22 AM, "Michael C" wrote:

what do i type into my command prompt on windows 10 to get it to read
loc.tif single line of text (always!) and always english?

thanks!


Re: [tesseract-ocr] Re: New beginner

2017-03-21 Thread ShreeDevi Kumar

Make sure your input file phototest.tiff is in
C:\Program Files\Tesseract-OCR

Otherwise, give the full path to the file.

The main error is

 image file not found
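
A script that calls tesseract can catch this situation before the OCR run by resolving the image path and checking that the file exists. A minimal sketch, with a hypothetical helper name:

```python
from pathlib import Path


def resolve_input(image, cwd=None):
    """Return the full path of the image, or raise if the file is missing."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    path = Path(image)
    if not path.is_absolute():
        # A relative name is looked up in the current working directory,
        # which is why the file must be there or be given with a full path.
        path = base / path
    if not path.is_file():
        raise FileNotFoundError(f"image file not found: {path}")
    return path
```

Passing the returned full path to tesseract removes any dependence on the directory the command prompt happens to be in.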

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Mar 21, 2017 at 2:56 AM, Fitriani Arifin  wrote:

> I have the same issue here; I only get an error like this:
> C:\Program Files\Tesseract-OCR>tesseract.exe phototest.tiff out
>  Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
>  Error in fopenReadStream: file not found
>  Error in findFileFormat: image file not found
> Error during processing.
>
> ObjectCache(5DFAAAC8)::~ObjectCache(): WARNING! LEAK! object 01EC1CB8 still has count 1 (id C:\Program Files\Tesseract-OCR\tessdata/eng.traineddatapunc-dawg)
> ObjectCache(5DFAAAC8)::~ObjectCache(): WARNING! LEAK! object 01EC3DF0 still has count 1 (id C:\Program Files\Tesseract-OCR\tessdata/eng.traineddataword-dawg)
> ObjectCache(5DFAAAC8)::~ObjectCache(): WARNING! LEAK! object 01EC3E98 still has count 1 (id C:\Program Files\Tesseract-OCR\tessdata/eng.traineddatanumber-dawg)
> ObjectCache(5DFAAAC8)::~ObjectCache(): WARNING! LEAK! object 01ECA758 still has count 1 (id C:\Program Files\Tesseract-OCR\tessdata/eng.traineddatabigram-dawg)
> ObjectCache(5DFAAAC8)::~ObjectCache(): WARNING! LEAK! object 01ECA700 still has count 1 (id C:\Program Files\Tesseract-OCR\tessdata/eng.traineddatafreq-dawg)
>
> I'm just a user. I use tesseract 3.05 compiled by UB Mannheim.
> Any help, please!
> Thanks
> On Monday, March 20, 2017 at 1:52:00 PM UTC+8, Wahyuni Karim wrote:
>>
>> I have tesseract 3.02 and run it from cmd, but it only gives an error:
>> can't find input file.
>

Re: [tesseract-ocr] Re: tesseract4 x64 Windows dlls?

2017-03-15 Thread ShreeDevi Kumar

Thanks for sharing how you made the x64 solution for Visual Studio.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Mar 15, 2017 at 9:44 PM, THintz  wrote:

> I followed the GitHub instructions at tesseract-ocr/tesseract/wiki/Compiling,
> then opened the resulting .sln file's Configuration Manager... and created
> an x64 clone of the Win32 platform.  Then I opened every project and copied
> the Win32 platform settings to the x64 platform while leaving the explicit
> x64 paths alone.  This compiled on Visual Studio 2017.
>

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-16 Thread ShreeDevi Kumar

You did not mention where you installed leptonica and tesseract from.

What output do you see when you type

tesseract -v


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 16, 2017 at 2:21 PM, Kazi Moinul Hossain 
wrote:

> Should I reinstall leptonica and tesseract?
>
> On Wednesday, 15 March 2017 23:59:19 UTC+6, zdenop wrote:
>>
>> It seems that your (leptonica?) installation is corrupted. Your example
>> works for me (for 4.00 and 3.05):
>>
>> zdeno@level2:~/test> g++ sample.cpp -o sample -llept -ltesseract
>> zdeno@level2:~/test> ./sample
>> Tesseract-ocr version: 4.00.00alpha
>> Leptonica version: leptonica-1.74
>>
>>
>> Zdenko
>>
>> On Wed, Mar 15, 2017 at 4:28 PM, Kazi Moinul Hossain 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I have installed leptonica 1.74.1 and then tesseract 4.00.00.
>>> After that, I created the C++ program below, which should print the
>>> leptonica and tesseract versions installed on the system.
>>>
>>> // Header names were lost in the archive; these are inferred from the
>>> // -I flags in the compile command below.
>>> #include <stdio.h>
>>> #include <baseapi.h>
>>> #include <allheaders.h>
>>>
>>> int main() {
>>>     tesseract::TessBaseAPI *myOCR = new tesseract::TessBaseAPI();
>>>     printf("Tesseract-ocr version: %s\n", myOCR->Version());
>>>     printf("Leptonica version: %s\n", getLeptonicaVersion());
>>>     return 0;
>>> }
>>>
>>> When I compile my code with the following command
>>>
>>> $ g++ sample.cpp -o sample -I/usr/local/include/leptonica -I/usr/local/include/tesseract -llept -ltesseract
>>>
>>> I get the following error:
>>>
>>> //usr/local/lib/libtesseract.so: undefined reference to 'pixReadFromMultipageTiff'
>>> //usr/local/lib/libtesseract.so: undefined reference to 'pixReadMemFromMultipageTiff'
>>> collect2: error: ld returned 1 exit status
>>>
>>> To repeat, my tesseract version is 4.00.00 and my leptonica version is
>>> 1.74.1.
>>> Can anyone please help me figure this out?
>>>
>>> Thanks,
>>> Kazi Moinul Hossain
>>> Junior Software Developer
>>>

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-16 Thread ShreeDevi Kumar

Please see https://github.com/tesseract-ocr/tesseract/issues/233



ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 16, 2017 at 2:41 PM, Kazi Moinul Hossain 
wrote:

> Tesseract installation source: https://github.com/tesseract-ocr/tesseract.git
> Leptonica installation source: https://github.com/DanBloomberg/leptonica.git
>
> After typing tesseract -v, the following information is shown:
>
> tesseract 4.00.00alpha-337-g7c27088
>  leptonica-1.74.1
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>  Found AVX
>  Found SSE
> On Thursday, 16 March 2017 04:56:52 UTC-4, shree wrote:
>>
>> You did not mention from where you installed leptonica and tesseract.
>>
>> what info do you see when you type
>>
>> tesseract -v
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Thu, Mar 16, 2017 at 2:21 PM, Kazi Moinul Hossain 
>> wrote:
>>
>>> Should i reinstall leptonica & tesseract ?
>>>
>>> On Wednesday, 15 March 2017 23:59:19 UTC+6, zdenop wrote:

 It seems that your (leptonica?) installation is corrupted. Your example
 works for me (for 4.00 and 3.05):

 zdeno@level2:~/test> g++ sample.cpp -o sample -llept -ltesseract
 zdeno@level2:~/test> ./sample
 Tesseract-ocr version: 4.00.00alpha
 Leptonica version: leptonica-1.74


 Zdenko

 On Wed, Mar 15, 2017 at 4:28 PM, Kazi Moinul Hossain <
 moinj...@gmail.com> wrote:

> Hi everyone,
>
> I have installed leptonica 1.74.1 and then tesseract 4.00.00.
> After that, i have created a C++ program stated below which ultimately
> will show the leptonica and tesseract version installed in the system.
>
>
>
> *#include
> #include#include
> *
>
> *int main() {*
>
>
>
>
>
> *tesseract::TessBaseAPI *myOCR = new
> tesseract::TessBaseAPI();printf("Tesseract-ocr version:
> %s\n",myOCR->Version());printf("Leptonica version:
> %s\n",getLeptonicaVersion());return 0;}*
>
> While compiling my code using following command
>
>
> *$ g++ sample.cpp -o
> sample -I/usr/local/include/leptonica -I/usr/local/include/tesseract 
> -llept
> -ltesseract*
>
> i am encountering following error,
>
> *//usr/local/lib/libtesseract.so: undeﬁned reference to
> ‘pixReadFromMultipageTiff ‘*
> *//usr/local/lib/libtesseract.so: undeﬁned reference to
> ‘pixReadMemFromMultipageTiff ‘*
> *collect2: error: Id returned 1 exit status*
>
> Again to mention, my tesseract version is 4.00.00 and leptonica
> version is 1.74.1.
> Can anyone please help me to figure it out?
>
> Thanks,
> Kazi Moinul Hossain
> Junior Software Developer
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to tesseract-oc...@googlegroups.com.
> To post to this group, send email to tesser...@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/be3b1def-f66
> 6-446a-ad16-211b4e3a8523%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-20 Thread ShreeDevi Kumar

You have not responded to Zdenko's suggestion to provide the output of

ldd tesseract

or

ldd /usr/local/bin/tesseract


(use the location of tesseract, which you can find by

which tesseract)
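
If it helps, the ldd output can be narrowed to the one line that matters here. This is only a rough sketch, assuming a Linux system with ldd; the function name lept_paths is made up for illustration. It reads ldd output on stdin and prints the path of each leptonica library the binary resolves to:

```shell
#!/bin/bash
# Hypothetical helper (not part of tesseract): read `ldd` output on stdin
# and print the resolved path of every leptonica shared library.
lept_paths() {
    # ldd lines look like: "liblept.so.5 => /usr/local/lib/liblept.so.5 (0x...)"
    # An unresolved library is reported by ldd as "=> not found".
    awk '$1 ~ /^liblept/ && $2 == "=>" { print ($3 == "not" ? "not found" : $3) }'
}

# Usage (needs tesseract on the PATH):
#   ldd "$(which tesseract)" | lept_paths
```

A path under /usr/lib rather than /usr/local/lib would suggest that a distribution copy of leptonica is being picked up instead of the freshly built one.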

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Mar 20, 2017 at 4:53 PM, Kazi Moinul Hossain <moinjoje...@gmail.com>
wrote:

> I uninstalled all the previous leptonica and tesseract versions and installed
> everything fresh. When I type tesseract -v, it shows the leptonica and
> tesseract versions properly.
> But when I type tesseract sample.tif sample, it shows "Illegal
> instruction (core dumped)".
> Also, I can no longer compile the C++ program mentioned here
> before. The compilation command fails with "fatal error: baseapi.h:
> No such file or directory"
> What could be the solution? Please help.
>
> On Friday, 17 March 2017 19:41:20 UTC+6, shree wrote:
>>
>> sudo apt-get remove libleptonica-dev libleptonica
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Mar 17, 2017 at 7:09 PM, ShreeDevi Kumar <shree...@gmail.com>
>> wrote:
>>
>>> try
>>>
>>> sudo apt-get remove libleptonica-dev
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Fri, Mar 17, 2017 at 6:06 PM, Kazi Moinul Hossain <moinj...@gmail.com
>>> > wrote:
>>>
>>>> how can i uninstall old leptonica fully? I am attempting with "sudo
>>>> apt-get autoremove leptonica" and it is showing "unable to locate package"
>>>>
>>>> On Friday, 17 March 2017 17:56:13 UTC+6, shree wrote:
>>>>
>>>>> I use the following batch files in the folders where I have cloned
>>>>> tesseract and leptonica.
>>>>>
>>>>> 1. leptonica
>>>>>
>>>>> #!/bin/bash
>>>>> git pull origin
>>>>> ./autobuild
>>>>> #./configure --disable-dependency-tracking
>>>>> ./configure
>>>>> make
>>>>> sudo make install
>>>>> sudo ldconfig
>>>>> cd prog
>>>>> make
>>>>> cd ..
>>>>>
>>>>> 2. tesseract
>>>>>
>>>>> #!/bin/bash
>>>>> ./autogen.sh
>>>>> ./configure
>>>>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
>>>>> sudo make install
>>>>> sudo ldconfig
>>>>> make training
>>>>> sudo make training-install
>>>>>

Re: [tesseract-ocr] Recognition of trademark symbol

2017-03-17 Thread ShreeDevi Kumar

Please see
https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951
for more details about LSTM training.
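
Separately from the linked comment, one quick sanity check that may be worth doing before retraining is to confirm that the symbol actually occurs in the training text, and how often. A small sketch; count_symbol is a hypothetical helper, and the file name in the usage comment is an assumption based on the langdata/deu directory mentioned in the question:

```shell
#!/bin/bash
# Hypothetical helper: count occurrences of a literal string in a text file.
# Arguments: the string to look for, the file to search.
count_symbol() {
    # -o prints each match on its own line, -F treats the pattern literally.
    grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}

# Usage (file name is an assumption):
#   count_symbol '™' langdata/deu/deu.training_text
```

A count of zero, or only a handful, would mean the network sees few or no examples of the symbol, whatever layers are replaced.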

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Mar 13, 2017 at 8:35 PM, Martin Fadrhons 
wrote:

> Hi,
>
> I was trying to train tesseract 4 to recognize trademark symbol ™. I was
> following examples on wiki:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
>
> I use German language for testing. With the traineddata from repository
> the trademark symbol is usually recognized as '" or some other variation of
> quotes. So I created training text that includes trademark symbol and
> started the training process. I replaced only the top layer as it is in the
> example, however the trademark symbol is still not recognized properly.
> With the newly generated traineddata the symbol is recognized as TM. I have
> several questions.
>
> 1. Do more layers need to be replaced?
> 2. How large should the training text be? (mine is based on the one that
> is in the langdata/deu directory)
> 3. I noticed that the symbols © and ® are there. Why is the trademark symbol
> missing?
>
> Any other hints would be appreciated.
>
> Thank you for your time,
> Martin
>
> P.S. Also thanks for the great work on the tesseract OCR.
>

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar

> Is there anything more you did in the "src" and "prog" directories under the
> leptonica folder, like "make allheaders" or "make xtractprotos"?

No.


Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar

I use the following batch files in the folders where I have cloned
tesseract and leptonica.

1. leptonica

#!/bin/bash
git pull origin
./autobuild
#./configure --disable-dependency-tracking
./configure
make
sudo make install
sudo ldconfig
cd prog
make
cd ..

2. tesseract

#!/bin/bash
./autogen.sh
./configure
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install
sudo ldconfig
make training
sudo make training-install
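
Before running the tesseract script it may be worth confirming that the leptonica just installed is the one the build will see. A minimal sketch, assuming GNU sort (for -V) and that leptonica installed a lept.pc file for pkg-config; version_ge is a hypothetical helper, and the 1.74 threshold is simply taken from the versions mentioned in this thread:

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes pkg-config can find lept.pc):
#   version_ge "$(pkg-config --modversion lept)" 1.74 || echo "leptonica too old"
```

If pkg-config reports an older version than the one just built, an earlier installation is probably still first in the search path.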


Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar

Your problem could be that you have some old version of leptonica installed
on your system. Uninstall and remove all old versions and then try.
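
One way to look for leftover copies, sketched under the assumption that the libraries live in the usual directories; list_lept_libs is a hypothetical name, and the directories in the usage comment are only a guess for a Debian/Ubuntu system:

```shell
#!/bin/bash
# Hypothetical helper: print every leptonica shared library found under the
# given directories (find searches each directory recursively).
list_lept_libs() {
    find "$@" -name 'liblept*.so*' 2>/dev/null | sort
}

# Usage (directory list is an assumption):
#   list_lept_libs /usr/lib /usr/local/lib
```

Seeing more than one version in the output would be consistent with the undefined-reference errors reported in this thread.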

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 17, 2017 at 5:25 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> I use the following batch files in the folders where I have cloned
> tesseract and leptonica.
>
> 1. leptonica
>
> #!/bin/bash
> git pull origin
> ./autobuild
> #./configure --disable-dependency-tracking
> ./configure
> make
> sudo make install
> sudo ldconfig
> cd prog
> make
> cd ..
>
> 2. tesseract
>
> #!/bin/bash
> ./autogen.sh
> ./configure
> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
> sudo make install
> sudo ldconfig
> make training
> sudo make training-install
>
>


Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar

Also, you have not responded to Zdenko's suggestion to provide the output of

ldd tesseract

or

ldd /usr/local/bin/tesseract


(use the location of tesseract, which you can find by

which tesseract)


Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar

try

sudo apt-get remove libleptonica-dev
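
If apt reports that it cannot locate a package, the installed package names can be listed first. A sketch assuming a Debian-style system with dpkg; installed_lept_packages is a hypothetical helper that filters dpkg -l output read from stdin:

```shell
#!/bin/bash
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# installed packages (status "ii") whose name contains "lept".
installed_lept_packages() {
    awk '$1 == "ii" && $2 ~ /lept/ { print $2 }'
}

# Usage:
#   dpkg -l | installed_lept_packages
```

The names it prints are the ones to pass to apt-get remove.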

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 17, 2017 at 6:06 PM, Kazi Moinul Hossain 
wrote:

> how can i uninstall old leptonica fully? I am attempting with "sudo
> apt-get autoremove leptonica" and it is showing "unable to locate package"
>
> On Friday, 17 March 2017 17:56:13 UTC+6, shree wrote:
>
>> I use the following batch files in the folders where I have cloned
>> tesseract and leptonica.
>>
>> 1. leptonica
>>
>> #!/bin/bash
>> git pull origin
>> ./autobuild
>> #./configure --disable-dependency-tracking
>> ./configure
>> make
>> sudo make install
>> sudo ldconfig
>> cd prog
>> make
>> cd ..
>>
>> 2. tesseract
>>
>> #!/bin/bash
>> ./autogen.sh
>> ./configure
>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
>> sudo make install
>> sudo ldconfig
>> make training
>> sudo make training-install
>>

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar

sudo apt-get remove libleptonica-dev libleptonica


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 17, 2017 at 7:09 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> try
>
> sudo apt-get remove libleptonica-dev
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Mar 17, 2017 at 6:06 PM, Kazi Moinul Hossain <
> moinjoje...@gmail.com> wrote:
>
>> how can i uninstall old leptonica fully? I am attempting with "sudo
>> apt-get autoremove leptonica" and it is showing "unable to locate package"
>>
>> On Friday, 17 March 2017 17:56:13 UTC+6, shree wrote:
>>
>>> I use the following batch files in the folders where I have cloned
>>> tesseract and leptonica.
>>>
>>> 1. leptonica
>>>
>>> #!/bin/bash
>>> git pull origin
>>> ./autobuild
>>> #./configure --disable-dependency-tracking
>>> ./configure
>>> make
>>> sudo make install
>>> sudo ldconfig
>>> cd prog
>>> make
>>> cd ..
>>>
>>> 2. tesseract
>>>
>>> #!/bin/bash
>>> ./autogen.sh
>>> ./configure
>>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
>>> sudo make install
>>> sudo ldconfig
>>> make training
>>> sudo make training-install
>>>

Re: [tesseract-ocr] VietOCR 5.0 alpha availability

2017-04-03 Thread ShreeDevi Kumar

You need to get vietocr 5.0 alpha for tesseract 4.0 alpha

https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/

https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Apr 3, 2017 at 2:52 PM, El Fakir Zakaria  wrote:

> this is using Tesseract 3.04 not 4.00alpha ?
>
> 2017-03-31 18:13 GMT+01:00 Quan Nguyen :
>
>> VietOCR 5.0 alpha, Java & .NET GUI frontend for Tesseract 4.00alpha, is
>> available for download. Any feedback is welcome. Thanks.
>>
>> https://sourceforge.net/projects/vietocr/files/
>>

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar

Read

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer

and

https://github.com/tesseract-ocr/tesseract/wiki/Documentation

https://github.com/tesseract-ocr/tesseract/wiki/Fonts

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

https://github.com/tesseract-ocr/tesseract/wiki/FAQ




ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Apr 5, 2017 at 12:54 AM,  wrote:

> Can you please post some of your experiences here, as there are no posts
> on training tesseract 4.
>
> 1) Also, is there any way to add the newly trained data file to the old
> trained data file, without replacing the old file?
> 2) If we don't know what font we may get in our images, how should we
> proceed in training tesseract?
>
> On Tuesday, April 4, 2017 at 9:27:06 PM UTC+5:30, Saurabh Srivastav wrote:
>>
>> Yes, I trained my tesseract for an eng font and made it read the
>> characters from an image.
>>
>> thanks,
>> Saurabh Srivastav


Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread ShreeDevi Kumar

4.0 is alpha software. Please use an older released version.
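
For what it is worth, the "Deserialize header failed" lines in the log quoted in this thread are followed by what look like lines of training text. That would be consistent with --train_listfile pointing at a text file rather than at a list of .lstmf file names. This is only a reading of the log, not a confirmed diagnosis. A sketch of a pre-flight check, with check_listfile as a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: verify that every non-empty line of a list file names
# an existing, readable file ending in .lstmf. Prints each bad entry and
# returns non-zero if any was found.
check_listfile() {
    local listfile=$1 bad=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        case $line in
            *.lstmf) [ -r "$line" ] || { echo "missing: $line"; bad=1; } ;;
            *)       echo "not an lstmf file: $line"; bad=1 ;;
        esac
    done < "$listfile"
    return "$bad"
}

# Usage:
#   check_listfile /path/to/train_listfile.txt && echo "list file looks sane"
```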

- excuse the brevity, sent from mobile

On 05-Apr-2017 1:55 PM,  wrote:

> After what you said,
>
> I tried it in two ways and I am stuck at the LSTM step:
>
> Training
>
> Command used:
>
> /home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/eng.unicharset \
>   --script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_interval 100 \
>   --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' \
>   --model_output /home/p/Documents/T/ \
>   --train_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
>   --eval_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
>   --max_iterations 5000 &>/home/p/Documents/T/basetrain.log
>
> tail -f basetrain.log
>
> The error I am getting is:
>
> Deserialize header failed: BnO. 005 SUBHISHIs TOWN CENTRE
> Deserialize header failed: MOKILA SHAKARPALLY
> Deserialize header failed: PHONE: 040-8989898989
> Load of page 0 failed!
> Load of images failed!!
> Deserialize header failed: TIN: 8989898989
> Deserialize header failed: Station 1D: 01 Time: 03:26:46 PM
> Deserialize header failed: CASHIER ID:; 3001 Date: 21-02-2017
> Deserialize header failed: (null)
> Deserialize header failed: (null)
>
> Fine tuning:
>
> Command used:
>
> /home/plianto/Documents/Tvat/tesseract-master/training/tesstrain.sh \
>   --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --training_text /home/plianto/Documents/Tvat/img_frm_3/eng.ArialBold.exp0.txt \
>   --langdata_dir /home/plianto/Documents/Tvat/TESS_4_ALPHA/langdata-master \
>   --tessdata_dir /usr/share/tesseract-ocr/tessdata \
>   --fontlist "Arial Bold" \
>   --output_dir /home/plianto/Documents/Tvat/engoutput/
>
> Error:
>
> === Phase E: Generating lstmf files ===
> Using TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata
> [Wed Apr 5 13:53:05 IST 2017] /usr/local/bin/tesseract /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.tif /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0 lstm.train
> read_params_file: Can't open lstm.train
> Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
> Page 1
> ERROR: /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.lstmf does not exist or is not readable
>
> On Wednesday, April 5, 2017 at 9:07:40 AM UTC+5:30, shree wrote:
>>
>> Read
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
>>
>> and
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Documentation
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/FAQ
>>
>>
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Wed, Apr 5, 2017 at 12:54 AM,  wrote:
>>
>>> Can you please post some of your experiences here, as there are no posts
>>> on training tesseract 4.
>>>
>>> 1) Also, is there any way to add the newly trained data file to the old
>>> trained data file, without replacing the old file?
>>> 2) If we don't know what font we may get in our images, how should we
>>> proceed in training tesseract?
>>>
>>> On Tuesday, April 4, 2017 at 9:27:06 PM UTC+5:30, Saurabh Srivastav
>>> wrote:
>>>>
>>>> Yes, I trained my tesseract for an eng font and made it read the
>>>> characters from an image.
>>>>
>>>> thanks,
>>>> Saurabh Srivastav

Re: [tesseract-ocr] Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-05 Thread ShreeDevi Kumar

Have you tried just using eng.traineddata directly with tesseract 3.04, 3.05
or 4.0?

You don't need to train unless it is a very special case. You can try
changing the dictionary dawg files with tesseract 3.0x.
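
As a rough outline of what changing the dictionary involves for 3.0x, the helper below only prints the commands rather than running them. combine_tessdata and wordlist2dawg are training tools shipped with tesseract; the function name, the work/ directory layout and the word list file name are assumptions for illustration, so check each tool's usage message before running anything:

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the commands for swapping the word
# list in a 3.0x traineddata file. Arguments: language, traineddata, wordlist.
print_dawg_steps() {
    local lang=$1 traineddata=$2 wordlist=$3
    # 1. Unpack the components into work/ as work/<lang>.*
    echo "combine_tessdata -u $traineddata work/$lang."
    # 2. Build a new word DAWG from the custom word list.
    echo "wordlist2dawg $wordlist work/$lang.word-dawg work/$lang.unicharset"
    # 3. Overwrite that component inside the traineddata file.
    echo "combine_tessdata -o $traineddata work/$lang.word-dawg"
}

# Usage (file names are assumptions):
#   print_dawg_steps eng eng.traineddata my_words.txt
```

Working on a copy of the traineddata file is safer, since the last step modifies it in place.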




ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Apr 5, 2017 at 11:25 AM,  wrote:

> I am trying to correct box files so that I can train tesseract.
>
> But I have run into a strange problem:
>
> 1) Tesseract recognizes some letters as two characters. How should I edit
> the box file in that case? (screenshot 1)
> 2) Tesseract does not recognize some letters at all. How should I edit the
> box file in that case? (screenshot 2)
>

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread ShreeDevi Kumar

You do not have the lstm.train config file.
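
The "read_params_file: Can't open lstm.train" line in the log quoted in this thread points the same way. A sketch of a check, assuming config files sit in a configs subdirectory of the tessdata directory (the usual layout, but verify on your system); has_tess_config is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named config file is readable under
# <tessdata dir>/configs. Arguments: tessdata directory, config name.
has_tess_config() {
    [ -r "$1/configs/$2" ]
}

# Usage, with the TESSDATA_PREFIX value from the quoted log:
#   has_tess_config /usr/share/tesseract-ocr/tessdata lstm.train || echo "lstm.train missing"
```

If it is missing, copying it from the tessdata/configs directory of the tesseract source tree is one option to try.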

- excuse the brevity, sent from mobile

On 05-Apr-2017 1:55 PM,  wrote:

> After what you said,
>
> I tried it in two ways and I am stuck at the LSTM step:
>
> Training
>
> Command used:
>
> /home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/eng.unicharset \
>   --script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_interval 100 \
>   --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' \
>   --model_output /home/p/Documents/T/ \
>   --train_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
>   --eval_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
>   --max_iterations 5000 &>/home/p/Documents/T/basetrain.log
>
> tail -f basetrain.log
>
> The error I am getting is:
>
> Deserialize header failed: BnO. 005 SUBHISHIs TOWN CENTRE
> Deserialize header failed: MOKILA SHAKARPALLY
> Deserialize header failed: PHONE: 040-8989898989
> Load of page 0 failed!
> Load of images failed!!
> Deserialize header failed: TIN: 8989898989
> Deserialize header failed: Station 1D: 01 Time: 03:26:46 PM
> Deserialize header failed: CASHIER ID:; 3001 Date: 21-02-2017
> Deserialize header failed: (null)
> Deserialize header failed: (null)
>
> Fine tuning:
>
> Command used:
>
> /home/plianto/Documents/Tvat/tesseract-master/training/tesstrain.sh \
>   --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --training_text /home/plianto/Documents/Tvat/img_frm_3/eng.ArialBold.exp0.txt \
>   --langdata_dir /home/plianto/Documents/Tvat/TESS_4_ALPHA/langdata-master \
>   --tessdata_dir /usr/share/tesseract-ocr/tessdata \
>   --fontlist "Arial Bold" \
>   --output_dir /home/plianto/Documents/Tvat/engoutput/
>
> Error:
>
> === Phase E: Generating lstmf files ===
> Using TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata
> [Wed Apr 5 13:53:05 IST 2017] /usr/local/bin/tesseract /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.tif /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0 lstm.train
> read_params_file: Can't open lstm.train
> Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
> Page 1
> ERROR: /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.lstmf does not exist or is not readable
>
> On Wednesday, April 5, 2017 at 9:07:40 AM UTC+5:30, shree wrote:
>>
>> Read
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
>>
>> and
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Documentation
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/FAQ
>>
>>
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Wed, Apr 5, 2017 at 12:54 AM,  wrote:
>>
>>> Can you please post some of your experiences in this thread, as there are
>>> no posts on training Tesseract 4?
>>>
>>> 1) Is there any way to add the newly trained data file to the old
>>> traineddata file without replacing the old file?
>>> 2) If we don't know which fonts may appear in our images, how should we
>>> proceed in training Tesseract?
>>>
>>> On Tuesday, April 4, 2017 at 9:27:06 PM UTC+5:30, Saurabh Srivastav
>>> wrote:

>>>> Yes, I trained my Tesseract for an English font and made it read the
>>>> characters from images.
>>>>
>>>> Thanks,
>>>> Saurabh Srivastav
>>>>
> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/tesseract-ocr/9c88494c-6d80-4b31-b247-dbbacd48bc19%40goo
>>> glegroups.com
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar

See

https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh

https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh

https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduV99at4Uzvyk4HxxMONL%3DB51V-MV7GS8HNk11ziqkD5xQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar

tesstrain.sh generates a file called eng.training_files.txt.

Your command gives the list file name without the .txt extension.

Check the name of the generated file and use that.

I have found that editing that file also gives errors.
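
A quick pre-flight check along these lines can catch a wrong list file name before lstmtraining starts. It is only a sketch: the function name check_listfile is made up for illustration, and it assumes the list file holds one .lstmf path per line.

```shell
# check_listfile LISTFILE
# Returns 0 if LISTFILE is readable and every path listed in it is readable,
# 1 otherwise. Problems are reported on stderr.
check_listfile() {
  listfile=$1
  if [ ! -r "$listfile" ]; then
    echo "list file not found or unreadable: $listfile" >&2
    return 1
  fi
  status=0
  while IFS= read -r path; do
    # Skip blank lines.
    [ -n "$path" ] || continue
    if [ ! -r "$path" ]; then
      echo "listed file is not readable: $path" >&2
      status=1
    fi
  done < "$listfile"
  return "$status"
}

# Example (hypothetical path):
#   check_listfile path/to/eng.training_files.txt
```

If the check fails on the list file itself, compare the name against what tesstrain.sh actually wrote to its output directory.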
- excuse the brevity, sent from mobile

On 04-Apr-2017 7:01 PM,  wrote:

> I am trying to train Tesseract 4, and I am getting the following error.
>
> command used:
>
> mkdir -p /home/p/Documents/T/engoutput
> /home/p/Documents/T/tesseract-master/training/lstmtraining \
>   -U /home/p/Documents/T/img_frm_3/unicharset \
>   --script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master \
>   --debug_interval 100 \
>   --train_listfile /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files \
>   --eval_listfile /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files \
>   --max_iterations 5000 &>/home/p/Documents/T/basetrain.log
>
> used for log:
> tail -f basetrain.log
> Failed to load list of training filenames from /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files
> tail: basetrain.log: file truncated
>
>
>
> error getting:
> Failed to load list of training filenames from /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files
>
>
>
>
> On Tuesday, April 4, 2017 at 6:23:33 PM UTC+5:30, shree wrote:
>>
>> See
>>
>> https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
>>
>> https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh
>>
>> https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh
>>


Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-03 Thread ShreeDevi Kumar

Saurabh,

It depends on what you want to do with the bash script.

Here is a sample script I used to compare results from different traineddata
files by looping through a set of image files. Search for the bash commands to
figure out what they do!

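#!/bin/bash
# Echo each command as it is read and as it runs.
set -vx
export TESSDATA_PREFIX=/mnt/c/Users/User/shree/tesseract-ocr

# Run every JPEG through three traineddata files and time each run.
for img_file in *.jpeg; do
  # Skip the literal pattern when no .jpeg file is present.
  [ -e "${img_file}" ] || continue
  base="${img_file%.*}"
  time tesseract "${img_file}" "${base}-ssd" -l ssd
  time tesseract "${img_file}" "${base}-ssdsmall" --psm 6 --oem 1 -l ssdsmall
  time tesseract "${img_file}" "${base}-eng" --psm 6 --oem 1 -l eng
done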
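
Once the loop has run, the text outputs can be compared per image. The helper below is a sketch and not part of the script above; the name compare_outputs is invented here, and it assumes tesseract wrote its default .txt output next to each output base.

```shell
# compare_outputs BASE
# Reports whether BASE-ssd.txt and BASE-ssdsmall.txt match BASE-eng.txt.
# A missing file is also reported as "differs".
compare_outputs() {
  base=$1
  for model in ssd ssdsmall; do
    if cmp -s "${base}-eng.txt" "${base}-${model}.txt"; then
      echo "${model}: same as eng"
    else
      echo "${model}: differs from eng"
    fi
  done
}

# Example: compare_outputs page1
# For a line-by-line view: diff -u page1-eng.txt page1-ssd.txt
```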


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Apr 3, 2017 at 7:10 PM, Saurabh Srivastav <
saurabhkumarsrivas...@gmail.com> wrote:

> hello  shree ! thank you for your help.
> may you please help me how can i write a bash  script for tesseract.
>


Re: [tesseract-ocr] Error while creating training data for Japanese

2017-04-03 Thread ShreeDevi Kumar

jpn.config in langdata/jpn is loading jpn_vert as a sublanguage

tessedit_load_sublangs jpn_vert

You can try without that
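
One way to try that without losing the original is to strip the line and keep a backup. This is a sketch; disable_sublangs is a made-up helper name, and the config path in the example is the one from the log quoted below.

```shell
# disable_sublangs CONFIG
# Removes any tessedit_load_sublangs line from CONFIG and keeps the
# original file as CONFIG.bak.
disable_sublangs() {
  config=$1
  cp "$config" "$config.bak" || return 1
  sed '/^tessedit_load_sublangs/d' "$config.bak" > "$config"
}

# Example:
#   disable_sublangs ../langdata/jpn/jpn.config
```

Restoring the backup brings the sublanguage setting back if removing it does not help.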

Also look at the settings for jpn in training/language-specific.sh

You may need to change the following as well:


# The following fonts will be rendered vertically in phase I.
VERTICAL_FONTS=( \
"TakaoExGothic" \ # for jpn
"TakaoExMincho" \ # for jpn
"AR PL UKai Patched" \ # for chi_tra
"AR PL UMing Patched Light" \ # for chi_tra
"Baekmuk Batang Patched" \ # for kor
)


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Apr 3, 2017 at 4:22 PM,  wrote:

> Hi,
>
> I'm trying to create training data for Japanese (jpn.traineddata).
>
> I ran 'tesstrain.sh' with the '--linedata_only' option, and the script
> finished (return code 0).
> But the log file contained some error messages (repeated 22 times).
>
> ```
> $ ../tesseract-ocr/training/tesstrain.sh --fonts_dir /usr/share/fonts
> --lang jpn --linedata_only   --noextract_font_properties --langdata_dir
> ../langdata   --tessdata_dir /usr/local/share --output_dir ~/work/jpntrain
> ```
>
>
> ---
> [Sun Apr 2 07:42:30 UTC 2017] /usr/local/bin/tesseract
> /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAPMincho.exp0.tif
> /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAPMincho.exp0 lstm.train ../langdata/jpn/jpn.config
> [Sun Apr 2 07:42:30 UTC 2017] /usr/local/bin/tesseract
> /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAGothic.exp0.tif
> /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAGothic.exp0 lstm.train ../langdata/jpn/jpn.config
> Error opening data file /usr/local/share/tessdata/jpn_vert.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the
> parent directory of your "tessdata" directory.
> Failed loading language 'jpn_vert'
> ---
>
> It seems that 'tesstrain.sh' requires 'jpn_vert.traineddata', but this
> file is not provided in the tessdata repository.
>
> How can I get this file? Or can I substitute 'jpn.traineddata' for
> 'jpn_vert.traineddata'?
>
>
> I've found that there is a 'jpn_vert' directory in the langdata repository,
> but it contains only some config files.
>
>
> Thanks.
>


Re: [tesseract-ocr] Low Accurate ini bold font

2017-03-31 Thread ShreeDevi Kumar

Did you build it with debug option?

That number is the git revision of the code, so it is easy to tell which
source commit the build came from.

Look on GitHub for the commit that begins with that number.
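
If you have a local clone, the same lookup can be done from the command line. A small sketch, assuming the first line of the version output has the form "tesseract <revision>" as in your output; tess_revision is a hypothetical helper name.

```shell
# tess_revision
# Reads `tesseract -v` output on stdin and prints the second word of the
# first line, which is the version or revision string.
tess_revision() {
  awk 'NR == 1 { print $2 }'
}

# Example, run inside a clone of the tesseract repository:
#   rev=$(tesseract -v 2>&1 | tess_revision)
#   git log -1 "$rev"
```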

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 31, 2017 at 9:34 AM, afrizal firdaus 
wrote:

> Thank you Shree.
>
> I just try it and that solve my problem :)
>
> But I see something odd in my Tesseract.
> When I type 'tesseract -v' I get the following output:
>
> tesseract 362b68e
>  leptonica-1.74.1
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib
> 1.2.8
>
>  Found AVX
>  Found SSE
>
> Why is the version "tesseract 362b68e"? Is that normal?
>


Re: [tesseract-ocr] Re: Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-12 Thread ShreeDevi Kumar

You can use jTessBoxEditor to edit the box files. Make sure to mark EOL if
you are trying to train using scanned images.

Also note that this part of the code, training 4.0 using pre-existing images
and box files, is untested.

Ray has only explained the method for using images created by text2image.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Apr 12, 2017 at 3:23 PM,  wrote:

> Can you please tell me how to split a box and how to merge two boxes?
> I am not able to find any options for this. If you
> explain, it will be helpful to me and to others as well.
>
> Thank You.
>
> On Tuesday, April 11, 2017 at 9:10:14 AM UTC+5:30, Quan Nguyen wrote:
>>
>> For Case 1, you'll need to merge the two boxes. For Case 2, you'll
>> correct by splitting the box.
>>
>> On Wednesday, April 5, 2017 at 12:55:37 AM UTC-5, srn...@gmail.com wrote:
>>>
>>> I am trying to correct box files so I can train Tesseract.
>>>
>>> But I have run into a strange problem:
>>>
>>>
>>> 1) Tesseract recognizes some single letters as two letters. How should I
>>> edit the box file in that case? (screenshot 1)
>>> 2) Tesseract does not recognize some letters at all. How should I edit the
>>> box file in that case? (screenshot 2)
>>>


Re: [tesseract-ocr] Help in TrainingTesseract 4.00 Finetune

2017-04-12 Thread ShreeDevi Kumar

--linedata_only means that it will only try to create lstmf files and not
the files for 3.0x training.

- excuse the brevity, sent from mobile

On 12-Apr-2017 10:39 AM, "Ahmad Moawad"  wrote:

> Hello All,
>
> I want help in trainingTesseract 4.00 Finetune
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---
> Finetune
> I want to know some parameter such as:
>
> 1- langdata_dir is that the file in https://github.com/tesseract-ocr/langdata
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang ara  
> --linedata_only \
>   --training_text ../langdata/ara/arabic1.txt \
>   --langdata_dir ../langdata --tessdata_dir ./tessdata \
>   --fontlist "Times New Roman," \
>   --output_dir ~/tesstutorial/aratest
> 2- linedata_only: unknown
>
> Thanks.
>


Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar

LSTM training is not like legacy training. Please read the wiki pages
regarding 4.0 training. I have given all the sample commands there. There are 3
different ways of training.

Read the bash training scripts to know more.

tesstrain.sh with --linedata_only creates the box/tiff pairs, but only the
lstmf files are saved in the output dir.

Without --linedata_only you will get a 3.0x traineddata.
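
As a sketch of that difference, the wrapper below only toggles the flag. The function name run_tesstrain and the TESSTRAIN override variable are hypothetical; the remaining options mirror the tesstrain.sh invocations quoted on this list, and the paths are placeholders to adjust.

```shell
# run_tesstrain LANG OUTPUT_DIR [lstm|legacy]
# lstm   -> passes --linedata_only, so only lstmf files are kept
# legacy -> omits the flag, so tesstrain.sh builds a 3.0x traineddata
run_tesstrain() {
  lang=$1
  output_dir=$2
  mode=${3:-lstm}
  flag=""
  if [ "$mode" = "lstm" ]; then
    flag="--linedata_only"
  fi
  # $flag is left unquoted on purpose so an empty value adds no argument.
  "${TESSTRAIN:-training/tesstrain.sh}" --fonts_dir /usr/share/fonts \
    --lang "$lang" $flag \
    --langdata_dir ../langdata --tessdata_dir ./tessdata \
    --output_dir "$output_dir"
}

# Example:
#   run_tesstrain eng ~/tesstutorial/engtrain lstm
```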

There are multiple steps to be done using the lstmf files to create the
final 4.0 traineddata.

Since you want to write a tutorial, please do your own reading and trials
first


- excuse the brevity, sent from mobile

On 12-Apr-2017 4:08 PM,  wrote:

> Sorry, I have given wrong commands for arabic. Actually i was referring to
> english.
>
> tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train
> unicharset_extractor eng.arial.exp4.box
> echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
> mftraining -F font_properties -U unicharset -O eng.unicharset eng.arial.exp4.tr
> shapeclustering -F unicharset eng.arial.exp4.tr
> cntraining eng.arial.exp4.tr
>
> mv inttemp eng.inttemp
> mv normproto eng.normproto
> mv pffmtable eng.pffmtable
> mv shapetable eng.shapetable
> combine_tessdata eng.
>
>
>  I request you to suggest the changes for the below commands with respect
> to tesseract 4.0 , these commands are for tess 3.0.
> Please suggest changes for the above steps. I plan to publish a rigorous
> explanative tutorial after getting overview of all the steps.
> Thank you.
>
>
>
>
>
>
> On Wednesday, April 12, 2017 at 4:04:42 PM UTC+5:30, shree wrote:
>>
>> Arabic was never trained with the legacy tesseract engine and I doubt you
>> will get any improvement over existing traineddata using cube or lstm.
>>
>> You are free to experiment and see what you come up with.
>>
>> I have pointed to the bash scripts for training. Please refer to them for
>> the correct process.
>>
>> - excuse the brevity, sent from mobile
>>
>> On 12-Apr-2017 4:00 PM,  wrote:
>>
>>> Hello shree, Thank you for your valuable reply.. Are there any changes i
>>> need to follow for the steps below.. I request you to suggest the changes
>>> for the below commands, these are for tess 3.0
>>>
>>> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
>>> unicharset_extractor ara.arial.exp4.box
>>> echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
>>> mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
>>> shapeclustering -F unicharset ara.arial.exp4.tr
>>> cntraining ara.arial.exp4.tr
>>>
>>> mv inttemp ara.inttemp
>>> mv normproto ara.normproto
>>> mv pffmtable ara.pffmtable
>>> mv shapetable ara.shapetable
>>> combine_tessdata ara.
>>>
>>>
>>> Please suggest changes for the above steps. I plan to publish a rigorous
>>> explanative tutorial after getting overview of all the steps.
>>> Thank you.
>>>
>>>
>>> On Wednesday, April 12, 2017 at 3:38:11 PM UTC+5:30, shree wrote:

>>>> see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
>>>>
>>>>
>>>> if ((LINEDATA)); then
>>>>   phase_E_extract_features "lstm.train" 8 "lstmf"
>>>>   make__lstmdata
>>>> else
>>>>   phase_E_extract_features "box.train" 8 "tr"
>>>>   phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
>>>>   if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
>>>>     phase_S_cluster_shapes
>>>>   fi
>>>>   phase_M_cluster_microfeatures
>>>>   phase_B_generate_ambiguities
>>>>   make__traineddata
>>>> fi
>>>>
>>>>
>>>>
>>>> lstm.train is for LSTM training
>>>>
>>>> box.train is for 3.0 Tesseract legacy engine training
>>>>
>>>> Please note that current master code is for alpha testing for 4.0 LSTM
>>>> and will most probably drop support for legacy engine.
>>>>
>>>> If you want the legacy tesseract engine and train for it, please use
>>>> the 3.05 branch of the github repo.


Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar

Read the bash scripts

tesstrain.sh
tesstrain_utils.sh
language-specific.sh

in the training directory to understand LSTM training in more detail.

- excuse the brevity, sent from mobile

On 12-Apr-2017 10:47 AM, "Ahmad Moawad"  wrote:

> This is the part from
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
> My question relates to training from images, not from rendered text.
>
>
> The overall training process is similar to training 3.04.
> Conceptually the same:
>
>    1. Prepare training text.
>    2. Render text to image + box file. (Or create hand-made box files for
>       existing image data.)
>    3. Make unicharset file.
>    4. Optionally make dictionary data.
>    5. Run tesseract to process image + box file to make training data set.
>    6. Run training on training data set.
>    7. Combine data files.
>
> Are the above steps similar to:
>
> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
> unicharset_extractor ara.arial.exp4.box
> echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
> mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
> shapeclustering -F unicharset ara.arial.exp4.tr
> cntraining ara.arial.exp4.tr
>
> mv inttemp ara.inttemp
> mv normproto ara.normproto
> mv pffmtable ara.pffmtable
> mv shapetable ara.shapetable
> combine_tessdata ara.
>
>
> Should I use these steps or not.
>
>
> The key differences are:
>
>    - The boxes only need to be at the *textline level.* It is thus *far
>      easier* to make training data from existing image data.
>    - The .tr files are replaced by .lstmf data files.
>    - Fonts *can and should be mixed freely* instead of being separate.
>    - The clustering steps (mftraining, cntraining, shapeclustering) are
>      replaced with a single slow lstmtraining step.
>
> I don't know a lot about this part.
>
>
> Thanks!
>


Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar

see
https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh


if ((LINEDATA)); then
  phase_E_extract_features "lstm.train" 8 "lstmf"
  make__lstmdata
else
  phase_E_extract_features "box.train" 8 "tr"
  phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
  if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
    phase_S_cluster_shapes
  fi
  phase_M_cluster_microfeatures
  phase_B_generate_ambiguities
  make__traineddata
fi



lstm.train is for LSTM training

box.train is for 3.0 Tesseract legacy engine training
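
The pairing in that branch can be restated as a small lookup. This is only an illustration; feature_config is a made-up helper, not a function from tesstrain.sh.

```shell
# feature_config ENGINE
# Prints the tesseract config name and the training-file extension that
# the tesstrain.sh branch above passes to phase_E_extract_features.
feature_config() {
  case "$1" in
    lstm)   echo "lstm.train lstmf" ;;
    legacy) echo "box.train tr" ;;
    *)      echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Example:
#   feature_config lstm     # prints "lstm.train lstmf"
#   feature_config legacy   # prints "box.train tr"
```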

Please note that current master code is for alpha testing for 4.0 LSTM and
will most probably drop support for legacy engine.

If you want the legacy tesseract engine and train for it, please use the
3.05 branch of the github repo.


Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar

Arabic was never trained with the legacy tesseract engine and I doubt you
will get any improvement over existing traineddata using cube or lstm.

You are free to experiment and see what you come up with.

I have pointed to the bash scripts for training. Please refer to them for
the correct process.

- excuse the brevity, sent from mobile

On 12-Apr-2017 4:00 PM,  wrote:

> Hello shree, Thank you for your valuable reply.. Are there any changes i
> need to follow for the steps below.. I request you to suggest the changes
> for the below commands, these are for tess 3.0
>
> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
> unicharset_extractor ara.arial.exp4.box
> echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
> mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
> shapeclustering -F unicharset ara.arial.exp4.tr
> cntraining ara.arial.exp4.tr
>
> mv inttemp ara.inttemp
> mv normproto ara.normproto
> mv pffmtable ara.pffmtable
> mv shapetable ara.shapetable
> combine_tessdata ara.
>
>
> Please suggest changes for the above steps. I plan to publish a rigorous
> explanative tutorial after getting overview of all the steps.
> Thank you.
>
>
> On Wednesday, April 12, 2017 at 3:38:11 PM UTC+5:30, shree wrote:
>>
>> see https://github.com/tesseract-ocr/tesseract/blob/master/
>> training/tesstrain.sh
>>
>>
>> if ((LINEDATA)); then
>>   phase_E_extract_features "lstm.train" 8 "lstmf"
>>   make__lstmdata
>> else
>>   phase_E_extract_features "box.train" 8 "tr"
>>   phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
>>   if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
>>   phase_S_cluster_shapes
>>   fi
>>   phase_M_cluster_microfeatures
>>   phase_B_generate_ambiguities
>>   make__traineddata
>> fi
>>
>> 
>>
>> lstm.train is for LSTM training
>>
>> box.train is for 3.0 Tesseract legacy engine training
>>
>> Please note that current master code is for alpha testing for 4.0 LSTM
>> and will most probably drop support for legacy engine.
>>
>> If you want the legacy tesseract engine and train for it, please use the
>> 3.05 branch of the github repo.
>>


Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread ShreeDevi Kumar

See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

Follow the correct order of arguments:

  tesseract  imagename|stdin outputbase|stdout [options...] [configfile...]
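
Applied to the command quoted below, that order would look like the following. A sketch only: ocr_osd is a hypothetical wrapper name, and reordering the arguments is not guaranteed to cure the segmentation fault.

```shell
# ocr_osd IMAGE
# Runs tesseract in page segmentation mode 0 on IMAGE and writes to stdout.
# The image and output base come first and the options follow, as in the
# usage line above.
ocr_osd() {
  tesseract "$1" stdout -l eng --oem 1 --psm 0
}

# Example:
#   ocr_osd a.jpg
```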


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Apr 12, 2017 at 8:01 PM, Pritam Dodeja 
wrote:

> The command was the following:
>
> tesseract -l eng --oem 1 --psm 0 a.jpg stdout
>
> As far as where it occurred exactly, I can't tell.  I have been able to
> reproduce this with multiple jpgs - let me know if you need any further info
>
> tesseract --version shows
>
> tesseract 4.00.00alpha
> leptonica-1.74.1
> libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.2.54 : libtiff 4.0.6 : zlib
> 1.2.8
>
> Pritam
>
> On Wednesday, April 12, 2017 at 6:00:12 AM UTC-4, srn...@gmail.com wrote:
>>
>> Can you tell me when you got this? That is, which command gave you
>> this error, and at which step?
>>
>> On Wednesday, April 12, 2017 at 12:16:54 PM UTC+5:30, Pritam Dodeja wrote:
>>>
>>> Hi,
>>>
>>> I get segmentation faults when using page segmentation mode 0.  Has
>>> anyone else experienced this?
>>>
>>> Pritam
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWh9rGo8KY1C0vC4Qc%2BJfpeXtUbxfJR0k%3DFGZ9eMhNo9g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [tesseract-ocr] How to add Armenian language support to tesseract

2017-04-11 Thread ShreeDevi Kumar

I have added this at https://github.com/tesseract-ocr/langdata/issues/67

Please add more information there:


Which language code: arm or hye

Modern Armenian or Classical Armenian

Sources of primary Unicode texts in the Armenian language to use for
training

Freely available Unicode fonts to render the text


Also read
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
and
https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951

which discuss the training process for 4.0 LSTM.


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Apr 11, 2017 at 1:27 AM,  wrote:

> Dear all,
>
> I have been trying Tesseract recently and it is really a very good product. I
> would like to ask if there is any tutorial or list of steps on how to add
> support for a new language to the package, for example Armenian.
>
> Thank you in advance.
>
>
> Regards,
>  Vahe
>

Re: [tesseract-ocr] Tesseract Installation

2017-04-11 Thread ShreeDevi Kumar

You can ignore it. I get it too when using sudo a second time.

The host name must be the ID of your computer under Windows 10.

Have you tried running tesseract after that?
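
For context, sudo usually prints this message when the machine's host name has no entry in /etc/hosts. That is general sudo behaviour rather than something stated in this thread. A minimal sketch for checking it follows; host_in_hosts_file is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given host name appears as a name
# field (any field after the address) on a non-comment line of the hosts
# file. The file defaults to /etc/hosts.
host_in_hosts_file() {
  awk -v name="$1" '
    /^[ \t]*#/ { next }                 # skip whole-line comments
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break            # ignore trailing comments
        if ($i == name) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  ' "${2:-/etc/hosts}" 2>/dev/null
}

name=$(uname -n)
if host_in_hosts_file "$name"; then
  echo "$name is listed in /etc/hosts"
else
  echo "$name is not listed; a line such as '127.0.1.1 $name' is the usual fix"
fi
```

The warning is harmless for the command being run, which is consistent with the advice to ignore it.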

- excuse the brevity, sent from mobile

On 11-Apr-2017 4:10 PM, "Ibr" wrote:

Hi,

I'm trying to install Tesseract following the steps from this website.
I ran the commands for step 5; all worked fine except the command *sudo
ldconfig*, which returned the error *sudo: unable to resolve host
DESKTOP-MEO8PSD*
What is that error, and how can I solve it?
Thanks in advance
Note: I'm using Windows 10 bash


Re: [tesseract-ocr] Re: Tesseract Installation

2017-04-11 Thread ShreeDevi Kumar

Also, if you want training tools, you need to build them separately - see
https://github.com/tesseract-ocr/tesseract/wiki/Compiling

make training
sudo make training-install


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Apr 11, 2017 at 6:53 PM, shree  wrote:

>
> On Tuesday, April 11, 2017 at 4:10:26 PM UTC+5:30, Ibr wrote:
>>
>>
>> Note: I'm using windows 10 bash
>>
>
> I use it too, but via mobaxterm, which makes it easier to use
>
> see http://mobaxterm.mobatek.net/download-home-edition.html
>

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread ShreeDevi Kumar

Please open an issue, as the problem is related to --psm 0.

- excuse the brevity, sent from mobile

On 13-Apr-2017 9:29 AM, "Pritam Dodeja"  wrote:

> Find below - I can also ship my docker container to you if you want so you
> can see my exact setup, it's about 1.15GB
>
> Pritam
>
> On Wednesday, April 12, 2017 at 10:09:35 PM UTC-4, shree wrote:
>>
>> Which operating system - Ubuntu 16.10 Yakkety Yak on x86_64
>> Which version/commit of tesseract - top of Changelog says 2017-03-24 -
>> v4.00.00-alpha
>> How was tesseract built, or where did you get the binaries - I compiled
>> it from source
>>
>> Does it work with other psm values - yes, works with 3
>> Do you have the correct version of traineddata - tesseract --list-langs
>> works as expected, I got eng.traineddata from github, md5sum for that one
>> starts with 7af2
>>
>
>
>
>>
>> - excuse the brevity, sent from mobile
>>
>> On 12-Apr-2017 11:22 PM, "Pritam Dodeja"  wrote:
>>
>>> The command below also produces the same result ( segmentation fault )
>>>
>>> tesseract a.jpg stdout --oem 1 --psm 0 -l eng
>>>
>>> Pritam
>>>
>>> On Wednesday, April 12, 2017 at 10:56:09 AM UTC-4, shree wrote:

 See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

 Follow correct order of variables

   tesseract  imagename|stdin outputbase|stdout [options...] [configfile...]


 ShreeDevi
 
 भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

 On Wed, Apr 12, 2017 at 8:01 PM, Pritam Dodeja 
 wrote:

> The command was the following:
>
> tesseract -l eng --oem 1 --psm 0 a.jpg stdout
>
> As far as where it occurred exactly, I can't tell.  I have been able
> to reproduce this with multiple jpgs - let me know if you need any further
> info
>
> tesseract --version shows
>
> tesseract 4.00.00alpha
> leptonica-1.74.1
> libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.2.54 : libtiff 4.0.6 :
> zlib 1.2.8
>
> Pritam
>
> On Wednesday, April 12, 2017 at 6:00:12 AM UTC-4, srn...@gmail.com
> wrote:
>>
>> Can you tell when you got this, i.e. which command produced this
>> error and at which step?
>>
>> On Wednesday, April 12, 2017 at 12:16:54 PM UTC+5:30, Pritam Dodeja
>> wrote:
>>>
>>> Hi,
>>>
>>> I get segmentation faults when using page segmentation mode 0.  Has
>>> anyone else experienced this?
>>>
>>> Pritam
>>>

Re: [tesseract-ocr] Training tesseract-ocr unicharset_extractor, mftraining, cntraining

2017-04-21 Thread ShreeDevi Kumar

If you want to OCR an invoice like the sample you posted, just use the
eng.traineddata and OCR the page. You do not need to do any training.

Here is the output I get:



8633 0410 NO RP 11 07122015 NYNN 01 01 0001 Page 2 Of 3


Did you know?


Your Comcast Business Internet

service gives you access to millions

of WiFi hotspots with the fastest WiFi

and even more coverage. Find out

more at businesscomcast.com/wiﬁ.



Need help? We’re here for you.


9 Visit business.comcast.com/help

Call 1-800—391 -3000

A


Billing support

Open 6 am-9 pm MTN, Mon through Fri

and 7 am—8 pm Sat


Technical support

Open 24 hours, 7 days a week



Did you know?


Never miss a payment with text alerts.

Receive text message reminders when your

bill is ready to pay or past due. Sign up at

business.comcast.com/myaccount.



Your bill is ready




Please notify us immediately with any

questions regarding charges billed to your

account. Comcast will issue a credit or

refund for any verified billing error which is

brought to our attention within sixty (60) days

of the bill.


ll


Additional payment options Moving? Let us help.


Automatic payment

Sign up at business.comcast.com/myaccount


a Oniine


Visit business.comcast.com/myaccount


a By phone

Call 1-800-391 -3000


if you're moving, give us as much

advanced notice as possible so we

can help make a smooth transition.


Call 1 -800-391 -3000


|||ll




ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Apr 21, 2017 at 11:34 AM, Alain Ghawi  wrote:

> Hello all,
>
> I am surprised by how many people tell me that tesseract is the best
> open-source OCR tool but yet there is no video explaining step-by-step the
> problems that you can encounter, or a good explanation and documentation
> for OCR.
>
> Well even though, everyone loves challenges! So here's the challenge I
> faced. I brought many pdf files that are invoices and I want to train
> tesseract to be able to ocr them as scanned images.
> So first of all, I transformed these pdf files into tif files
> using: magick -density 300 -depth 4   2151.pdf -background white -fill
> white -alpha Off  2151%d.tif
> This is ImageMagick. Nothing important here other than we have a 300 dpi
> image with an alpha channel off.
>
> You must then rename the .tif files to
> [lang].[name_font].exp0.tif (com.test_font.exp0.tif in my example).
>
> Great! After this step you must create your box file right? So I simply
> called:
> tesseract com.test_font.exp0.tif com.test_font.exp0 batch.nochop makebox
> tesseract com.test_font.exp0.tif com.test_font.exp1 batch.nochop makebox
>
> Then I fixed my files with CowBoxEditor as I wasn't finding the famous
> jTessBoxEditor online (weird right?) which did the job.
>
> After that, I created my .tr files:
> tesseract com.test_font.exp0.tif com.test_font.exp0 nobatch box.train
> tesseract com.test_font.exp1.tif com.test_font.exp1 nobatch box.train
>
> And here come the surprises!
> After creating your .tr files you call unicharset_extractor.
> First question: Why are the glyph metrics all 0,255,0,255,0,0,0,0,0,0?
> This is wrong according to the documentation:
> https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc
> Second question: Should I pass one box file at a time, or both together?
> Option 1: unicharset_extractor com.test_font.exp0.box
> Option 2: unicharset_extractor com.test_font.exp0.box com.test_font.exp1.box
> Third question: Why should I use set_unicharset_properties? It doesn't
> fix the metrics; it only specifies whether the script is Latin or Common.
> Link: https://github.com/tesseract-ocr/tesseract/issues/318
>
> After all these unanswered questions, I used mftraining and cntraining (no
> problems). Finally, I renamed my inttemp, normproto, pffmtable, shapetable
>  and I combined them using combine_tessdata com.
>
> Final question: If I named com.inttemp1 com.inttemp2 does it work? Same
> for shapetable, normproto, pffmtable
>
> I think these questions are asked more than once by all new users to
> tesseract. Please if any expert in tesseract can answer these questions it
> will be a great help for all the community.
> Kindly find the attached 2 tif files and the boxes generated.
>

Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2017-04-18 Thread ShreeDevi Kumar

I haven't built 3.05 so cannot help. I would suggest that you try with
older commits of tesseract 3.05 branch to see which one works.

Hope that those who have built 3.05 on mac will help.


Re: [tesseract-ocr] Re: Tesseract Installation

2017-04-19 Thread ShreeDevi Kumar

You can check that these are installed by entering the following:

which text2image

The above will show you the location where it is installed.

If you don't have the training tools, you will need to build them separately -
see https://github.com/tesseract-ocr/tesseract/wiki/Compiling

make training
sudo make training-install
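
The same check can be run over several tools at once. This is only a sketch: missing_tools is a hypothetical helper name, and the tool list is limited to programs named in these threads.

```shell
# Hypothetical helper: print the name of every listed tool that is not
# found on PATH, one per line. Prints nothing if all are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools text2image unicharset_extractor mftraining cntraining combine_tessdata)
if [ -z "$missing" ]; then
  echo "all training tools found"
else
  # $missing is left unquoted on purpose so the names print on one line.
  echo "missing training tools:" $missing
  echo "build them with: make training && sudo make training-install"
fi
```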


Re: [tesseract-ocr] Re: issue with simple reading of numbers 9 and 8

2017-04-23 Thread ShreeDevi Kumar

James,

Were you able to get this to work for you with 3.04/3.05?

I get accurate results using Tesseract 4.0 alpha, though it takes longer
with --oem 1 than --oem 0.


./troublewith98-300.jpg
Tesseract Open Source OCR Engine v4.00.00alpha-385-gab41465 with Leptonica

real    0m1.203s
user    0m0.578s
sys     0m0.203s
Tesseract Open Source OCR Engine v4.00.00alpha-385-gab41465 with Leptonica

real    0m4.485s
user    0m5.125s
sys     0m0.234s
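
A comparison like this can be reproduced along the following lines. The exact command used for the timings above is not given in the message, so the options here are an assumption, and oem_timing_cmd is a hypothetical helper name. The real/user/sys format matches the time keyword in bash.

```shell
# Hypothetical helper: print the command used to time one engine mode.
oem_timing_cmd() {
  printf 'time tesseract %s stdout --oem %s\n' "$1" "$2"
}

image=./troublewith98-300.jpg
for oem in 0 1; do
  oem_timing_cmd "$image" "$oem"
  # Only run the timing when tesseract and the image are actually present.
  if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    time tesseract "$image" stdout --oem "$oem"
  fi
done
```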

See attached ..

You can test with
https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
which uses Tesseract.NET (Tesseract 4.00alpha 362b68e)


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Apr 23, 2017 at 9:25 AM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> Try training using more samples of 8, 9, B etc.
>
> What results do you get with the provided eng.traineddata?  Are they
> better or worse?
>
> Have you tried changing DPI of image to 300?
>
> - excuse the brevity, sent from mobile
>
> On 22-Apr-2017 10:29 PM, "James Abney" <abne...@gmail.com> wrote:
>
>> Oh yes I guess I forgot to include that information, I did train using
>> only that font and with the same size font. I am on windows 7 and I used
>> 3.05 to train, although the .net wrapper i use is 3.04. I don't see how it
>> has difficulty with the 9 and 8, seems very odd.
>>
>> On Friday, April 21, 2017 at 11:05:49 PM UTC-4, shree wrote:
>>>
>>> Which version of Tesseract. Which o/s?
>>>
>>> If all your text is in tungsten-semibold, have you tried training with
>>> just that font?
>>>
>>> - excuse the brevity, sent from mobile
>>>
>>>
>>> On 22-Apr-2017 12:50 AM, "James Abney" <abn...@gmail.com> wrote:
>>>
>>> The font is tungsten semibold
>>>
>>>
>>> On Friday, April 21, 2017 at 2:08:53 PM UTC-4, James Abney wrote:
>>>>
>>>> I'm having issues with tesseract dealing with the numbers 9 and 8,
>>>> especially when they are next to each other. This is really the only issue
>>>> I have. Even when I OCR a TIFF file it shows 123456789 as 123456788. I will
>>>> link an example. Any help is appreciated. The following image is an example
>>>> where my software using tesseract interprets the 899B8993B as 8-838.
>>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-HF3RzbqMD6I/WPo8RYC6GaI/AJg/phkq6dgtvSE5f3upJQrfowEp1vyW8TQXwCLcB/s1600/troublewith98.png>
>>>>

B-s38

899B8993B

 

B-838

899889938

 

899B8993B

899889938

Re: [tesseract-ocr] Re: issue with simple reading of numbers 9 and 8

2017-04-22 Thread ShreeDevi Kumar

Try training using more samples of 8, 9, B etc.

What results do you get with the provided eng.traineddata?  Are they better
or worse?

Have you tried changing DPI of image to 300?
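
One way to read "changing DPI" is to resample the image with ImageMagick so that it is stored at 300 DPI. The sketch below assumes that reading and assumes ImageMagick 7 (the magick command) is installed; resample_cmd and the output file name are hypothetical.

```shell
# Hypothetical helper: print an ImageMagick command that resamples an image
# to 300 DPI. Resampling rescales the pixels; it does not just edit metadata.
# Assumes file names without spaces, since the result is a plain string.
resample_cmd() {
  printf 'magick %s -units PixelsPerInch -resample 300 %s\n' "$1" "$2"
}

cmd=$(resample_cmd troublewith98.png troublewith98-300.png)
echo "$cmd"
# Run it only if ImageMagick 7 and the input file are available.
if command -v magick >/dev/null 2>&1 && [ -f troublewith98.png ]; then
  $cmd
fi
```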

- excuse the brevity, sent from mobile

On 22-Apr-2017 10:29 PM, "James Abney"  wrote:

> Oh yes I guess I forgot to include that information, I did train using
> only that font and with the same size font. I am on windows 7 and I used
> 3.05 to train, although the .net wrapper i use is 3.04. I don't see how it
> has difficulty with the 9 and 8, seems very odd.
>
> On Friday, April 21, 2017 at 11:05:49 PM UTC-4, shree wrote:
>>
>> Which version of Tesseract. Which o/s?
>>
>> If all your text is in tungsten-semibold, have you tried training with
>> just that font?
>>
>> - excuse the brevity, sent from mobile
>>
>>
>> On 22-Apr-2017 12:50 AM, "James Abney"  wrote:
>>
>> The font is tungsten semibold
>>
>>
>> On Friday, April 21, 2017 at 2:08:53 PM UTC-4, James Abney wrote:
>>>
>>> I'm having issues with tesseract dealing with the numbers 9 and 8,
>>> especially when they are next to each other. This is really the only issue
>>> I have. Even when I OCR a TIFF file it shows 123456789 as 123456788. I will
>>> link an example. Any help is appreciated. The following image is an example
>>> where my software using tesseract interprets the 899B8993B as 8-838.
>>>
>>>
>>> 
>>>

Re: [tesseract-ocr] Caching in TrainLineRecognizer?

2017-03-10 Thread ShreeDevi Kumar

I have added it as an issue. Please see
https://github.com/tesseract-ocr/tesseract/issues/754

You may want to create a pull request, if you have a solution.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Mar 5, 2017 at 7:00 AM, Jens Weibler  wrote:

> Hi,
>
> I'm new to tesseract and wondered why the lstm dataset creation for the
> training process has to write the file again and again in
> TrainLineRecognizer. I've seen 200MB/s IO on the disk while creating the
> training data set.
> As far as I can see, for the training case it would be sufficient to just load
> it once and write it at the end. The same applies to the box and tif file -
> but these are only read and not written...
>
>
> Thanks,
> Jens Weibler
>

Re: [tesseract-ocr] Tesseract 4's LSTM classifier

2017-03-08 Thread ShreeDevi Kumar

The only public information regarding LSTM that has been shared by
Google/Ray is linked from the following pages:

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016

https://github.com/tesseract-ocr/tesseract/wiki/Technical-Documentation

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Mar 8, 2017 at 7:31 PM, Milan Troller 
wrote:

> Hello,
>
> I am resubmitting a question that's been denied from the tesseract-dev
> list.
> Simply put, I am interested in exploring the actual structure (the layer
> layout) of the LSTM classifier
> which Tesseract 4 uses, ideally including some insight on previous designs
> that have been tested
> and the results their implementation achieved, yet very little of this
> information seems to appear in
> the Wiki or the source documents.
>
> Can somebody please point me towards this kind of information?
>
> With best regards,
> Milan Troller
>

Re: [tesseract-ocr] Major changes between stable 3.04.01 and 4.0

2017-03-02 Thread ShreeDevi Kumar

Also see
https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 2, 2017 at 8:46 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> see
>
> https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog
>
> https://github.com/tesseract-ocr/tesseract/releases
>
> https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Thu, Mar 2, 2017 at 8:23 PM, Ashish Goel <goelk...@gmail.com> wrote:
>
>> Can anyone please shed some light on the major differences between tesseract
>> 3.04 and 4.0?
>> For the last 4 months, I have been working on a framework using tesseract
>> 3.04.
>>
>> Is it worthwhile moving to 4.0 now? Will it improve OCR efficiency?
>>
>> Any suggestions will be highly appreciated.
>>
>> Regards,
>> Ashish
>>

Re: [tesseract-ocr] Major changes between stable 3.04.01 and 4.0

2017-03-02 Thread ShreeDevi Kumar

see

https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog

https://github.com/tesseract-ocr/tesseract/releases

https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 2, 2017 at 8:23 PM, Ashish Goel  wrote:

> Can anyone please throw some light on major differences between tesseract
> 3.04 and 4.0?
> For the last 4 months, I have been working on a framework using tesseract
> 3.04.
>
> Is it worthwhile moving to 4.0 now? Will it improve OCR efficiency?
>
> Any suggestions will be highly appreciated.
>
> Regards,
> Ashish

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-03-02 Thread ShreeDevi Kumar

The warning in your screenshot means that your image does not have resolution
info. Your OCR output file should still have been created.

Training 4.0 is not easy. Please see
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Mar 3, 2017 at 12:17 PM, Saurabh Srivastav 
wrote:

> How do I train tesseract 4.0? Please help me.
>
> thanks,
> Saurabh Srivastav

Re: [tesseract-ocr] Tesseract 4.0 doesn't see the changes after Arabic training

2017-04-08 Thread ShreeDevi Kumar

Arabic traineddata for 3.0x uses the cube engine. The training process for
that was never shared. The cube engine has now been removed in LSTM-based 4.0,
which is still in alpha stage.

There is 4.0alpha traineddata for Arabic and you can train for it, but
accuracy is not great. Ray is doing another training run with some changes for
tatweel etc. for Arabic. Depending on the results, the changes will be made on
GitHub.

Your best bet is to wait for the next set of updates from Ray/Google and try
after that.

- excuse the brevity, sent from mobile

On 08-Apr-2017 12:09 PM, "Ahmad Moawad"  wrote:

> Hello All,
>
> I want to ask about an issue that I faced after training
> tesseract for *Arabic*. I have Ubuntu & Tesseract-ocr 3.04.
> *Steps*:
>
>    1. $ convert ara.arial.exp1.jpg ara.arial.exp1.tif
>    2. $ tesseract ara.arial.exp1.tif ara.arial.exp1 -l ara batch.nochop makebox
>    3. edit the boxes using Qt Box Editor 2.0 beta
>    4. $ cp ara.traineddata /usr/share/tesseract-ocr/tessdata
>    5. $ tesseract ara.arial.exp1.tif out -l ara
>
> When I run the last step I get a bad result, and the tesseract engine
> doesn't see the changes that I made through Qt Box Editor.
> Any help!!

Re: [tesseract-ocr] Read 2 column Image Horizontally (line by line) rather than Vertically (column by column)

2017-04-06 Thread ShreeDevi Kumar

Have you tried --psm 6?
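
For reference, a minimal sketch of that suggestion. The file names form.tif
and form_psm6 are made up for illustration. Page segmentation mode 6 tells
Tesseract to assume a single uniform block of text. Tesseract 4.x spells the
option --psm, while older 3.0x releases such as 3.04 use -psm.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own scan and output base.
IMAGE="form.tif"
OUTBASE="form_psm6"

# Page segmentation mode 6: assume a single uniform block of text.
CMD="tesseract $IMAGE $OUTBASE --psm 6"
echo "$CMD"

# Run it only when tesseract and the image are actually available.
if command -v tesseract >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
  $CMD
fi
```

If the output still comes out column by column, it may be worth checking
whether something else is overriding the page segmentation mode.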

- excuse the brevity, sent from mobile

On 06-Apr-2017 11:06 PM, "Mike Hall"  wrote:

> We have a C# .Net app that is using Tesseract to do Optical Character
> Recognition (OCR) on .tiff files.  I've attached a sample tiff file.
>
> We are then outputting the data to a text file.  However, Tesseract is
> reading the data in a Vertical fashion.  In my example image, it is reading
> the tiff as two columns of data, and the data is being output
> from Tesseract like this:
>
> TYPE:
> DATE:
> Address:
> City:
> State:
> Owner:
> Owner Type:
> Acreage:
> Mortgage:
> 12345
> 2017-04-06
> 100 Main St.
> Some City
> Some State
> John Doe
> Primary
> 10.25
> Yes
>
> What we want is Tesseract to read the tiff file horizontally and have the
> output look like this:
>
> TYPE:
> 12345
> DATE:
> 2017-04-06
> Address:
> 100 Main St.
> City:
> Some City
> State:
> Some State
> Owner:
> John Doe
> Owner Type:
> Primary
> Acreage:
> 10.25
> Mortgage:
> Yes
>
> We've tried the various Page Segmentation options for Tesseract, but they
> all produce the same result.
> Has anyone run into this same issue? Anybody have any ideas?

Re: [tesseract-ocr] (Advise needed) Command Output Fails and gives error in Tesseract 4 during fine tuning

2017-04-06 Thread ShreeDevi Kumar

You must be using an old version of traineddata which does not have LSTM.
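
One way to check this, sketched under some assumptions: the traineddata path
is the Ubuntu location from the quoted message, the output directory is made
up, and has_lstm is a helper written for this example. The idea is to unpack
every component with combine_tessdata -u and see whether an eng.lstm file
appears.

```shell
#!/bin/sh
# Paths are assumptions; TRAINEDDATA matches the Ubuntu location quoted below.
TRAINEDDATA="${TRAINEDDATA:-/usr/share/tesseract-ocr/tessdata/eng.traineddata}"
OUTDIR="${OUTDIR:-./engoutput}"

# Succeeds if directory $1 contains an unpacked LSTM component for language $2.
has_lstm() {
  [ -f "$1/$2.lstm" ]
}

if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$TRAINEDDATA" ]; then
  mkdir -p "$OUTDIR"
  # -u unpacks every component into files named <prefix><component>.
  combine_tessdata -u "$TRAINEDDATA" "$OUTDIR/eng."
  if has_lstm "$OUTDIR" eng; then
    echo "eng.lstm found: this traineddata can be used for fine tuning"
  else
    echo "no eng.lstm: this is an older traineddata without an LSTM model"
  fi
fi
```

If no eng.lstm shows up, the traineddata file predates 4.0 and has to be
replaced with a 4.0 one before the fine tuning steps can work.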

- excuse the brevity, sent from mobile

On 07-Apr-2017 2:13 AM,  wrote:

> I am following this link:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
>
> for generating the files for fine tuning.
>
>
> command used (for Reference):
>
>  combine_tessdata -e ../tessdata/ara.traineddata \
>   ~/tesstutorial/aratuned_from_ara/ara.lstm
>
>
> command used (actual):
>
>
> cmd : /home/p/Documents/T/tesseract-master/training/combine_tessdata -e
> /usr/share/tesseract-ocr/tessdata/eng.traineddata \
> > /home/p/Documents/T/engoutput/eng.lstm
>
> error :
>
> Extracting tessdata components from /usr/share/tesseract-ocr/
> tessdata/eng.traineddata
> Not extracting /home/plianto/Documents/Tvat/engoutput/eng.lstm, since
> this component is not present
>
>
> cmd  : /home/p/Documents/T/tesseract-master/training/combine_tessdata -e
> /usr/share/tesseract-ocr/tessdata/eng.traineddata \
>
> error:
> >/home/p/Documents/T/engoutput/eng.*
> Extracting tessdata components from /usr/share/tesseract-ocr/
> tessdata/eng.traineddata
> TessdataManager can't determine which tessdata component is represented by
> lstmf
> tesseract::TessdataManager::TessdataTypeFromFileName( filename, ,
> _file):Error:Assert failed:in file tessdatamanager.cpp, line 269
> Segmentation fault (core dumped)
>
>
>
> I don't know why I am not able to extract the files. Can anybody please give me
> advice?

Re: [tesseract-ocr] Read 2 column Image Horizontally (line by line) rather than Vertically (column by column)

2017-04-06 Thread ShreeDevi Kumar

Normally, for text output, the other config files should not have any effect.
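
For anyone who wants to repeat the check described in the quoted message
below, here is a small sketch. It assumes the config files live in
tessdata/configs under the Ubuntu path used elsewhere on this list, and
find_psm_overrides is a helper name invented for this example.

```shell
#!/bin/sh
# Assumed location of the config files; adjust to your installation.
CONFIG_DIR="${CONFIG_DIR:-/usr/share/tesseract-ocr/tessdata/configs}"

# Prints the files in directory $1 that set tessedit_pageseg_mode.
find_psm_overrides() {
  grep -l '^[[:space:]]*tessedit_pageseg_mode' "$1"/* 2>/dev/null
}

find_psm_overrides "$CONFIG_DIR" || echo "no config file sets tessedit_pageseg_mode"
```

Any file it lists carries its own page segmentation setting, which is worth
comparing with the value given on the command line.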



- excuse the brevity, sent from mobile

On 07-Apr-2017 2:18 AM, "Mike Hall"  wrote:

> Yes, we are using the -psm 6 command line argument.  And it was not
> working.
>
> But I figured out the issue.
>
> Tesseract has a set of config files. Inside several of these config files
> (hocr, pdf, tsv, unlv) is the setting *tessedit_pageseg_mode*. This
> setting was set to 1 in all the config files.   Once I removed the
> *tessedit_pageseg_mode* parameter from the config files, our command line
> argument of -psm 6 worked.
>
> Alternatively, I did experiment with the config files.  When I changed the
> *tessedit_pageseg_mode* setting to 6 in all the config files and ran
> Tesseract with the -psm 6 command line argument, it also worked.
>
> Thanks
>
> On Thursday, April 6, 2017 at 1:12:18 PM UTC-5, shree wrote:
>
>> Have u tried --psm 6
>>
>> - excuse the brevity, sent from mobile
>>
>> On 06-Apr-2017 11:06 PM, "Mike Hall"  wrote:
>>
>>> We have a C# .Net app that is using Tesseract to do Optical Character
>>> Recognition (OCR) on .tiff files.  I've attached a sample tiff file.
>>>
>>> We are then outputting the data to a text file.  However, Tesseract is
>>> reading the data in a Vertical fashion.  In my example image, it is reading
>>> the tiff as two columns of data, and the data is being output
>>> from Tesseract like this:
>>>
>>> TYPE:
>>> DATE:
>>> Address:
>>> City:
>>> State:
>>> Owner:
>>> Owner Type:
>>> Acreage:
>>> Mortgage:
>>> 12345
>>> 2017-04-06
>>> 100 Main St.
>>> Some City
>>> Some State
>>> John Doe
>>> Primary
>>> 10.25
>>> Yes
>>>
>>> What we want is Tesseract to read the tiff file horizontally and have
>>> the output look like this:
>>>
>>> TYPE:
>>> 12345
>>> DATE:
>>> 2017-04-06
>>> Address:
>>> 100 Main St.
>>> City:
>>> Some City
>>> State:
>>> Some State
>>> Owner:
>>> John Doe
>>> Owner Type:
>>> Primary
>>> Acreage:
>>> 10.25
>>> Mortgage:
>>> Yes
>>>
>>> We've tried the various Page Segmentation options for Tesseract, but they
>>> all produce the same result.
>>> Has anyone run into this same issue? Anybody have any ideas?

Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2017-04-18 Thread ShreeDevi Kumar

Use latest version of leptonica - 1.74.1

https://github.com/DanBloomberg/leptonica

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Apr 17, 2017 at 8:18 PM, Peter Reid  wrote:

> I've done some further searching and found several versions of shell
> scripts that are supposed to generate a standalone version of Tesseract.
> However, they all fail at the last part of the process, namely building
> Tesseract itself!  The script builds the libraries for zlib (v1.2.8),
> libpng (v1.6.13), libjpeg (9b) and leptonica (v1.73), but fails with the
> following error:
>
>   checking for leptonica... yes
>   checking for pixCreate in -llept... no
>   configure: error: leptonica library missing
>
> I can't find a way to correct this!  Here's the config details that lead
> to this error:
>
> export CXXFLAGS="-I$BUILD_DIR/include -I$BUILD_DIR/include/libpng16
> -I$BUILD_DIR/include/leptonica -lpng -ljpeg -lz"
> export CPPFLAGS="-I$BUILD_DIR/include -I$BUILD_DIR/include/libpng16
> -I$BUILD_DIR/include/leptonica -lpng -ljpeg -lz"
> export LDFLAGS="-L$BUILD_DIR/lib"
> export LIBLEPT_HEADERSDIR="$BUILD_DIR/include/leptonica"
>
> ./configure --prefix=$TESSERACT_DIR --with-extra-libraries=$BUILD_DIR/lib
>
> [Note: I added the CXXFLAGS as well as the CPPFLAGS as I wasn't sure which
> was needed]
>
> I have attached the latest version of the shell script I'm using so you
> can see the context.
>
> Can anyone fix my script or tell me another way of generating a standalone
> version of Tesseract for the Mac?
>
> Thanks
>
>
> On Thursday, March 24, 2016 at 10:49:03 AM UTC, Peter Reid wrote:
>>
>> I have a standalone version of tesseract-ocr for Windows that can be run
>> from a folder located anywhere in the Windows filing system without having
>> to do an installation.  For the Mac the user has to install
>> HomeBrew/MacPort first and then tesseract-ocr afterwards.  This fixes
>> tesseract-ocr to particular parts of the OS X filing system, preventing it
>> from being relocated and used elsewhere on the Mac.
>>
>> I'm looking for a standalone/self-contained version of tesseract-ocr for
>> the Mac that can be located anywhere and can be run without requiring
>> installations.  Please can someone point to such a version of tesseract-ocr
>> or give instructions on how I can build one of these!
>>
>> Thanks
>>

Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2017-04-18 Thread ShreeDevi Kumar

Please see https://github.com/tesseract-ocr/tesseract/wiki/Compiling


If you are building tesseract 4.0, you need Leptonica 1.74 or later.
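
To see which Leptonica version a build will pick up, something like the
following may help. It is only a sketch: it assumes Leptonica installed a
pkg-config file named lept, and version_ge is a helper written for this
example, not part of any tool.

```shell
#!/bin/sh
# Succeeds if dotted version $1 is >= dotted version $2 (numeric fields only).
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
    if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
    # Drop the field just compared; stop when no dot is left.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# Assumes pkg-config knows Leptonica under the module name "lept".
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists lept; then
  found=$(pkg-config --modversion lept)
  if version_ge "$found" 1.74; then
    echo "Leptonica $found is new enough for tesseract 4.0"
  else
    echo "Leptonica $found is too old for tesseract 4.0"
  fi
fi
```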

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Apr 18, 2017 at 2:25 PM, Peter Reid  wrote:

> Hi ShreeDevi
>
> I have tried the latest version of Leptonica but I get numerous warnings
> (38 of them, mainly about implicit function definitions) and a fatal error
> 'endian.h' not found.  The build finishes saying that Leptonica has been
> built OK and its library appears in the lib folder.  However, when I try to
> build Tesseract, I get the following error:
>
> checking for leptonica... yes
> checking for pixCreate in -llept... no
> configure: error: leptonica library missing
> Configuration done, now Building
> make: Nothing to be done for `install'.
> Tesseract build failed. Exiting.
>
> So I'm no better off with the latest version.  At least with version 1.73
> I don't get the warnings and error messages when building Leptonica even
> though the Tesseract build fails.
>
> Thanks
>
> Peter
>
>
> On Thursday, March 24, 2016 at 10:49:03 AM UTC, Peter Reid wrote:
>>
>> I have a standalone version of tesseract-ocr for Windows that can be run
>> from a folder located anywhere in the Windows filing system without having
>> to do an installation.  For the Mac the user has to install
>> HomeBrew/MacPort first and then tesseract-ocr afterwards.  This fixes
>> tesseract-ocr to particular parts of the OS X filing system, preventing it
>> from being relocated and used elsewhere on the Mac.
>>
>> I'm looking for a standalone/self-contained version of tesseract-ocr for
>> the Mac that can be located anywhere and can be run without requiring
>> installations.  Please can someone point to such a version of tesseract-ocr
>> or give instructions on how I can build one of these!
>>
>> Thanks
>>

Re: [tesseract-ocr] ERROR: Could not find training text file

2017-07-31 Thread ShreeDevi Kumar

Add a line similar to the following to your training command, pointing to where
you have your training text:

  --training_text ../langdata/eng/eng.training_text \
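
Put together with the command quoted below, the whole thing might look like
this. A sketch only: check_training_text is a helper invented here, and the
paths are the ones from the quoted command and error message, so adjust them
to wherever your own training text lives.

```shell
#!/bin/sh
# Path from the quoted error message; point it at your own training text.
TRAINING_TEXT="${TRAINING_TEXT:-langdata/eng/eng.training_text}"

# Reports whether the training text exists; fails if it does not.
check_training_text() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Start training only when the text file and the script are both present.
if check_training_text "$TRAINING_TEXT" && [ -x training/tesstrain.sh ]; then
  training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
    --noextract_font_properties --langdata_dir langdata \
    --tessdata_dir tessdata \
    --training_text "$TRAINING_TEXT" \
    --fontlist "Times New Roman," --output_dir engtrain
fi
```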


ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jul 31, 2017 at 4:24 PM, Ava Nimaee  wrote:

> Hi. Sorry, I used this syntax:
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --noextract_font_properties --langdata_dir langdata \
>   --tessdata_dir tessdata \
>   --fontlist "Times New Roman," --output_dir engtrain
> Before that I created the box file, the tif and the unicharset output.
> I cloned langdata for tesseract v4.0,
> but I get this error:
>  === Phase I: Generating training images ===
> ERROR: Could not find training text file langdata/eng/eng.training_text
> I can't solve it, and I don't know where I should put training_text.txt;
> it is the text file that I want to train on.
> Thanks for your attention.

Re: [tesseract-ocr] Re: Combining tessdata files Error opening unicharset file

2017-07-28 Thread ShreeDevi Kumar

You need to mv (rename) the files so that they have the por. prefix.

Then, when you run the combine_tessdata command, it will use all the por. files to
create the traineddata.

see
https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh

mv ${TRAINING_DIR}/inttemp ${TRAINING_DIR}/${LANG_CODE}.inttemp
mv ${TRAINING_DIR}/shapetable ${TRAINING_DIR}/${LANG_CODE}.shapetable
mv ${TRAINING_DIR}/pffmtable ${TRAINING_DIR}/${LANG_CODE}.pffmtable
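
The same renaming as a loop, as a rough sketch. add_lang_prefix and the
por_training directory are names made up for this example. normproto and
unicharset are included on the assumption that the usual 3.0x training steps
produced them as well.

```shell
#!/bin/sh
# Renames the training output files in directory $1 so they carry the prefix "$2.".
add_lang_prefix() {
  dir=$1
  lang=$2
  for f in inttemp shapetable pffmtable normproto unicharset; do
    if [ -f "$dir/$f" ]; then
      mv "$dir/$f" "$dir/$lang.$f"
    fi
  done
}

# Hypothetical training directory; "por" is the language code from this thread.
TRAINING_DIR="${TRAINING_DIR:-./por_training}"
if [ -d "$TRAINING_DIR" ]; then
  add_lang_prefix "$TRAINING_DIR" por
  # combine_tessdata picks up the files that start with the given prefix.
  if command -v combine_tessdata >/dev/null 2>&1; then
    combine_tessdata "$TRAINING_DIR/por."
  fi
fi
```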

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jul 28, 2017 at 4:23 PM,  wrote:

> This my essay

Re: [tesseract-ocr] "Can't encode transcript" error when using "lstmtraining" command with Tess4.0

2017-08-01 Thread ShreeDevi Kumar

Ray has uploaded new traineddata files in
https://github.com/tesseract-ocr/tessdata/tree/master/best

Why don't you first try recognition with those?
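
As the earlier reply quoted below says, the error comes from characters in
the training text that are not in the chi_sim unicharset. A rough way to list
such characters is sketched here. It assumes a UTF-8 locale and a unicharset
text file with a count on the first line and one entry per line whose first
space-separated field is the character. The file names are hypothetical and
missing_chars is a helper written for this example.

```shell
#!/bin/sh
# Prints characters that occur in text file $1 but not in unicharset file $2.
# Assumes a UTF-8 locale so that grep treats a multi-byte character as one unit.
missing_chars() {
  text=$1
  unicharset=$2
  known=$(mktemp)
  used=$(mktemp)
  # Skip the count on line 1; the first field of each entry is the character.
  tail -n +2 "$unicharset" | cut -d' ' -f1 | sort -u > "$known"
  # One non-whitespace character per line.
  grep -o '[^[:space:]]' "$text" | sort -u > "$used"
  # Keep only the lines that appear in the "used" list alone.
  comm -23 "$used" "$known"
  rm -f "$known" "$used"
}

# Hypothetical file names; run only when both files exist.
TEXT="${TEXT:-chi.training_text}"
UNICHARSET="${UNICHARSET:-chi_sim.unicharset}"
if [ -f "$TEXT" ] && [ -f "$UNICHARSET" ]; then
  missing_chars "$TEXT" "$UNICHARSET"
fi
```

Anything it prints is a candidate for the characters that cannot be encoded,
and is the kind of detail worth reporting on the list.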

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Aug 1, 2017 at 1:45 PM,  wrote:

> Hello, Shree:
>
> I'm sorry, but can I use more than one unicharset, such as chi_sim
> and eng and so on, to fine-tune the training?
> Maybe the special characters are in other unicharsets. If I find
> them, maybe I can train my traineddata with more unicharsets, and the
> special characters will be encoded then.
>
> Thanks, and hope for your reply.
>
> On Tuesday, July 25, 2017 at 3:23:08 PM UTC+8, shree wrote:
>>
>> That error is because some characters in your training text are not part
>> of the unicharset of chi_sim.
>>
>> You are trying fine-tune training, which will give this error. Replace top
>> layer training will work.
>>
>> I suggest that you wait 2-3 weeks for Ray to upload new traineddata for
>> all languages.
>>
>> You can tell us if there are any specific characters missing from
>> existing traineddata .
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, Jul 25, 2017 at 12:46 PM,  wrote:
>>
>>> Hello,
>>>
>>> I ran this command to train my own traineddata:
>>>
>>> lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
>>>   --continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
>>>   --train_listfile ~/tesstutorial/chitest/chi.training_files.txt \
>>>   --eval_listfile ~/tesstutorial/chitest/chi.training_files.txt \
>>>   --target_error_rate 0.01
>>>
>>> Tess4.0 reports the error shown in the attached image. It says
>>> "Can't encode transcript" for text content such as
>>> "化简（-x2）3的结果是...".
>>> Why? Can you help me?

Re: [tesseract-ocr] Creation of encoded unicharset failed While constructing LSTM training data.

2017-08-10 Thread ShreeDevi Kumar

Seems to work fine for me.

Are you sure that you have the relevant files in the directories listed in
that command?

Check the tessdata and langdata locations.

Use tessdata/best/*.traineddata as the existing models.
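
A quick way to run that check, sketched with the paths from the quoted
command. check_path is a helper invented for this example, and
eng.training_text is an assumed file name inside langdata/eng.

```shell
#!/bin/sh
# Reports whether a path exists.
check_path() {
  if [ -e "$1" ]; then
    echo "ok: $1"
  else
    echo "MISSING: $1"
  fi
}

# Paths taken from the quoted command; eng.training_text is an assumed file name.
for p in /usr/share/fonts ../langdata/eng ../langdata/eng/eng.training_text \
         ./tessdata/eng.traineddata; do
  check_path "$p"
done
```

Run it from the same directory as the tesstrain.sh command, since the
langdata and tessdata paths are relative.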

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Aug 10, 2017 at 2:05 PM,  wrote:

> Hello,
>
> I'm trying to fine-tune the eng.traineddata model following the steps in this link:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-%C2%B1-a-few-characters
>
> As the tutorial shows, I am fine tuning for ± a few characters, following the
> steps.
>
> But when I execute the first command, to generate new training and eval
> data:
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --noextract_font_properties --langdata_dir ../langdata \
>   --tessdata_dir ./tessdata --output_dir ~/tesstutorial/trainplusminus
>
>
> an error is reported while constructing the LSTM training data:
> *Creation of encoded unicharset failed!*
>
> For more details, refer to the image.
>
> Can you help me? Thanks.


Re: [tesseract-ocr] Newbie: wondering why a fairly crisp document has such low accuracy

2017-08-12 Thread ShreeDevi Kumar

With English you should probably get close to 99% accuracy.

Is your png at 300 dpi?

Which version of tesseract did you use?
Which traineddata?
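
The three details asked for above can be collected in one go with something like the sketch below. The function name is hypothetical, and reading the resolution with ImageMagick's `identify` is an assumption about the tools available; any image viewer that shows the DPI would do as well.

```shell
# Print the Tesseract version, the installed traineddata, and the resolution
# recorded in the image file (if ImageMagick's identify is available).
report_ocr_setup() {
  image=$1
  if command -v tesseract >/dev/null 2>&1; then
    tesseract --version 2>&1 | head -n 1
    tesseract --list-langs 2>&1
  else
    echo "tesseract: not found on PATH"
  fi
  if command -v identify >/dev/null 2>&1 && [ -f "$image" ]; then
    # %x and %y are the stored resolutions, %U the resolution units.
    identify -format 'resolution: %x x %y %U\n' "$image"
  else
    echo "resolution: unknown (ImageMagick identify or image not available)"
  fi
}
```

Calling `report_ocr_setup welcome.png` (the file name is only an example) and pasting the output into the reply answers all three questions at once.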

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Aug 12, 2017 at 11:46 PM, Stephen Boesch  wrote:

> I printed out the "Welcome" page on my HP LaserJet printer and scanned it
> in as a .png. The quality is quite good, so I had been anticipating
> maybe 85%+ accuracy from tesseract-OCR. I did not even bother to tally
> carefully, but from eyeballing it the accuracy seems to be about 50%. I had
> used all default settings.
>
> Some of the consistent errors:
>
> W -> H
> in -> m
> li -> h
> b -> t)
> ll -> H
>
> So is this just "the way things are" in OCR land? Or am I missing some
> fundamental settings needed to get reasonably useful results?
>
> thanks
>
> stephenb


Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-07-07 Thread ShreeDevi Kumar

For 3.05, don't you need to check out the 3.05 branch?
master is for 4.0 development.
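
As a rough sketch of what that changes in the quoted steps: the download URL and the unpacked directory name follow the branch name. The helper function is hypothetical, and the archive and directory naming is an assumption based on how GitHub branch archives are usually laid out.

```shell
# Build the GitHub archive URL for a given branch of the tesseract repository.
tesseract_archive_url() {
  branch=$1
  echo "https://github.com/tesseract-ocr/tesseract/archive/${branch}.zip"
}

# Print (rather than run) the download steps for the chosen branch.
branch=3.05
echo "wget $(tesseract_archive_url "$branch")"
echo "unzip ${branch}.zip"
echo "cd tesseract-${branch}/"
```

With git installed, `git clone --branch 3.05 --depth 1 https://github.com/tesseract-ocr/tesseract.git` should get the same sources without the zip step.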

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jul 7, 2017 at 9:22 PM, akhil katpally 
wrote:

> Steven, here is the list of commands to install tesseract 3.05 on Redhat
> 6. I hope this works for Redhat 5 too; if not, please try to downgrade
> tesseract and try again.
>
>   sudo yum update
>   sudo yum install wget unzip
>   sudo yum install gcc gcc-c++ make
>   sudo yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel
>   sudo yum install libtool
>   sudo yum install autoconf automake
>
>
>   sudo yum whatprovides libtool
>   (Install the latest version)
>   sudo yum whatprovides libtiff
>   sudo yum install libtiff-4.0.3-27.el7_3.x86_64
>
>   Install autoconf-archive from:
>   http://rpm.pbone.net/index.php3/stat/4/idpl/23652016/dir/centos_6/com/autoconf-archive-2012.04.07-7.3.noarch.rpm.html
>   Download it manually and copy it into the ec2 instance.
>   sudo rpm -ivh autoconf-archive-2012.04.07-7.3.noarch.rpm
>
>
>
>   Installing leptonica:
>   wget http://www.leptonica.com/source/leptonica-1.74.1.tar.gz
>   tar xvf leptonica-1.74.1.tar.gz
>   cd leptonica-1.74.1
>   ./configure
>   make
>   sudo make install
>   sudo ldconfig
>
>
>
>   Installing Tesseract:
>   cd ..
>   wget https://github.com/tesseract-ocr/tesseract/archive/master.zip
>   unzip master.zip
>   cd tesseract-master/
>   sudo ./autogen.sh
>   export LIBLEPT_HEADERSDIR=/usr/local/include
>   export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
>   export LD_LIBRARY_PATH=/usr/local/lib
>   ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
>   make
>   sudo make install
>   sudo ldconfig
>
>   Loading the training data for tesseract:
>   Download the tessdata and copy only the contents into
>   tesseract-master/tessdata
>   cd ..
>   sudo wget https://github.com/tesseract-ocr/tessdata/archive/master.zip
>   sudo unzip master.zip
>   Note: copy the contents into tesseract-master/tessdata
>   export TESSDATA_PREFIX=/usr/local/share/
>   sudo mv ~/tesseract-master/tessdata/*  /usr/local/share/tessdata/
>
>   test: tesseract --version
>
>   For reference, check:
>   https://github.com/tesseract-ocr/tesseract/wiki/Compiling
>
> On Tuesday, June 27, 2017 at 1:09:48 PM UTC-7, Steven Heydendahl wrote:
>>
>> Is tesseract 3.05 available for redhat 5?  Can we just rpm it or do we
>> have to add a repository?
>>
>> On Tuesday, June 27, 2017 at 2:07:59 PM UTC-6, zdenop wrote:
>>>
>>> 2.04 is too old.
>>> Please ask them to install 3.05 + language data (at least eng and osd)
>>>
>>> Zdenko
>>>
>>> On Tue, Jun 27, 2017 at 9:58 PM, Steven Heydendahl 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Novice here.  I had made a request at my company to install
>>>> tesseract-ocr on our redhat 5 OS.
>>>>
>>>> They ended up installing the following:
>>>> rpm -Vp "tesseract-2.04-1.el5.rf.x86_64.rpm"
>>>>
>>>> which is apparently an older version of tesseract.  Now, that completed
>>>> successfully however, every time I try to run tesseract I get an error
>>>> message.  Even when I just try to do the following:
>>>> tesseract --version
>>>>
>>>> the response is:
>>>> tesseract:Error:Usage:tesseract imagename outputbase [-l lang]
>>>> [configfile [[+|-]varfile]...]
>>>>
>>>> and if I try to run tesseract on an image:
>>>> tesseract OCRTest.png text l- eng
>>>> read_variables_file:Can't open /usr/share/tesseract/tessdata/configs/engUnable
>>>> to load unicharset file /usr/share/tesseract/tessdata/eng.unicharset
>>>>
>>>>
>>>> I do not know if this was a botched install, if we are missing
>>>> dependencies, or if tesseract is just not compatible with redhat 5.  Any
>>>> help is greatly appreciated!
>>>>
>>>> Thanks,
>>>> Steve


[tesseract-ocr] Fwd: [tesseract-ocr/tesseract] Tag a new version for LSTM 4.0 (#995)

2017-07-11 Thread ShreeDevi Kumar

Forwarding update by Ray.

-- Forwarded message --
From: theraysmith 
Date: Wed, Jul 12, 2017 at 5:55 AM
Subject: Re: [tesseract-ocr/tesseract] Tag a new version for LSTM 4.0 (#995)
To: tesseract-ocr/tesseract 

I'm about ready to update the traineddatas. I have a training run almost
complete, and with accuracy that meets with my satisfaction.
There are a few regressions, but not too serious.
First though, I have to get some code reviewed in Google, and then make
some commits to github to match the new traineddatas.
Before that, there is the matter of a major pull...

Here's what's coming:

- Fix to issue 653: New components in traineddata file for the
  unicharset, recoder and version string. Backwards compatible change, so the
  LSTM component can still read older files.
- Change in training system. The above change makes open source training
  impossible. Will add a new program to build a starter traineddata from a
  unicharset and optional word lists.
- New "normalization" code to clean corpus text in all languages. That
  was a big part of the work.
- Improvements to the trained networks to improve accuracy on single
  characters and single words.
- 2 parallel sets of tessdata. "best" and "fast". "Fast" will exceed the
  speed of legacy Tesseract in real time, provided you have the required
  parallelism components, and in total CPU only slightly slower for English.
  Way faster for most non-latin languages, while being <5% worse than "best".
  Only "best" will be retrainable, as "fast" will be integer.

I have other stuff that is still incomplete, but that is a good list for
now.

BTW, in case you hadn't noticed, there was a breaking change that made old
lstmf files unusable. That was needed to fix LSTM for OSD. It has to know
the language of each training sample.
The new traineddatas will mostly be smaller than the older ones, as they
won't contain the legacy components, and no bigram dawgs are needed.

-- 
Ray.


Re: [tesseract-ocr] While extracting numbers tesseract makes a lot of errors

2017-07-09 Thread ShreeDevi Kumar

If you are using the 3.05 branch, try configs such as:

digits
whitelist
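
A minimal sketch of both options follows. It assumes that "whitelist" refers to the `tessedit_char_whitelist` variable and that the `digits` config file is present in tessdata/configs of the 3.05 install; the function names and file names are only examples.

```shell
# Run OCR with the stock "digits" config file.
ocr_digits() {
  # $1 = input image, $2 = output base name (tesseract appends .txt)
  tesseract "$1" "$2" digits
}

# Run OCR while restricting recognition to an explicit set of characters.
ocr_whitelist() {
  # $1 = input image, $2 = output base name, $3 = allowed characters
  tesseract "$1" "$2" -c "tessedit_char_whitelist=$3"
}
```

For example, `ocr_whitelist scan.png out 0123456789` should keep letters such as O and symbols such as $ out of the result, which is the kind of confusion described in the question.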

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Jul 9, 2017 at 7:36 PM, Prav  wrote:

> Any suggestions for a configuration I can use to extract numbers
> correctly from scanned documents? Tesseract makes errors such as O for 0
> and $ for 4.

