Re: [tesseract-ocr] Trained font - always one letter wrong

2018-05-02 Thread dave . hardy
Thanks for your effort!
I tried language deu before, and as you can see in your attached text files, there are
some errors too.
I could not eliminate them using freq-words or user-words.
But in general your result with deu is much better
than mine with v 3.05.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ea06f6ad-5931-4909-88db-57525cb50ce3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Trained font - always one letter wrong

2018-05-02 Thread ShreeDevi Kumar
Your image has text in German. You will get better results using language
`deu` out of the box.

Attached are OCR results using deu.traineddata from tessdata_best and
tessdata_fast using tesseract-4.0.0-beta.1 run via command line.

# tesseract sample.tif sample-deu-fast -l deu --tessdata-dir ./tessdata_fast --psm 6 -c preserve_interword_spaces=1
Tesseract Open Source OCR Engine v4.0.0-beta.1-207-g984a with Leptonica
Page 1

# tesseract sample.tif sample-deu-best -l deu --tessdata-dir ./tessdata_best --psm 6 -c preserve_interword_spaces=1
Tesseract Open Source OCR Engine v4.0.0-beta.1-207-g984a with Leptonica
Page 1



ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, May 2, 2018 at 10:20 PM,  wrote:

> I attached a sample TIF.
>
> Hope this will work.
>
>
> On Wednesday, 2 May 2018 at 08:43:15 UTC+2, shree wrote:
>>
>> Please provide a small sample image to test.
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Wed, May 2, 2018 at 11:26 AM,  wrote:
>>
>>> Training doesn't work. If I use the characters "ä, ö, ü" (which I need)
>>> in my training text, text2image reports "WARNING:
>>> illegal UTF8 encountered" and then creates an incorrect box/tif pair.
>>> This does not seem to depend on my font, because it does the same
>>> thing with Arial.
>>> Can you help me to avoid this?

29.10.2017-07:49 +49 3571   LST_IRLS
 s. 111
Einsatzdepesche:  Ausdruck am: 29.10.2017? um: 07:49
Einsatzdaten:

Gemeinde:

Ortsteil:

Straße : Haus-Nr. :

Stichwort   : H1 THL klein Auswahl:

Sondersignal: Ja

Label   : Unwetter   28.10.17

Ob jekt   !

Einsatzplan :

Melder   !

Hydrantenbuch:

Was  : URU olme Personenschaden

Hinweise:

Feuerwehrplan:

Gebäudefunk:Notschlüsselrohr: PU Anlage:
Fahrzeuge - alarmiert: (Wache/Funkkemer/Typ/Fahrtnunner )
Fahrzeuge - bereits im Einsatz: (Wache/Funkkemner/Typ/Fahrtnumner )
29.10,.2017-07;49 +49 3571   LST_IRLS   
  S. 1/1
Einsatzdepesche:  Ausdruck am: 29.10.2017 um: 07:49
Einsatzdaten:

Gemeinde :

Ortsteil:

Straße : Haus-Nr. :

Stichwort   : H1 THL klein Auswahl:

Sondersignal: Ja

Label   : Unwetter   28.10.17

Objekt   :

Einsatzplan :

Melder   :

Hydrantenbuch:

Was  : UKU olme Personenschaden

Hinweise:

Feuerwehrplan:

Gebäudefunk:Notschlüsselrohr: PU Anlage:
Fahrzeuge - alarmiert: (Wache-/Funkkemer-/Typ-Fahrtnummer)
Fahrzeuge - bereits im Einsatz: (Wache-/Funkkemer-/Typ-Fahrtnummer)
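
The two attached outputs above are easier to judge when only the differing lines are shown. A minimal Python sketch using the standard `difflib` module follows; the helper name `changed_lines` is hypothetical, and the sample strings are three lines copied from each attachment (the thread does not say which attachment came from which model).

```python
import difflib
from typing import List


def changed_lines(first: str, second: str) -> List[str]:
    """Return only the differing lines, prefixed '- ' (first) or '+ ' (second)."""
    diff = difflib.ndiff(first.splitlines(), second.splitlines())
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Three lines copied from each of the two attached OCR outputs.
first_output = "Sondersignal: Ja\nOb jekt   !\nEinsatzplan :"
second_output = "Sondersignal: Ja\nObjekt   :\nEinsatzplan :"

for line in changed_lines(first_output, second_output):
    print(line)
```

In practice the two strings would be read from the full `.txt` files that tesseract wrote, so that every disagreement between the models is listed in one place.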


Re: [tesseract-ocr] Trained font - always one letter wrong

2018-05-02 Thread ShreeDevi Kumar
Please provide a small sample image to test.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, May 2, 2018 at 11:26 AM,  wrote:

> Training doesn't work. If I use the characters "ä, ö, ü" (which I need) in
> my training text, text2image reports "WARNING:
> illegal UTF8 encountered" and then creates an incorrect box/tif pair.
> This does not seem to depend on my font, because it does the same
> thing with Arial.
> Can you help me to avoid this?


Re: [tesseract-ocr] Trained font - always one letter wrong

2018-05-01 Thread dave . hardy
Training doesn't work. If I use the characters "ä, ö, ü" (which I need) in my
training text, text2image reports "WARNING:
illegal UTF8 encountered" and then creates an incorrect box/tif pair.
This does not seem to depend on my font, because it does the same thing with Arial.
Can you help me to avoid this?
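
One possible cause, which the thread does not confirm, is that the training text was saved in a legacy Windows encoding rather than UTF-8, so that "ä, ö, ü" are stored as single bytes that are not valid UTF-8. The Python sketch below checks for that and re-encodes the data; the function names are hypothetical and `cp1252` is only a guess at the original encoding.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def reencode_as_utf8(data: bytes, assumed_encoding: str = "cp1252") -> bytes:
    """Return UTF-8 bytes; input that is already valid UTF-8 is left unchanged.

    assumed_encoding is an assumption about how the file was saved.
    """
    if first_invalid_utf8_offset(data) is None:
        return data
    return data.decode(assumed_encoding).encode("utf-8")


# "Gebäudefunk" as a Windows editor might save it without UTF-8.
sample = "Gebäudefunk".encode("cp1252")
print(first_invalid_utf8_offset(sample))   # offset of the "ä" byte
print(reencode_as_utf8(sample).decode("utf-8"))
```

For a real training text, the bytes would come from reading the file in binary mode, and the re-encoded result would be written to a new file that is then passed to text2image.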



Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-30 Thread ShreeDevi Kumar
Use the latest version, 4.0.0-beta.


On Sun 29 Apr, 2018, 1:51 PM ,  wrote:

> I did. Unfortunately they don't answer...
> Do you have any advice for me to improve the
> training process? How many training texts should I use? Or is it possible
> that there is a problem with this font itself? It would help very much to find
> that out.
>
> Best regards Dave


Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-29 Thread ShreeDevi Kumar
Check that your training text has enough samples of the letter "d".
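
A quick way to check this is to count how often each required character occurs in the training text. The Python sketch below is one way to do that; the helper names and the `min_count` threshold are hypothetical, since the thread does not say how many samples are enough.

```python
from collections import Counter
from typing import Dict


def char_counts(text: str) -> Counter:
    """Count every non-whitespace character in the training text."""
    return Counter(ch for ch in text if not ch.isspace())


def underrepresented(text: str, required: str, min_count: int) -> Dict[str, int]:
    """Map each required character seen fewer than min_count times to its count."""
    counts = char_counts(text)
    return {ch: counts[ch] for ch in required if counts[ch] < min_count}


# Tiny stand-in for a training text; a real one would be read from a file.
training_text = "Gemeinde Ortsteil Melder"
print(underrepresented(training_text, required="dä", min_count=3))
```

Characters that show up in the result, here "d" and "ä", are candidates for adding more sample words to the training text before retraining.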

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Apr 29, 2018 at 1:51 PM,  wrote:

> I did. Unfortunately they don't answer...
> Do you have any advice for me to improve the
> training process? How many training texts should I use? Or is it possible
> that there is a problem with this font itself? It would help very much to find
> that out.
>
> Best regards Dave


Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-29 Thread dave . hardy
I did. Unfortunately they don't answer...
Do you have any advice for me to improve the
training process? How many training texts should I use? Or is it possible that
there is a problem with this font itself? It would help very much to find that
out.

Best regards Dave



Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Thread Zdenko Podobny
Well, you should contact the creator of the traineddata. We have no clue what they
did.

Zdenko

2018-04-25 14:55 GMT+02:00 :

> Hello there,
>
> I don't know what to do anymore...
> I want to use tesseract-ocr 3.05 for scanning documents that use the font
> "Perfect DOS VGA 437 Win".
> I got a traineddata file for my font from trainyourtesseract.com. It actually
> works really well, but in every case the letter "d" isn't recognized and "a"
> or "u" is output instead, e.g. "Gemeinue" instead of "Gemeinde".
>
> Adding my words to freq-dawg didn't change anything.
> I also tried to train tesseract with a new language using this font, but
> the result is even worse.
> Combining languages (perfect+deu) produces some correct words with the "d" and
> many wrong ones.
>
> Is there anyone who can help me, please?
>
> I'm completely desperate. :-(
>
> Sorry for my bad English and best regards
>
> Dave


[tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Thread dave . hardy
Hello there,

I don't know what to do anymore...
I want to use tesseract-ocr 3.05 for scanning documents that use the font
"Perfect DOS VGA 437 Win".
I got a traineddata file for my font from trainyourtesseract.com. It actually works
really well, but in every case the letter "d" isn't recognized and "a" or "u" is
output instead, e.g. "Gemeinue" instead of "Gemeinde".

Adding my words to freq-dawg didn't change anything.
I also tried to train tesseract with a new language using this font, but the
result is even worse.
Combining languages (perfect+deu) produces some correct words with the "d" and many
wrong ones.

Is there anyone who can help me, please?

I'm completely desperate. :-(

Sorry for my bad English and best regards

Dave
