[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-26 Thread Quan Nguyen
The Wiki page offers more info:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017

On Sunday, September 24, 2017 at 9:56:29 AM UTC-5, Quan Nguyen wrote:
>
> It depends on your needs. There are also fast traineddata:
>
> https://github.com/tesseract-ocr/tessdata_fast
>
> It looks like many languages are represented.
>
> On Saturday, September 23, 2017 at 12:38:46 PM UTC-5, Subrato Namata wrote:
>>
>> Thanks, Quan Nguyen. My initial results show that the issue is gone. Let
>> me try with a few more samples.
>> Additionally, is this the best Tesseract trained data available for all
>> the other languages as well, and should we use only these files?
>>
>>
>>
>> On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>>>
>>> Try best traineddata:
>>>
>>> https://github.com/tesseract-ocr/tessdata_best
>>>
>>> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:

>>>> Environment
>>>>
>>>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>>>> Spanish Trained Data:
>>>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>>>> Command Used to OCR:
>>>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>>>> where ImageDoc.png is a scanned Spanish document and output is the
>>>> base name of the text file that receives the OCR result.
>>>>
>>>> - Tesseract Version: 4.0
>>>> - Platform: Windows, 64-bit
>>>>
>>>> Current Behavior:
>>>>
>>>> In Spanish, the character ‘o’ is incorrectly recognized as a round
>>>> symbol. The attached files are the input, ImageDoc.png, and a
>>>> screenshot of the error.
>>>>
>>>> [image: spanish]
>>>>
>>>> [image: imagedoc]
>>>>
>>>> Expected Behavior:
>>>>
>>>> The character ‘o’ should be recognized correctly.

>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/43f20f10-35c3-49dd-9319-22267d0d857d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-24 Thread Quan Nguyen
It depends on your needs. There are also fast traineddata:

https://github.com/tesseract-ocr/tessdata_fast

It looks like many languages are represented.
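
For anyone scripting the download, here is a minimal sketch that builds a
direct link to a language file. It reuses the URL pattern from the original
report (tessdata, ref 4.00, spa.traineddata). That tessdata_best and
tessdata_fast follow the same layout is an assumption, and the thread does
not say which branch or tag to use for them, so check the repository page
before relying on it. The function name traineddata_url is made up for this
example.

```python
def traineddata_url(repo, ref, lang):
    """Build a raw-download URL for <lang>.traineddata.

    repo: repository name under the tesseract-ocr organization,
          e.g. "tessdata", "tessdata_best" or "tessdata_fast".
    ref:  branch or tag name. Only "4.00" for "tessdata" appears in
          this thread; any other value is an assumption to verify.
    lang: language code such as "spa".
    """
    return f"https://github.com/tesseract-ocr/{repo}/raw/{ref}/{lang}.traineddata"
```

Calling traineddata_url("tessdata", "4.00", "spa") reproduces the link given
in the original report.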




[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-23 Thread Subrato Namata
Thanks, Quan Nguyen. My initial results show that the issue is gone. Let me
try with a few more samples.
Additionally, is this the best Tesseract trained data available for all the
other languages as well, and should we use only these files?






[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-22 Thread Quan Nguyen
Try best traineddata:

https://github.com/tesseract-ocr/tessdata_best
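
In case it helps, a rough sketch of how the reported command could be pointed
at a separately downloaded spa.traineddata without overwriting the installed
one. It only assembles the argument list and does not run anything. The
--tessdata-dir, --oem and -l options are existing Tesseract command-line
options; the helper name build_tesseract_command and the directory in the
usage note are hypothetical.

```python
def build_tesseract_command(image, output_base, lang="spa",
                            tessdata_dir=None, exe="tesseract"):
    """Return the argument list for an LSTM-only (--oem 1) Tesseract run.

    tessdata_dir, if given, is the folder that holds <lang>.traineddata,
    for example a copy downloaded from tessdata_best.
    """
    cmd = [exe, str(image), str(output_base)]
    if tessdata_dir is not None:
        # Tell Tesseract where to look for the traineddata files.
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    # Same engine and language options as in the original report.
    cmd += ["--oem", "1", "-l", lang]
    return cmd
```

With a hypothetical folder C:\tessdata_best, the list returned by
build_tesseract_command("ImageDoc.png", "output", tessdata_dir=r"C:\tessdata_best")
can be passed to subprocess.run(cmd, check=True), which should write the
recognized text to output.txt.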

