[tesseract-ocr] Tesseract Font Trainer

2017-09-22 Thread Zhenia Krivopaltsev
Folks,

I found the training portion of Tesseract quite challenging. To simplify it, I 
have created an application that produces bounding boxes for arbitrary text 
and fonts.


In essence, it is an iOS application, FontTrainer, that runs in Xcode with the 
iPad simulator.


FontTrainer lets you select fonts; new fonts can be downloaded and added as 
well. The settings screen lets you choose the desired fonts and font sizes, 
drag and drop a training text, and start the measurement flow. The tool 
generates a set of artifacts: .txt, .tif and .box files, plus a 
font_properties file.
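To make the .box and font_properties artifacts concrete, here is a minimal Python sketch of writing them. It assumes glyph rectangles measured with a top-left origin (as UIKit views use), whereas Tesseract box files use `<char> <left> <bottom> <right> <top> <page>` with the origin at the bottom-left of the image, so the y values must be flipped. A font_properties line is `<fontname> <italic> <bold> <fixed> <serif> <fraktur>` with 0/1 flags. GlyphRect, to_box_lines and font_properties_line are hypothetical names; this is not FontTrainer's own code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GlyphRect:
    """Hypothetical glyph measurement in top-left-origin pixel coordinates."""
    char: str
    x: int       # left edge
    y: int       # top edge, measured downward from the top of the image
    width: int
    height: int


def to_box_lines(glyphs: Iterable[GlyphRect], image_height: int, page: int = 0) -> List[str]:
    """Convert top-left-origin rectangles to Tesseract box-file lines.

    Box lines are: <char> <left> <bottom> <right> <top> <page>,
    with y measured upward from the bottom edge of the image.
    """
    lines = []
    for g in glyphs:
        if g.char.isspace():
            continue  # this sketch simply skips whitespace glyphs
        left = g.x
        right = g.x + g.width
        bottom = image_height - (g.y + g.height)  # flip the lower edge
        top = image_height - g.y                  # flip the upper edge
        lines.append(f"{g.char} {left} {bottom} {right} {top} {page}")
    return lines


def font_properties_line(font: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    """One font_properties entry: <fontname> <italic> <bold> <fixed> <serif> <fraktur>."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font] + [str(int(f)) for f in flags])
```

For example, a 12x16 "H" whose top-left corner sits at (10, 20) in a 100-pixel-high image becomes the box line "H 10 64 22 80 0", and font_properties_line("Helvetica", bold=True) gives "Helvetica 0 1 0 0 0".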

I used the tool to create a training set and found it useful. Would it be 
useful to others? If other folks want to explore it, I will publish it.

Thanks



-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/3e87d992-fcbe-416d-848b-71ff595b9ac4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-22 Thread Quan Nguyen
Try the "best" traineddata files:

https://github.com/tesseract-ocr/tessdata_best
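If you want to follow this suggestion from a script, here is a minimal Python sketch. It downloads spa.traineddata from tessdata_best into a separate folder and reruns the same command as in the original report, with --tessdata-dir pointing at that folder so the default tessdata stays untouched. tessdata_best models are LSTM-only, which matches the --oem 1 already used in the report. The helper names and the "main" branch in the download URL are assumptions, and the sketch assumes the tesseract executable is on PATH.

```python
import subprocess
import urllib.request
from pathlib import Path
from typing import List

# Assumption: the repository's default branch is "main"; adjust if the URL 404s.
BEST_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/main/{lang}.traineddata"


def fetch_best_model(lang: str, tessdata_dir: Path) -> Path:
    """Download <lang>.traineddata from tessdata_best into tessdata_dir, unless already present."""
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    target = tessdata_dir / f"{lang}.traineddata"
    if not target.exists():
        urllib.request.urlretrieve(BEST_URL.format(lang=lang), str(target))
    return target


def build_tesseract_cmd(image: str, outbase: str, lang: str,
                        tessdata_dir: Path, oem: int = 1) -> List[str]:
    """The command from the report, pointed at the folder holding the best model."""
    return [
        "tesseract", image, outbase,
        "--tessdata-dir", str(tessdata_dir),
        "--oem", str(oem),  # tessdata_best models need the LSTM engine (--oem 1)
        "-l", lang,
    ]


def run_ocr(image: str, outbase: str, lang: str, tessdata_dir: Path) -> str:
    """Run tesseract and return the recognized text."""
    subprocess.run(build_tesseract_cmd(image, outbase, lang, tessdata_dir), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(outbase + ".txt").read_text(encoding="utf-8")
```

For example, call fetch_best_model("spa", Path("tessdata_best")) once, then run_ocr("ImageDoc.png", "output", "spa", Path("tessdata_best")) to compare the result against the output from the standard tessdata model.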

On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>
> In Spanish, character ‘o’ is recognized incorrectly as some round symbol.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/0c091ffa-923c-4f48-b273-6d93751c8b82%40googlegroups.com.


[tesseract-ocr] In Spanish language, character ‘o’ is recognized incorrectly as some round symbol

2017-09-22 Thread Subrato Namata
Environment

Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
Spanish Trained Data: 
https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
Command Used to OCR:
tesseract.exe ImageDoc.png output --oem 1 -l spa
where ImageDoc.png is a scanned Spanish document and output is the base name 
of the OCR text output (Tesseract writes output.txt).

   - Tesseract Version: 4.0 (4.0.0-alpha.20170804)
   - Platform: Windows, 64-bit

Current Behavior:

In Spanish text, the character ‘o’ is misrecognized as a round symbol. 
Attached are the input file (ImageDoc.png) and a screenshot of the error.

[image: spanish] 

[image: imagedoc] 

Expected Behavior:

The character ‘o’ should be recognized correctly.
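The report only describes the wrong output as "some round symbol". A minimal Python sketch for pinning down exactly which code point it is: scan the OCR text and list every character outside an assumed set of expected Spanish characters, with its Unicode name and count. The SPANISH_EXTRA set and the unexpected_chars name are illustrative assumptions; adjust the set to your documents.

```python
import unicodedata
from collections import Counter
from typing import List, Tuple

# Assumption: ASCII plus these Spanish letters and punctuation count as "expected".
SPANISH_EXTRA = set("áéíóúüñÁÉÍÓÚÜÑ¿¡«»")


def unexpected_chars(text: str) -> List[Tuple[str, str, int]]:
    """Return (char, Unicode name, count) for unexpected characters, most frequent first."""
    counts = Counter(
        ch for ch in text
        if not (ch.isascii() or ch in SPANISH_EXTRA)
    )
    # Fall back to the U+XXXX form for code points that have no Unicode name.
    return [
        (ch, unicodedata.name(ch, f"U+{ord(ch):04X}"), n)
        for ch, n in counts.most_common()
    ]
```

For example, running unexpected_chars on the contents of output.txt would show whether the culprit is, say, DEGREE SIGN or WHITE CIRCLE, which is useful detail for a bug report.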

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/62f497d2-3faa-41fb-a7a4-9054d64697a4%40googlegroups.com.