Re: [tesseract-ocr] Fine tuning existing model

2018-07-02 Thread Lorenzo Bolzani
Hi Shree, I replaced the line `merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@"` with `cp "$(TRAIN)/my.unicharset" "data/unicharset"` (I'm writing this in case someone else is following this thread). And now I have a fine-tuned brand new model with only
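The effect of swapping the merge for a plain copy can be modelled in a few lines. This is a simplified sketch, not the real `merge_unicharsets` tool: it treats a unicharset as a plain list of characters (real unicharset files also carry a header and per-character properties), and the assumption that merging keeps the base order and appends new characters is ours. Both helper names are hypothetical.

```python
def merged_charset(base, extra):
    """Simplified model of merging: base characters first, then any new ones.

    Hypothetical helper; ignores the header and per-character properties
    that real unicharset files contain.
    """
    merged = list(base)
    seen = set(base)
    for ch in extra:
        if ch not in seen:
            merged.append(ch)
            seen.add(ch)
    return merged


def replaced_charset(_base, extra):
    """Model of `cp my.unicharset data/unicharset`: the base set is dropped."""
    return list(extra)


if __name__ == "__main__":
    base = ["a", "b", "c"]
    mine = ["c", "x", "y"]
    print(merged_charset(base, mine))    # ['a', 'b', 'c', 'x', 'y']
    print(replaced_charset(base, mine))  # ['c', 'x', 'y']
```

With the copy, nothing from the base model's lstm-unicharset survives in the new unicharset; with the merge, the base characters are kept alongside the new ones.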

Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-07-02 Thread yajva
Many thanks. Downloaded it and using it. I'll wait for the next version. On Sunday, July 1, 2018 at 12:21:19 AM UTC+5:30, shree wrote: > > I have uploaded a new version of traineddata file at > > https://github.com/Shreeshrii/tessdata_shreetest/blob/master/iast-layer-18003.traineddata > > Attached is the

Re: [tesseract-ocr] Encoding of string failed when fine-tuning to add new fonts in fas language

2018-07-02 Thread ran go
In my opinion the error depends on the font: with some fonts there is no error, but with others there is. On Mon, Jul 2, 2018 at 9:15 AM, john wrote: > I use tesseract 4.0.0-beta.1, downloaded from this link (UB Mannheim) >

Re: [tesseract-ocr] Train 2 language together

2018-07-02 Thread Zohreh Khosrobeygi
Thanks, you're right. On Sunday, July 1, 2018 at 10:02:55 PM UTC+4:30, shree wrote: > > The font being used does not support English. > > On Sun, Jul 1, 2018 at 10:06 PM Zohreh Khosrobeygi > wrote: > >> Hi, >> I have been training the text: >> >> 272-135031- BECAUSE YOU WERE SLEEPING INSTEAD
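A quick way to confirm a diagnosis like "the font does not support English" is to list the training-text characters a font has no glyph for. This is a minimal sketch under the assumption that you already have the font's supported code points as a set (for example from its cmap table; how to read it is not shown). `missing_chars` is a hypothetical helper, and skipping whitespace is our choice.

```python
def missing_chars(training_text, font_codepoints):
    """Return the distinct characters of training_text the font cannot render.

    font_codepoints: set of Unicode code points the font maps to glyphs
    (obtaining it, e.g. from the cmap table, is left to the caller).
    Whitespace is skipped, an assumption of this sketch.
    """
    return sorted({ch for ch in training_text
                   if not ch.isspace() and ord(ch) not in font_codepoints})


if __name__ == "__main__":
    # Hypothetical font that covers only the Arabic block U+0600..U+06FF.
    arabic_only = set(range(0x0600, 0x0700))
    print(missing_chars("سلام BECAUSE", arabic_only))  # ['A', 'B', 'C', 'E', 'S', 'U']
```

A non-empty result means that font cannot render every line of the training text.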

[tesseract-ocr] Re: Tesseract v3.05.02 Training Error During Processing

2018-07-02 Thread James Lipham
I have also updated the image so that everything uses the same font, size, etc., but tesseract still just says "Error during processing." with no information as to why. Has anyone experienced this? If I can't find out anything else, I guess I'll just have to step through the page

Re: [tesseract-ocr] Encoding of string failed when fine-tuning to add new fonts in fas language

2018-07-02 Thread Shree Devi Kumar
Also see https://github.com/tesseract-ocr/tesseract/issues/549 On Mon, Jul 2, 2018 at 7:45 PM Shree Devi Kumar wrote: > You can use find_fonts with your training_text to locate the fonts to use. > > Modify the following command to match your directory setup and try > > echo "## FIND FONTS

[tesseract-ocr] Re: Where can I get cube language files for other languages?

2018-07-02 Thread cohengil333
Great question. I'm stuck on this too, with Hebrew OCR. Any suggestions? On Tuesday, March 13, 2018 at 7:13:50 PM UTC+2, Harshit Dohare wrote: > > Hi, > > As far as I have looked into Tesseract, cube files are only available for > Hindi and Arabic. > Check here -

[tesseract-ocr] A friendly suggestion for the "tesseract-ocr" group members (concerns all members)

2018-07-02 Thread cohengil333
It seems that, across all languages and revisions, people (including me) search a lot for answers here in the group. So I have a suggestion: could the group administrator pin a message with a spreadsheet showing the state of each revision for the corresponding language? This way it

[tesseract-ocr] How to generate multiple tessedit_write_images outputs

2018-07-02 Thread Junye Li
Hi there, I want to see the actual input images processed by tesseract, so I used the `-c` option with tessedit_write_images=TRUE. However, when I pass a multi-layer (multi-page) .tiff image to tesseract, the output tessinput.tif image contains only one layer, which is the last
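Since only one tessinput.tif survives a multi-page run, one workaround is to OCR each page separately and rename the debug image after every run. This is a sketch under two assumptions: the multi-page TIFF has already been split into single-page files, and tesseract writes tessinput.tif into the current working directory. `ocr_pages_keeping_debug_images` is a hypothetical name, and `run` is injectable only so the loop can be exercised without tesseract installed.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_pages_keeping_debug_images(page_files, out_dir, run=subprocess.run):
    """Run tesseract once per page and keep every tessinput.tif it writes.

    Assumes each entry of page_files is a single-page image and that
    tesseract drops tessinput.tif into the current working directory.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    kept = []
    for i, page in enumerate(page_files):
        outbase = out / f"page_{i:03d}"
        run(["tesseract", str(page), str(outbase),
             "-c", "tessedit_write_images=TRUE"], check=True)
        debug = Path("tessinput.tif")
        if debug.exists():
            # Move it aside before the next page's run overwrites it.
            target = out / f"tessinput_{i:03d}.tif"
            shutil.move(str(debug), str(target))
            kept.append(target)
    return kept
```

Calling it with a list of page files returns one preserved debug image per page, named in page order.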

[tesseract-ocr] Re: Tesseract v3.05.02 Training Error During Processing

2018-07-02 Thread Quan Nguyen
Wrong filename format. The box file should be named `eng.dmd.exp0.box`. On Monday, July 2, 2018 at 7:40:26 AM UTC-5, James Lipham wrote: > > I have also updated the image to have everything as the same > font/size/etc, but still, tesseract just says "Error during processing." > with seemingly zero
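Tesseract 3.x training expects the image and its box file to share a base name of the form `[lang].[fontname].exp[num]`, which is the pattern `eng.dmd.exp0.box` follows. Below is a small, hypothetical checker for that pattern; restricting images to `.tif` is a simplification of this sketch.

```python
import re

# Tesseract 3.x training file names: [lang].[fontname].exp[num].tif / .box
TRAINING_NAME = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.(?P<ext>tif|box)$")


def parse_training_name(filename):
    """Split a training file name into (lang, font, num, ext) or raise ValueError."""
    m = TRAINING_NAME.match(filename)
    if m is None:
        raise ValueError(
            f"{filename!r} does not follow lang.fontname.expN.tif/.box")
    return m.group("lang"), m.group("font"), m.group("num"), m.group("ext")


def box_name_for(image_name):
    """Return the box file name that must accompany a training .tif."""
    lang, font, num, ext = parse_training_name(image_name)
    if ext != "tif":
        raise ValueError(f"{image_name!r} is not a .tif training image")
    return f"{lang}.{font}.exp{num}.box"


if __name__ == "__main__":
    print(box_name_for("eng.dmd.exp0.tif"))  # eng.dmd.exp0.box
```

The experiment number is kept as a string so that names such as `exp01` round-trip unchanged.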

Re: [tesseract-ocr] Encoding of string failed when fine-tuning to add new fonts in fas language

2018-07-02 Thread Shree Devi Kumar
You can use find_fonts with your training_text to locate the fonts to use. Modify the following command to match your directory setup and try:

    echo "## FIND FONTS ##"
    # Find fonts which can render your training_text. Run `fc-cache -vf` to refresh cache.
    # You can change the minimum
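The idea behind find_fonts is to keep only the fonts that can render enough of the training text. The sketch below imitates that idea in plain Python; it is not text2image's implementation. The per-font code point sets are supplied by the caller (an assumption), coverage is measured over distinct non-space characters (also an assumption), and `usable_fonts` is a hypothetical name.

```python
def coverage(training_text, font_codepoints):
    """Fraction of distinct non-space characters the font can render."""
    chars = {ch for ch in training_text if not ch.isspace()}
    if not chars:
        return 1.0
    covered = sum(1 for ch in chars if ord(ch) in font_codepoints)
    return covered / len(chars)


def usable_fonts(training_text, fonts, min_coverage=1.0):
    """Names of fonts whose coverage of training_text reaches min_coverage."""
    return sorted(name for name, cps in fonts.items()
                  if coverage(training_text, cps) >= min_coverage)


if __name__ == "__main__":
    fonts = {
        # Hypothetical fonts: Arabic block only vs Arabic block plus ASCII.
        "ArabicOnly": set(range(0x0600, 0x0700)),
        "ArabicLatin": set(range(0x0600, 0x0700)) | set(range(0x20, 0x7F)),
    }
    text = "سلام ABC"
    print(usable_fonts(text, fonts))                    # ['ArabicLatin']
    print(usable_fonts(text, fonts, min_coverage=0.5))  # ['ArabicLatin', 'ArabicOnly']
```

Lowering `min_coverage` admits fonts that cannot render every character, so in this sketch a threshold of 1.0 is the safer starting point.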