Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-22 Thread Des Bw
I have updated the guide explaining how to train by cutting the top layer. You can check it out. I hope it is helpful.
On Sunday, October 22, 2023 at 7:41:15 PM UTC+3 renec...@gmail.com wrote:
> Hi Keith,
> foo.traineddata does not exist, but do you mean the traineddata I want to

[tesseract-ocr] Re: Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.

2023-10-22 Thread Des Bw
Are you trying to train from scratch?
On Sunday, October 22, 2023 at 8:27:26 PM UTC+3 bkpalm...@gmail.com wrote:
> I have both of these files. I don't understand. They are both prefixed with .eng in my tessdata directory. I am so close...

[tesseract-ocr] Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.

2023-10-22 Thread Hammurabi
I have both of these files. I don't understand. They are both prefixed with .eng in my tessdata directory. I am so close...

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-22 Thread René JM Clais
Hi Keith, foo.traineddata does not exist, but do you mean the traineddata I want to train, e.g. hye.traineddata? In my case I need to add a new character to hye.traineddata. It seems that I can do this using option 2! But how? Which command should I use to execute this function

[tesseract-ocr] Re: How to create training data in teseract5.3.0 use tesstrain.sh way?

2023-10-22 Thread Des Bw
The shell script still works. But if you are specifically looking for a Python script, there are a number of Python scripts posted in this forum. I personally have been using the script posted by Ali here: https://groups.google.com/g/tesseract-ocr/c/-G7TZEnVHgE
On Sunday, October 22, 2023 at

[tesseract-ocr] How to create training data in teseract5.3.0 use tesstrain.sh way?

2023-10-22 Thread 易鑫
Hello, everyone: As we know, in Tesseract 5.0 we can use tesstrain.sh to create training data, but in Tesseract 5.3.0 the tesstrain.sh script has been removed. The guide says: "* bash scripts is unsupported/abandoned for Tesseract 5. Please use python scripts from tesstrain repo

Re: [tesseract-ocr] accuracy problem after trained in fine-tune

2023-10-22 Thread Ali hussain
Thanks. I will try this method as soon as possible.
On Sunday, 22 October, 2023 at 3:49:46 pm UTC+6 desal...@gmail.com wrote:
> Here it is:
> https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files-in-tessdata_best.md
>
> On Sunday, October 22, 2023 at 12:45:40 PM UTC+3 Des Bw wrote:

Re: [tesseract-ocr] accuracy problem after trained in fine-tune

2023-10-22 Thread Des Bw
Here it is: https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files-in-tessdata_best.md
On Sunday, October 22, 2023 at 12:45:40 PM UTC+3 Des Bw wrote:
> This is the command I used to train from a layer:
> make training MODEL_NAME=amh START_MODEL=amh APPEND_INDEX=5 NET_SPEC='[Lfx256

Re: [tesseract-ocr] accuracy problem after trained in fine-tune

2023-10-22 Thread Des Bw
This is the command I used to train from a layer:
make training MODEL_NAME=amh START_MODEL=amh APPEND_INDEX=5 NET_SPEC='[Lfx256 O1c105]' TESSDATA=../tesseract/tessdata EPOCHS=3 TARGET_ERROR_RATE=0.0001 training >> data/amh.log &
- I took it from Scheer' training tesstrain-JSTORArabic:
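For anyone who wants to script the invocation above, here is a minimal sketch that only assembles the argument list and prints it, without running anything. The variable names and values are the ones quoted in the message; the helper name build_make_command is hypothetical, and the sketch assumes a tesstrain Makefile that accepts these variables (APPEND_INDEX and NET_SPEC in particular), as the message implies. The quoted command names the training target twice; it is listed once here.

```python
import shlex


def build_make_command(model_name, start_model, append_index, net_spec,
                       tessdata, epochs, target_error_rate):
    """Assemble the `make training` argument list quoted in the message.

    Hypothetical helper: it only builds the list and does not check
    that the Makefile in use understands every variable.
    """
    return [
        "make", "training",
        f"MODEL_NAME={model_name}",
        f"START_MODEL={start_model}",
        f"APPEND_INDEX={append_index}",
        f"NET_SPEC={net_spec}",
        f"TESSDATA={tessdata}",
        f"EPOCHS={epochs}",
        f"TARGET_ERROR_RATE={target_error_rate}",
    ]


# Values taken from the message above.
cmd = build_make_command(
    model_name="amh",
    start_model="amh",
    append_index=5,
    net_spec="[Lfx256 O1c105]",
    tessdata="../tesseract/tessdata",
    epochs=3,
    target_error_rate=0.0001,
)

# Dry run: print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
```

To actually launch it, the list could be passed to subprocess.run from the tesstrain checkout, with the output redirected to a log file as in the original command.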

Re: [tesseract-ocr] accuracy problem after trained in fine-tune

2023-10-22 Thread Ali hussain
You can test by changing '--char_spacing=1.0'. I think it could also be causing the accuracy problem in the results.
On Sunday, 22 October, 2023 at 3:07:16 pm UTC+6 Ali hussain wrote:
> I haven't tried cutting the top layer of the network. Could you share what you did to cut the top layer of

Re: [tesseract-ocr] accuracy problem after trained in fine-tune

2023-10-22 Thread Ali hussain
I haven't tried cutting the top layer of the network. Could you share what you did to cut the top layer of the network, or a GitHub project link?
On Sunday, 22 October, 2023 at 12:27:32 pm UTC+6 desal...@gmail.com wrote:
> That is a massive amount of data. Have you tried to train by cutting the top

Re: [tesseract-ocr] accuracy problem after trained in fine-tune

2023-10-22 Thread Des Bw
That is a massive amount of data. Have you tried to train by cutting the top layer of the network? I think that is the most promising approach. I was getting really good results with that. But the results do not carry over to scanned documents. I get the best results with the synthetic data. I am no