How can I recognize the page number printed on a page? Page numbers can be Arabic
(23) or Roman (XXIII) numerals, can be located in any corner or at the center of
the top or bottom of the page, can be placed differently on even and odd pages,
and can have some decoration or a chapter name next to them. I need to do this
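Once the page has been OCR'd into text lines (for example with Tesseract), one heuristic approach is to look only at the first and last few non-empty lines and test their tokens for an Arabic or Roman numeral, trying the tokens at either end of each line first. The Python sketch below illustrates that idea under stated assumptions. It is not a Tesseract feature: `find_page_number`, `parse_page_token`, `roman_to_int` and `margin_lines` are hypothetical names, and ordinary words such as "I" or "mix" can be mistaken for Roman numerals.

```python
import re

# Values of individual Roman numerals.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Well-formed Roman numerals up to MMMCMXCIX (the pattern alone also matches "").
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
ARABIC_RE = re.compile(r"\d{1,4}")


def roman_to_int(numeral: str) -> int:
    """Convert an upper-case, well-formed Roman numeral to an int."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller numeral before a larger one counts negative (IV = 4).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def parse_page_token(token: str):
    """Return the page number a single token represents, or None."""
    t = token.strip(".,;:()[]-")  # drop simple decoration such as "(23)" or "-23-"
    if ARABIC_RE.fullmatch(t):
        return int(t)
    # Accept all-upper or all-lower Roman numerals (XXIII or xxiii), not mixed case.
    if t and (t.isupper() or t.islower()) and ROMAN_RE.fullmatch(t.upper()):
        return roman_to_int(t.upper())
    return None


def find_page_number(lines, margin_lines=2):
    """Search the first and last `margin_lines` non-empty OCR lines of a page."""
    text_lines = [ln for ln in lines if ln.strip()]
    candidates = text_lines[:margin_lines] + text_lines[-margin_lines:]
    for line in candidates:
        tokens = line.split()
        # Page numbers usually sit at either end of a header/footer line, so try those first.
        ordered = [tokens[0], tokens[-1]] + tokens[1:-1]
        for token in ordered:
            number = parse_page_token(token)
            if number is not None:
                return number
    return None


print(find_page_number(["The Long Road        23", "It was a dark night.", "She walked on."]))
print(find_page_number(["Preface", "Some text here.", "- xiv -"]))
```

The function checks both the top and bottom margins and every token on those lines, so it does not need to know in advance whether the number is in a corner, centered, or alternating between even and odd pages. A chapter name next to the number is skipped unless one of its words happens to parse as a numeral.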
Peter,
Please see
https://github.com/tesseract-ocr/langdata/blob/master/swe/swe.training_text
You can provide additional training text if some characters you need are
missing from it. I can do a test training run with it.
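Before sending extra text, you can check which characters are actually missing from the training text by comparing it against the set of characters you need. The following is a minimal Python sketch. `missing_chars` is a hypothetical helper, and the downloaded file could be loaded with something like `Path("swe.training_text").read_text(encoding="utf-8")`. Both inputs are NFC-normalized so that a precomposed "å" and an "a" followed by a combining ring compare equal.

```python
import unicodedata


def missing_chars(training_text: str, required: str) -> list[str]:
    """Return the characters in `required` that never occur in `training_text`."""
    present = set(unicodedata.normalize("NFC", training_text))
    wanted = unicodedata.normalize("NFC", required)
    # Ignore whitespace in the required set; sort by code point for stable output.
    return sorted({ch for ch in wanted if not ch.isspace() and ch not in present})


print(missing_chars("Hej på dig", "åäö ÅÄÖ"))  # ['Ä', 'Å', 'Ö', 'ä', 'ö']
```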
On 06-Jan-2017 5:01 PM, "Peter"
I have uploaded a modified nor.traineddata at
https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
See the attached log and info files for the commands used in training. It took
about 9 hours on my PC, covering only about 1700 iterations, and then my PC froze,
so I rebooted and created the
On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>
> Ray is planning to retrain the languages for the new 4.0.0 version
> sometime in January. So it would be helpful if you could open an issue on
> https://github.com/tesseract-ocr/langdata/issues with this information.
>
Is it
Does anyone know of a utility to convert a box file to a ground truth
text file?
I am trying out LSTM training with tesstrain.sh, which uses text2image.
However, because unrenderable words are not included in the TIFs, the
training_text cannot be used as ground truth.
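If no ready-made utility exists, you can rebuild the ground truth from the box file itself. Because the box file lists only the characters that were actually rendered, the result matches the TIF. The Python sketch below assumes the per-character box format that text2image writes for LSTM training. In that format, each line is `<symbol> <left> <bottom> <right> <top> <page>`, a line whose symbol is a space marks a word break, and a line whose symbol is a tab marks the end of a text line. Check this convention against your own .box files before relying on it. `box_to_text` is a hypothetical name, and WordStr-style box lines are not handled.

```python
def box_to_text(box_lines):
    """Rebuild ground-truth text from per-character box-file lines (assumed format)."""
    out_lines = []
    current = []
    for raw in box_lines:
        line = raw.rstrip("\r\n")  # keep a leading space or tab: it may be the symbol
        if not line:
            continue
        # Split off the five trailing coordinate/page fields; the rest is the symbol,
        # which may itself be a space or a tab.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            raise ValueError(f"unexpected box line: {raw!r}")
        symbol = parts[0]
        if symbol == "\t":
            # Assumed end-of-textline marker.
            out_lines.append("".join(current).strip())
            current = []
        else:
            current.append(symbol)
    if current:
        out_lines.append("".join(current).strip())
    return "\n".join(out_lines)


sample = [
    "H 10 10 20 30 0\n",
    "i 21 10 25 30 0\n",
    "  26 10 30 30 0\n",   # space symbol: word break
    "a 31 10 40 30 0\n",
    "\t 41 10 42 30 0\n",  # tab symbol: end of text line
    "b 10 40 20 60 0\n",
]
print(box_to_text(sample))
```

To process a real file, pass it the lines of a file opened with `open("eng.example.box", encoding="utf-8")` (a hypothetical filename) and write the result next to the matching TIF.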
Thanks!