[tesseract-ocr] How to limit length of string output in image to string?

2018-12-17 Thread Devendra Damle
Hi. I am using pytesseract for solving captcha codes. The captcha always has 6 characters. Is there any way to set the number of characters to look for in the image to string function?
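I am not aware of a length parameter on pytesseract's image_to_string, so one possible workaround is to check the length after recognition. The sketch below is only an illustration: the helper names clean_captcha and read_captcha are hypothetical, the captcha is assumed to be alphanumeric, and "--psm 7" (treat the image as a single text line) is an assumed choice rather than something stated in this thread.

```python
import re


def clean_captcha(raw_text, expected_length=6):
    """Keep only letters and digits; return None if the count is not as expected.

    Assumption: the captcha alphabet is A-Z, a-z, 0-9.
    """
    candidate = re.sub(r"[^A-Za-z0-9]", "", raw_text)
    return candidate if len(candidate) == expected_length else None


def read_captcha(image_path, expected_length=6):
    """Run OCR on one image and validate the length of the result (hypothetical helper)."""
    # Imported lazily so clean_captcha can be used without pytesseract installed.
    import pytesseract
    from PIL import Image

    # "--psm 7" asks Tesseract to treat the image as a single line of text (assumed setting).
    raw = pytesseract.image_to_string(Image.open(image_path), config="--psm 7")
    return clean_captcha(raw, expected_length)
```

A None result signals that the OCR output did not contain exactly six usable characters, so the caller can retry with different preprocessing instead of submitting a wrong code.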

[tesseract-ocr] Re: tesseract performs wrong auto-correction sometimes : how to disable it?

2018-12-17 Thread 'ilochray' via tesseract-ocr
I am experiencing the same issue. Did you ever find a resolution for this?
On Wednesday, 25 April 2018 10:59:34 UTC-4, Youcef wrote:
> Hi,
> Tesseract seems to post process its prediction.
> Here after, what I get after OCRizing images (same font, same size images generated with text2i

[tesseract-ocr] Generating LSTMF files using tesseract with psm =6

2018-12-17 Thread Raniem
Dear all, thanks for all your efforts answering people's queries when possible. This might be a previously asked question, but I failed to find the references. I am confused about the nature of the .lstmf files generated during training. Let us say I am fine-tuning the English model, and the old model is us

[tesseract-ocr] Re: how to prepare training text

2018-12-17 Thread Raniem
If you are planning to use the training data for the original models, you can download it from here: https://github.com/tesseract-ocr/langdata_lstm For your own training data you should follow the training tutorial here, or u

[tesseract-ocr] Recognition of chemical formulas

2018-12-17 Thread Vadim Fedorov
Hello everyone, I need some advice. Would it make sense to train a separate model (datafile) exclusively for recognition of chemical formulas? With the default model for English the following formula [image: test5.png] is recognized as "CONH(CH*5*)3N(C*o*H*s*)*o*" by the LSTM engine. So there are mi

Re: [tesseract-ocr] Recognition of chemical formulas

2018-12-17 Thread Shree Devi Kumar
Please take a look at related issue regarding subscripts/superscripts (in langdata or tessdata repos). As far as I understand, the currently used normalization routines convert them to regular numbers. Hence, training did not seem to help in my fine tuning trial. However, you can give it a try a
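To illustrate the kind of normalization being described, here is a small sketch using Python's standard Unicode NFKC normalization, which maps subscript and superscript digits to regular digits. This is an assumption made for illustration: it is not Tesseract's own normalization code, and the function name flatten_scripts and the sample formula are hypothetical.

```python
import unicodedata


def flatten_scripts(text):
    """Apply NFKC normalization, which replaces sub/superscript digits with plain digits."""
    return unicodedata.normalize("NFKC", text)


# A formula written with Unicode subscripts loses them after normalization.
print(flatten_scripts("H₂SO₄"))  # H2SO4
```

If ground-truth text passes through a step like this before training, the subscripted and regular digits become indistinguishable to the model, which would be consistent with fine tuning not helping.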

[tesseract-ocr] unable to search text from heading with bold

2018-12-17 Thread Abhay Soni
Hi, I have recently installed Tesseract 4.0 on CentOS 7. The installation succeeded with no errors. When I convert a JPEG into a searchable PDF file using #tesseract 2.jpeg -l eng 2 pdf it converts the JPEG into a PDF successfully. But when I search for text in the PDF file, it would show result of o