[tesseract-ocr] Not able to force a specific sequence length

Fernando Fri, 22 Nov 2019 01:09:10 -0800

Hello everyone!
I am trying to use tesseract-ocr (through pytesseract) to detect some specific
codes, and I receive a single word at a time as input.
Those codes always have the same length (8 characters), and I want to receive
as output only sequences of exactly 8 characters.


I have tried all the solutions described in the manual
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#CONFIGFILE
without success.

In more detail, I tried to:

   - Create a *CONFIGFILE* that refers to a user patterns file
   - Pass the patterns file directly with the *--user-patterns* option
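
For reference, the second attempt can be sketched in Python roughly as follows. This is only a sketch under assumptions: the file name `codes.patterns`, the helper `build_patterns_config`, and the choice of `--psm 8` (single word) are illustrative and not taken from my actual setup, and the pattern is written as `\d` repeated eight times because, as far as I understand, the patterns file uses Tesseract's own small pattern syntax rather than full regular expressions.

```python
import tempfile
from pathlib import Path


def build_patterns_config(patterns, directory, psm=8):
    """Write one pattern per line and return a Tesseract config string.

    The file name and this helper are hypothetical, used only for illustration.
    """
    path = Path(directory) / "codes.patterns"
    path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    # --psm 8 asks Tesseract to treat the image as a single word.
    return f"--psm {psm} --user-patterns {path}"


# Eight digits, spelled out as eight consecutive \d tokens.
EIGHT_DIGITS = "\\d" * 8

with tempfile.TemporaryDirectory() as tmp:
    config = build_patterns_config([EIGHT_DIGITS], tmp)
    print(config)
    # With pytesseract and Pillow installed, the call would look like:
    # text = pytesseract.image_to_string(Image.open("word.png"), config=config)
```

The resulting string is what gets passed as the `config` argument of `pytesseract.image_to_string`. Note that a directory path containing spaces would need quoting, which this sketch does not handle.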

I also tried a few different regular expressions (I read that tesseract
supports only a subset of regex syntax).
The ideal regex would be something like *^.{8}$*, because I only want to
constrain the length, not restrict the output to a specific set of characters
(any Unicode character should be allowed).
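
To make the intended constraint concrete, here is a minimal Python sketch that applies the same length check to the recognized text after OCR instead of inside the engine. This is a post-filter, a different technique from a user pattern, and the helper name `keep_if_eight` is hypothetical. It counts Unicode code points, so a character built from a base letter plus a combining mark would count as two.

```python
import re

# Equivalent of ^.{8}$ when used with fullmatch: exactly eight characters.
EIGHT_CHARS = re.compile(r".{8}")


def keep_if_eight(ocr_text):
    """Return the stripped text if it is exactly 8 characters long, else None."""
    candidate = ocr_text.strip()  # drop trailing newline / form feed from OCR output
    return candidate if EIGHT_CHARS.fullmatch(candidate) else None
```

A caller would pass in the string returned by `pytesseract.image_to_string` and discard any result for which the helper returns `None`.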

I also tried some very general patterns that I read are supported, such as
*\d*, which should return only sequences made of digits, but it seems to be
ignored.

Am I missing something, or is it not possible to force a specific output
sequence length?

Thank you in advance

