Hello everyone! I am trying to use tesseract-ocr (via pytesseract) to detect some specific codes. The input is a single word at a time. The codes always have the same length (8 characters), and I want the output to contain only sequences of exactly 8 characters.
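To make the requirement concrete, the behaviour I am after is roughly what the post-processing sketch below does. The helper names `keep_if_expected_length` and `read_code` are hypothetical, and the sketch assumes pytesseract and Pillow are installed and that stripping whitespace before counting is acceptable. It only filters the result after OCR; it does not make tesseract itself constrain the length.

```python
from typing import Optional

EXPECTED_LENGTH = 8  # code length stated in the question


def keep_if_expected_length(raw_text: str, length: int = EXPECTED_LENGTH) -> Optional[str]:
    """Return the OCR text without whitespace if it has exactly `length` characters, else None."""
    # Drop all whitespace (spaces, newlines, form feeds); this is an assumption
    # about what should count towards the length.
    candidate = "".join(raw_text.split())
    # len() counts Unicode code points, so any character is allowed.
    return candidate if len(candidate) == length else None


def read_code(image_path: str) -> Optional[str]:
    """Hypothetical wrapper: OCR a single-word image and keep only 8-character results."""
    # Imported lazily so the length check above works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    # "--psm 8" asks tesseract to treat the image as a single word.
    raw_text = pytesseract.image_to_string(Image.open(image_path), config="--psm 8")
    return keep_if_expected_length(raw_text)
```

A result of `None` would mean the recognized text was not 8 characters long and should be discarded or retried.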
I have tried all the solutions described in the manual (https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#CONFIGFILE) without success. In more detail, I tried to:

- Create a *CONFIGFILE* that refers to a user patterns file
- Pass the file directly with the *--user-patterns* option

I also tried a few different regular expressions (I read that tesseract supports only a subset). The ideal regex would be something like *^.{8}$*, because I only want to constrain the length, not restrict the set of characters (any Unicode character should be allowed). I also tried some very general patterns that I read are supported, such as *\d*, which should return only sequences made of digits, but it seems to be ignored.

Am I missing something, or is it not possible to force the length of the output sequence?

Thank you in advance

