** Description changed:

  ******* this report is a summary of known problems

  The text recognition feature (OCR - Region.text()), together with the
  possibility to find text in an image, is still experimental and under
  development.

  These are the currently reported bugs:
  bug 777660: text recognition errors with some fonts
+ bug 783082: [request] want font parameters for text recognition
  bug 735434: Text extraction from Images fails in some cases on colored backgrounds
  bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
  bug 695650: find(text).text() does not return same text
  bug 701005: text() always returns text with trailing x'200A20'
  bug 701012: text() does not return all intervening blanks, add's others

  Other observed oddities
  -- there are problems with text that is not in English
  -- very small and very large fonts may not work
  -- multiline text causes problems
  -- intervening/preceding/trailing graphics and symbols are interpreted as text

  Tip when using Region.text():
  Currently you get the best results when the region represents only one
  line of text and contains only text (no graphics/symbols) in English.
  If you can influence it, make the text as large as possible.

  -- additional information:
  Internally the Tesseract OCR engine
  (http://code.google.com/p/tesseract-ocr/) is used, so its restrictions
  apply (e.g. minimum font size, ...). More information can be found on
  its wiki.
** Description changed:

- ******* this report is a summary of known problems
+ ******* this report is a summary of known problems and feature requests

  The text recognition feature (OCR - Region.text()), together with the
  possibility to find text in an image, is still experimental and under
  development.

  These are the currently reported bugs:
  bug 777660: text recognition errors with some fonts
  bug 783082: [request] want font parameters for text recognition
  bug 735434: Text extraction from Images fails in some cases on colored backgrounds
  bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
  bug 695650: find(text).text() does not return same text
  bug 701005: text() always returns text with trailing x'200A20'
  bug 701012: text() does not return all intervening blanks, add's others

  Other observed oddities
  -- there are problems with text that is not in English
  -- very small and very large fonts may not work
  -- multiline text causes problems
  -- intervening/preceding/trailing graphics and symbols are interpreted as text

  Tip when using Region.text():
  Currently you get the best results when the region represents only one
  line of text and contains only text (no graphics/symbols) in English.
  If you can influence it, make the text as large as possible.

  -- additional information:
  Internally the Tesseract OCR engine
  (http://code.google.com/p/tesseract-ocr/) is used, so its restrictions
  apply (e.g. minimum font size, ...). More information can be found on
  its wiki.

--
You received this bug notification because you are a member of Sikuli
Drivers, which is subscribed to Sikuli.
https://bugs.launchpad.net/bugs/710586

Title:
  X 1.0rc2: Region.text() -- known problems and needed improvements

Status in Sikuli:
  In Progress

Bug description:
  ******* this report is a summary of known problems and feature requests

  The text recognition feature (OCR - Region.text()), together with the
  possibility to find text in an image, is still experimental and under
  development.

  These are the currently reported bugs:
  bug 777660: text recognition errors with some fonts
  bug 783082: [request] want font parameters for text recognition
  bug 735434: Text extraction from Images fails in some cases on colored backgrounds
  bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
  bug 695650: find(text).text() does not return same text
  bug 701005: text() always returns text with trailing x'200A20'
  bug 701012: text() does not return all intervening blanks, add's others

  Other observed oddities
  -- there are problems with text that is not in English
  -- very small and very large fonts may not work
  -- multiline text causes problems
  -- intervening/preceding/trailing graphics and symbols are interpreted as text

  Tip when using Region.text():
  Currently you get the best results when the region represents only one
  line of text and contains only text (no graphics/symbols) in English.
  If you can influence it, make the text as large as possible.

  -- additional information:
  Internally the Tesseract OCR engine
  (http://code.google.com/p/tesseract-ocr/) is used, so its restrictions
  apply (e.g. minimum font size, ...). More information can be found on
  its wiki.
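
Until bugs 701005 and 701012 are fixed, a script can tolerate the whitespace problems by normalizing the recognized text before comparing it. The following is only a sketch of that workaround, not part of Sikuli: the helper names normalize_ocr_text and same_text are hypothetical, and it assumes that x'200A20' is hex for the sequence space, line feed, space.

```python
import re

# Any run of whitespace (spaces, tabs, line feeds) counts as one separator.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_ocr_text(raw):
    """Hypothetical helper: collapse whitespace runs and trim both ends.

    Assumption: the trailing x'200A20' from bug 701005 is whitespace
    (space, line feed, space), so strip() removes it.
    """
    return _WHITESPACE_RUN.sub(" ", raw).strip()


def same_text(expected, recognized):
    """Compare two strings, ignoring differences in whitespace amount only."""
    return normalize_ocr_text(expected) == normalize_ocr_text(recognized)


# In a Sikuli script the raw string would come from Region.text(),
# ideally on a region that covers a single line of plain text.
```

This only hides extra or trailing blanks. A blank that the OCR drops entirely (for example "HelloWorld" instead of "Hello World") still makes the comparison fail, as do misrecognized characters.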

