On 9/10/19 2:07 PM, Thomas Huth wrote:
> On 10/09/2019 14.02, Peter Maydell wrote:
>> On Sat, 7 Sep 2019 at 16:47, Thomas Huth <h...@tuxfamily.org> wrote:
>>>
>>> From: Philippe Mathieu-Daudé <f4...@amsat.org>
>>>
>>> Add a test of the NeXTcube framebuffer using the Tesseract OCR
>>> engine on a screenshot of the framebuffer device.
>>>
>>> The test is very quick:
>>>
>>> $ avocado --show=app,console run tests/acceptance/machine_m68k_nextcube.py
>>> JOB ID : 78844a92424cc495bd068c3874d542d1e20f24bc
>>> JOB LOG :
>>> /home/phil/avocado/job-results/job-2019-08-13T13.16-78844a9/job.log
>>> (1/3)
>>> tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_size:
>>> PASS (2.16 s)
>>> (2/3)
>>> tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_ocr_with_tesseract_v3:
>>> -
>>> ue r pun Honl'flx ; 5‘ 55‘
>>> avg ncaaaaa 25 MHZ, memary jag m
>>> Backplane slat «a
>>> Ethernet address a a r a r3 2
>>> Memgry sackets aea canflqured far 16MB Darlly page made stMs but have
>>> 16MB page made stMs )nstalled
>>
>> By the way, do we know why the output from this test case is
>> garbled like this ? It suggests that something's not right
>> somewhere...

I got better results by tuning a few options, but later noticed that
they differ between Fedora and Ubuntu. Tesseract v4 gives better
results, but it is still alpha and we would need to install the trained
data. It is not that big, 15MiB:
https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata

I preferred to keep the simplest test with an acceptable result. We are
not interested in fully understandable text output: we only want to
know that the framebuffer model works. Reading "Ethernet address" is
good and quick enough.

> The text is created from the framebuffer with the OCR-tool Tesseract -
> which is just not good enough to recognize all words properly here.
>
> Thomas
>
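
To illustrate the idea of checking only for one well-recognized string, here is a minimal sketch. It is not the code from the patch: the helper names ocr_lines and framebuffer_text_ok are hypothetical, and it assumes a tesseract binary in PATH that accepts "stdout" as the output base.

```python
import shutil
import subprocess


def ocr_lines(image_path, tesseract="tesseract"):
    """Run the Tesseract CLI on a screenshot and return its non-empty lines.

    Hypothetical helper for illustration; assumes `tesseract` is installed.
    """
    if shutil.which(tesseract) is None:
        raise RuntimeError("tesseract binary not found in PATH")
    # Passing "stdout" as the output base makes tesseract print the text.
    result = subprocess.run([tesseract, image_path, "stdout"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL,
                            universal_newlines=True,
                            check=True)
    return [line.strip() for line in result.stdout.splitlines()
            if line.strip()]


def framebuffer_text_ok(lines, expected="Ethernet address"):
    """Return True if any OCR line contains the expected string.

    The rest of the OCR output may be garbled; only one reliably
    recognized phrase is required to conclude the framebuffer works.
    """
    return any(expected in line for line in lines)
```

With the garbled output quoted above, framebuffer_text_ok() would still succeed, because the line "Ethernet address a a r a r3 2" contains the expected phrase even though the other lines are unreadable.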